By Kendra Little on September 8, 2024
Every time I share a recommendation to use data compression in SQL Server to reduce physical IO and keep frequently accessed data pages in memory, I hear the same concern from multiple people: won’t this increase CPU usage for inserts, updates, and deletes?
DBAs have been trained to ask this question by years of training classes and online content (I used to mention it as a tradeoff to consider, myself), but I’ve found it’s simply the wrong question to ask.
In this post I’ll share the two questions that are valuable to ask for your workload.
Are data modifications slow enough that they are impacting application performance/customer experience, or close to it?
If data modifications are already slow enough to be impacting your customers’ experience, or are getting close to it, have a look at what is actually causing that slowness.
In this section, I’m listing the issues that plague most systems and cause slow modifications. All of these can add seconds or minutes to data modification times:
- Poorly optimized modification queries, including MERGE syntax: If modifications are slow, your first step is to find the query plans and execution statistics for the modification queries. Are they doing lots of reads? Are they having issues with parameter sniffing? MERGE statements may save a few lines of code, but they often produce terribly performing execution plans. (Please read Michael J Swart’s post, “What to Avoid if You Want to Use MERGE,” for even more reasons to avoid merge. A sketch of rewriting a MERGE as separate statements follows this list.)
- Cascading updates/deletes: These are a recipe for slowness when it comes to foreign keys in SQL Server. The cascading updates and deletes occur within the calling transaction scope, and the serializable isolation level is used to help ensure data correctness. If you have these in place and are using them, data compression is the last thing I’d worry about for slowness on modifications: you’ve already got a much bigger monster in the house.
- Non-optimized or overloaded triggers: Not much explanation needed for this one, I think. Everyone hates digging into trigger code to optimize it, but sometimes you need to put on your hazmat suit and go in there. And sometimes you need to talk to other folks about what business logic can move into application code instead of running in a trigger.
- Pessimistic isolation levels increasing locking and blocking: Sometimes modifications are slow because they can’t get the locks needed to commit. SQL Server implements a pessimistic version of the Read Committed isolation level by default, under which readers block writers and writers block readers. This can massively slow down modifications. (A sketch of one common fix, Read Committed Snapshot Isolation, follows this list.)
- Non-indexed foreign key relationships: Let’s say we have two tables, Invoice and Payment. For every invoice, we might have 0 or more payments. There is a foreign key on Payment.InvoiceId that ensures that all InvoiceIds in Payment have a corresponding “parent” row in the Invoice table. The bit that most folks miss is that when an InvoiceId is updated or deleted in the Invoice table (perhaps for an archive process or a process synchronizing data between sources), the database engine has to make sure this doesn’t create a problem in the foreign key relationship. If Payment.InvoiceId does not have a supporting index and the Payment table is large, this can make that modification to Invoice very slow. Note: in this case, applying data compression to the structures would likely make the modifications faster, as it shrinks the structure that has to be scanned. But either way, the foreign key relationship needs a supporting index so the engine doesn’t have to do a scan to check referential integrity (see the index sketch after this list).
- Indexed views lacking appropriate supporting indexes: When you create an indexed view that joins multiple tables, you need to make sure it doesn’t adversely impact modifications to those tables. If you don’t have the appropriate indexes on the member tables to support the join columns, data modifications may cause scans of those tables. This is another case where compressing data structures on the member tables could actually speed up modifications by reducing the page count of what is scanned, but really what you need is good indexing.
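To make the MERGE point concrete, here’s a minimal sketch of replacing a MERGE upsert with separate UPDATE and INSERT statements. The table names (dbo.Payment, dbo.PaymentStaging) are purely illustrative, not from any particular system:

```sql
BEGIN TRANSACTION;

-- Update rows that already exist in the target.
UPDATE tgt
SET    tgt.Amount = src.Amount
FROM   dbo.Payment AS tgt
JOIN   dbo.PaymentStaging AS src
       ON src.PaymentId = tgt.PaymentId;

-- Insert rows that don't exist yet.
-- Under heavy concurrent writes the existence check may still need locking
-- hints or retry logic; test the pattern against your own workload.
INSERT dbo.Payment (PaymentId, InvoiceId, Amount)
SELECT src.PaymentId, src.InvoiceId, src.Amount
FROM   dbo.PaymentStaging AS src
WHERE  NOT EXISTS
       (SELECT 1 FROM dbo.Payment AS tgt WHERE tgt.PaymentId = src.PaymentId);

COMMIT;
```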
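If pessimistic locking turns out to be the culprit, one common fix is turning on Read Committed Snapshot Isolation so readers stop blocking writers. A minimal sketch, using a made-up database name; this changes read behavior for the entire database, so test it well before production:

```sql
-- Check whether RCSI is already on (SalesDB is a hypothetical database name).
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = N'SalesDB';

-- Turn it on: readers use row versioning instead of shared locks,
-- so readers no longer block writers and writers no longer block readers.
-- ROLLBACK IMMEDIATE kills open transactions, so schedule this carefully.
ALTER DATABASE SalesDB
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;
```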
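And for the Invoice/Payment example, the supporting index looks something like this. A rough sketch using the column names from the example above (ONLINE = ON requires Enterprise Edition):

```sql
-- Lets the engine verify referential integrity with a seek on Payment
-- when rows in Invoice are updated or deleted, instead of scanning Payment.
CREATE NONCLUSTERED INDEX IX_Payment_InvoiceId
    ON dbo.Payment (InvoiceId)
    WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);
```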
If you are concerned about data modification performance, by all means evaluate what is causing your modifications to be slow and solve those problems as a priority!
But meanwhile, DBAs have been trained to worry about data compression, which generally adds only a matter of milliseconds to data modifications. (I say generally only because I’m sure there’s someone out there with a horrific schema where data compression manages to add a second or more; there’s always an exception.)
Does data compression reduce your overall CPU usage? This matters because it can reduce your licensing costs
This is the valuable question to ask about CPU – what is the CPU impact that data compression has on your workload overall? This is worth exploring because it can save you significant money.
I have seen data compression lower overall CPU usage for OLTP workloads. This happens when compression reduces the number of physical reads your system is doing, especially when storage is slow. Here’s how the magic works:
- Data compression reduces the number of data pages an index takes up, both on disk and in memory
- This allows more data to fit into the data cache (buffer pool), which reduces the number of physical reads for queries
- Reads from memory are MUCH faster than reads from storage, so queries execute faster
- Faster queries mean fewer queries cycling on and off schedulers; there’s just more room for everyone, and CPU usage drops
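If you want a sense of what compression would buy you before changing anything, SQL Server ships an estimation procedure, and applying compression is a single index rebuild. A sketch using the hypothetical Payment table from earlier:

```sql
-- Estimate the space savings of PAGE compression on dbo.Payment's clustered index.
EXEC sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'Payment',
    @index_id         = 1,      -- 1 = clustered index; NULL = all indexes
    @partition_number = NULL,
    @data_compression = N'PAGE';

-- Apply PAGE compression by rebuilding the indexes.
-- ONLINE = ON keeps the table available during the rebuild (Enterprise Edition).
ALTER INDEX ALL ON dbo.Payment
    REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);
```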
Have you ever noticed that tuning indexes can dramatically reduce CPU usage on a SQL Server instance? Data compression works for the exact same reasons.
But Kendra, I have a narrow table that’s modified a million times a minute, can you prove it won’t cause CPU problems?
Not unless you’re paying me a consulting rate. Look, I’m not telling you to compress EVERY index or to abandon common sense. I support you in looking for low-hanging fruit, and there is almost always PLENTY of that to spend your time on.
Any database change can cause a regression
Making changes to databases is tough, I get it. SQL Server workloads are quite complex, and even simple “best practice” changes can cause unexpected regressions. I’ve even seen adding memory and CPUs to a database instance cause regressions, because the change altered query plans and the optimizer’s new choices weren’t ideal.
Always be careful when applying a change:
- Use pre-production systems to validate changes.
- Work within your change control process and document what you are deploying, and when.
- Identify how to version control your changes, and understand how you will roll them back if needed.
- Baseline performance before you make changes (a rough sketch of a lightweight baseline follows this list).
- Deploy changes in discrete chunks.
- Observe how your baseline adjusts as the change burns in.
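What “baseline performance” means depends on your monitoring tools, but even a lightweight snapshot of a few counters captured before and after a change beats guessing. Here’s a rough sketch using built-in DMVs; the baseline table name is just an illustration:

```sql
-- Hypothetical table for storing counter snapshots over time.
IF OBJECT_ID(N'dbo.PerfBaseline') IS NULL
BEGIN
    CREATE TABLE dbo.PerfBaseline
    (
        captured_at  datetime2(0)  NOT NULL DEFAULT SYSDATETIME(),
        counter_name nvarchar(128) NOT NULL,
        cntr_value   bigint        NOT NULL
    );
END;

-- Snapshot a few server-level counters. Run on a schedule before and after
-- deploying changes; the per-second counters are cumulative, so compare deltas.
INSERT dbo.PerfBaseline (counter_name, cntr_value)
SELECT RTRIM(counter_name), cntr_value
FROM sys.dm_os_performance_counters
WHERE (object_name LIKE N'%Buffer Manager%'
       AND counter_name IN (N'Page life expectancy', N'Page reads/sec'))
   OR (object_name LIKE N'%SQL Statistics%'
       AND counter_name = N'Batch Requests/sec');
```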
But all that said, data compression is still an underrated and underused feature of SQL Server (Enterprise-only for years, but available in every edition since SQL Server 2016 SP1), and concerns about the CPU impact on data modifications unnecessarily block many people from benefiting from it. It can speed up many things you would think it might slow down, including even modifying data types.