on April 16, 2018
At a recent conference, two speakers reminded me of something important: when you put effort into learning something or helping folks, don’t simply put your words in private emails or post-it notes on your desk. Whenever possible, blog it as well. It can help other people, and it can also help you remember it in the future!
I’ve been kinda/sorta/pretty good about this when it comes to Dear SQL DBA questions. I am going to start being more consistent about it, and sharing the answers with YOU GUYS, too.
Know that you’re always invited to be the SQL DBA on the other end of the email, too, and chime in with your take in the comments. Feel free to disagree with me, but be kind and respectful to the anonymous questioner: there’s a person on the other end of that keyboard.
Question: I’ve got a poorly indexed database from a commercial software vendor. What should I do?
Here’s the question:
I’ve encountered a performance issue which I believe is related to indexing. Our database (created by a commercial software vendor) includes a table with over 78 million rows… and no clustered index! This table is queried by the software many times daily and the effects appear to be hindering our performance.
There are two non-clustered indices on the offending table: one non-unique including a column called [ColumnA], and the other is unique and includes only [ColumnB] which is the primary key. Both [ColumnA] and [ColumnB] are of uniqueidentifier data type, so I believe that adding a clustered index would hinder performance (please correct me if I’m wrong!).
The problem is that the query which hits this table requests four columns in addition to [ColumnA] and [ColumnB] which are not included in the indices. The estimated query plan shows a RID Lookup (99% estimated cost!) on this table and I really want to get rid of it so we can boost performance, but I’m really hesitant to make any changes to the indices on a table of this size.
Now, I know some of you out there have immediately started itching. We’ve got a heap with 78 million rows (not sure how many GB it is, but that’s a fair few rows), we’ve got a couple of GUID columns, and we’ve got what sounds like a dramatically bad indexing setup.
I immediately have doubts about whether this table should be a heap at all. I also wonder if deletes run against it: heaps can trap empty space from deletes that run without lock escalation, and they can rack up forwarded records. Both problems are specific to heaps.
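One way to check for those heap-specific symptoms is sys.dm_db_index_physical_stats. This is just a sketch: the database and table names below are placeholders, since the real ones weren’t in the question, and DETAILED mode reads every page of the table, so on 78 million rows you’d want to run it off-hours or against a restored copy.

```sql
-- Heap health check (names are hypothetical placeholders)
SELECT
    ips.forwarded_record_count,         -- forwarded records: a heap-only problem
    ips.avg_page_space_used_in_percent, -- a low percentage suggests trapped empty space
    ips.page_count,
    ips.record_count
FROM sys.dm_db_index_physical_stats(
        DB_ID('VendorDatabase'),         -- placeholder database name
        OBJECT_ID('dbo.BigVendorTable'), -- placeholder table name
        0,       -- index_id 0 = the heap itself
        NULL,    -- all partitions
        'DETAILED'
     ) AS ips;
```

If forwarded_record_count is high or page space used is low, that’s evidence the heap itself is part of the problem, independent of the missing indexes.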
Depending on how the table is most often queried, a clustered index might make everything faster.
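And the RID Lookup in the question could likely be knocked out on its own with a covering nonclustered index. A sketch of what that might look like, with the caveat that the table name and the four included columns are placeholders (we only know [ColumnA] and [ColumnB] from the question):

```sql
-- Hypothetical covering index: includes the four columns the query
-- looks up, so it no longer has to fetch them from the heap
CREATE NONCLUSTERED INDEX ix_BigVendorTable_ColumnA_covering
    ON dbo.BigVendorTable ([ColumnA])
    INCLUDE ([Col1], [Col2], [Col3], [Col4]) -- placeholder column names
    WITH (ONLINE = ON); -- Enterprise Edition only; omit otherwise
```

This targets the lookup without touching the heap question at all, which makes it a smaller, lower-risk change to discuss.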
But not so fast, Kendra
Since this database is created by a commercial vendor, there’s more to think about. We can’t just spring into action.
I wrote a little bit about these types of databases here: /2017/01/10/administering-cots-databases-isvs-third-party-vendors/
The first thing I would identify is around item #2 in that article: what is the situation with the vendor, what is the support agreement, and how do you work with it? If you go changing around things without first figuring out what that situation is, you might improve performance in the short term, but end up facing a big old problem the next time an upgrade comes out for the software, or end up unsupported if something goes wrong.
It’s a bummer that you can’t just get right to tuning, but I’ve known folks who got really burned by skipping this step.
I have also known cases where folks who explained the problem to the vendor were happily surprised that the vendor had a fix for that situation. (Maybe they have an alternate set of indexes they give to clients at a certain scale, or other changes.)
So it’s worth investigating with the vendor first. Depending on how things turn out there, I’d want to look at the indexing question in a test environment, and personally I would test out a clustered index on that unique key.
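If the vendor gives the go-ahead, the test might be sketched like this. The table name is a placeholder again, and on 78 million rows this rebuild needs real time and log space, so it belongs in a test environment first:

```sql
-- TEST environment only, after vendor sign-off. Creating a clustered
-- index on a heap rebuilds the whole table, and the existing
-- nonclustered indexes are rebuilt to use the new key instead of RIDs.
CREATE UNIQUE CLUSTERED INDEX cx_BigVendorTable_ColumnB
    ON dbo.BigVendorTable ([ColumnB])
    WITH (ONLINE = ON, SORT_IN_TEMPDB = ON);
```

Since [ColumnB] is a uniqueidentifier, new rows will insert at random points in the index and fragment it over time. That’s a real tradeoff to measure in the test, but it isn’t an automatic deal-breaker compared to the heap problems above.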