Transcript
What if my query doesn’t care about CustomerId?
Maybe I want to know how many keywords I have just for ‘unicorn’ across the entire table.
Well, every single customer may have a unicorn Keyword, I don’t know. I am going to have to go into every partition, and scan every single partition.
I can create a nonclustered index on Keyword that is aligned on my partitioned table
I can create a partitioned nonclustered index on Keyword, in this case, because my, if I create it as aligned to my base table it’s all by CustomerId. So now, to satisfy this query, right, yes it does have the CustomerId tucked into the index. I’m going to have to go in every partition. I can do a seek, because this nonclustered index is on Keyword, but I’m going to have to go in each one, and be like, where is your unicorn, where is your unicorn, where is your unicorn, if you have any.
If I hadn’t partitioned this table, I would have a single index on unicorn ID. If I created this index, then I would be able to go straight to it just once. So in this case, I have to do a bunch of different seeks. I have to do a seek for every partition I’ve defined, if it is a partitioned index on Keyword, and my partitioning key is on CustomerId.
But if it’s a non-partitioned index, it’s a simpler operation, just by the nature of the structure being different.
If I can’t get partition elimination, I’ve added complexity to my query
I’ve added more work for the optimizer.
This can still be fast, depending on the number of partitions I have, right? If I have 50 million partitions, I mean that’s probably not the best number example to use, If I have five thousand partitions, this is going to take longer than if I have 10, obviously.
With partitioning, especially if I’m not using the partitioning key, there can come complexity.
I can create a non-partitioned, nonclustered index on a partitioned table, as we mentioned. But there are gotchas with that.
I lose a lot of the awesomeness of table partitioning if I create a nonclustered index on it.
If I create this index on just Keyword, a non-aligned index on Keyword, on my table partitioned on CustomerId, while this index is enabled, while it exists, nobody can switch a partition into the table. Because, okay, if we had a new chunk of data, based on CustomerId, and I tried to move it into the table, where does that data go in this index? It could go anywhere in this index. That’s not allowed.
If I want to do things like switch in and switch out, which we’re going to talk about a little bit more later, I have to disable or drop this index, do my partition switching, and then rebuild or recreate this index, depending on what I did.
And if this is a large table recreating this index, whether by rebuilding or creating is a real drag, and can take a very long time. And if people are trying to use the database while I do it, it may not go so well. Similarly, we now have the ability in SQL Server to truncate individual partitions. I can only do that if all of my indexes have aligned partitions, and I know, hey all of these indexes have a partition for CustomerId and they’re all exactly the same size. Alright, so I’ve got a few questions in here; I’m trying to check in on them.
A question about identity columns
Diana says: what if the primary key on a facts table is an identity value, as is often the case in dimensional models? In the case where we have an identity, the primary key on a fact table as an identity value, we want that to be unique. If we are partitioning by that column, let’s say I am partitioning by my column, that happens to be an identity, and my partitioning column is the same column as the column of my clustered index, in that case I could still make it unique, because my partitioning key is exactly equivalent to my clustering key.
If I wanted to partition on a different column though, I run into problems with guaranteeing uniqueness. I may have to create a non-aligned index to enforce that uniqueness, and then I lose the ability to do that switching, and things like that. This is often not that big a deal, in data warehouses, many times people say, okay I enforce my uniqueness in other systems I don’t necessarily enforce it here. So some people are willing to make compromises on that. It can be one of the tricky things, enforcing uniqueness on a partition table can be very tricky when your partitioning column is different than your clustering column. Yeah, it gets interesting.
Can parameterized queries get partition elimination?
Brian’s question is: let’s say we do have a query like this running against a partition table, and the difference is we have the partitioning key column in the query, but it is parametrized, so that it doesn’t see the literal value in here; we have CustomerId equals at CustomerId, can I still get partition elimination?
Yes, you can. It does work with parametrization. You don’t have to make all your queries use literal values. It will work with parametrization. Great question, cause that is one of the things where you’re like there’s lots of weirdness in this world, can I still parametrize my queries. Yes, you can. Even forced parametrization, as long as the data type matches, there is some cases where if I have a parameter, and the data type has a bad implicit conversion to the column definition, that’s also true for literal values. If there’s a data type mismatch, I may not get partition elimination, but as long as my data types match, I should be good.