Venturing out of hints, and into wilder TSQL
We might use a hint to stabilize this query short term.
Let’s look at more options for that long-term rewrite, just for the performance tuning fun of it.
Links discussed in the video
Adam Machanic’s SQLBits presentation on Query Tuning Mastery: Clash of the Row Goals
Paul White’s article on The “Segment Top” Query Optimization
Transcript
Let’s have some fun with TSQL, and see if there are other rewrites I could use, different from that MAX group by rewrite, where I could get fast performance for this.
Now, it’s nothing against my MAX group by rewrite, this is just for the fun of seeing: is there more than one way to do it?
And also just to show that there are more ways you can boss around the SQL Server optimizer using TSQL rewrites that don’t necessarily involve hints. You can still heavily influence the choices that are made by how you write your TSQL.
Well, when we were talking about row goals with the fast hint, one of the problems with that hint is that a query hint has to go at the end of the query, and it sets that row goal for everything.
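For reference, a query-level row goal set with the fast hint looks something like this. This is a minimal sketch against the demo table; the number is illustrative, and the column names follow the ones mentioned later in this demo:

```sql
/* OPTION (FAST n) must go at the end of the statement, and it sets
   a row goal for the whole query, not just one part of it. */
SELECT fnbys.FirstNameId, fnbys.NameCount
FROM agg.FirstNameByYearStateWide AS fnbys
ORDER BY fnbys.NameCount DESC
OPTION (FAST 100);
```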
You can also set row goals with the top operator
I might try to do something like this, and say: inside my NameRank CTE, I want to set a row goal. Now, if I’m going to set the row goal for the number of rows that come back, I need to also do my ranking in there, and my filtering on my ranking, because there are 10,903 rows where rank equals number one.
There are a couple of really big gotchas with this. One of them is, wherever I use the top operator, I’m saying I want no more rows than that back. So if my data changes, is my query going to be right anymore?
But also, with this approach I’m trying to put the windowing function here in the where clause, and if I try to run this query, it is going to fail, because windowing functions can’t go in a where predicate. Windowing functions can only be in select or order by, so I can’t do this and just say where the rank equals number one.
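As a sketch, here’s roughly what that invalid attempt looks like (column names are assumed from the demo tables). SQL Server rejects it with error 4108, “Windowed functions can only appear in the SELECT or ORDER BY clauses”:

```sql
WITH NameRank AS (
    /* Row goal: no more than the 10,903 rows that rank #1 today. */
    SELECT TOP (10903)
        fnbys.FirstNameId,
        fnbys.NameCount
    FROM agg.FirstNameByYearStateWide AS fnbys
    /* Invalid: a windowing function can't be used in a WHERE predicate. */
    WHERE RANK() OVER (
              PARTITION BY fnbys.ReportYear, fnbys.StateCode, fnbys.Gender
              ORDER BY fnbys.NameCount DESC) = 1
)
SELECT FirstNameId, NameCount
FROM NameRank;
```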
I could try to do it a different way though. I could say, okay well I’ve got these risks, but I’m going to try to get around them.
Now, I’ve got my NameRank CTE here, ranking with my windowing function. But I add a second CTE named RowGoal. RowGoal queries from NameRank, and it says: okay, I am going to put my predicate where the rows are ranked number one in this CTE, and I’m also going to set a row goal with the top operator.
Now I’ve hedged my bets this time and said maybe I’ll have up to two times the data. This could obviously still go wrong at some point, but I’m setting my row goal to the number of rows that I expect to be ranked number one, times two. So I’ve given myself a little wiggle room here. Let’s go ahead and run this query and see how it does.
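Here’s a sketch of that two-CTE shape. The select list and the join to ref.FirstName are assumptions based on the rest of the demo:

```sql
WITH NameRank AS (
    SELECT
        fnbys.ReportYear, fnbys.StateCode, fnbys.Gender,
        fnbys.FirstNameId, fnbys.NameCount,
        RANK() OVER (
            PARTITION BY fnbys.ReportYear, fnbys.StateCode, fnbys.Gender
            ORDER BY fnbys.NameCount DESC) AS RankByNameCount
    FROM agg.FirstNameByYearStateWide AS fnbys
),
RowGoal AS (
    /* 10,903 rows rank #1 today; double it for wiggle room. */
    SELECT TOP (21806) *
    FROM NameRank
    WHERE RankByNameCount = 1
)
SELECT rg.ReportYear, rg.StateCode, rg.Gender, rg.NameCount, fn.FirstName
FROM RowGoal AS rg
JOIN ref.FirstName AS fn ON rg.FirstNameId = fn.FirstNameId;
```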
If you enjoy this idea of setting row goals with tricks and rewrites like this, you might enjoy a presentation by Adam Machanic, by the way, on Query Tuning Mastery: Clash of the Row Goals. I watched a recording of that presentation from a past SQLBits conference, and it’s great fun and a very interesting approach to query tuning.
Now, in my case I don’t get a terrible result from this: I get an elapsed time of over five seconds, certainly not the worst we’ve seen. And looking at our execution plan, if I go over here and look at how many rows it estimated would come out of the filter operator, my estimate there is, in fact, two times 10,903.
I did, by putting that top operator in there, influence my execution plan, and sort of localize the row goal there.
And I got my nested loop plan that doesn’t perform terribly, but you know, there are some things that I just didn’t change about this query. In my rewrite, we’re still doing an index scan, applying the windowing function to all the rows, and then applying a filter. Right? That’s a lot of work to do. We’re ranking everything, and then filtering out rank equals one.
We aren’t pushing down that filter at all by just setting that row goal there.
So in our case, is there something else we could do with a rewrite that might push it down? Well, since we’re looking at the number one rank, there are interesting ways we could do this.
We are allowing for ties, but we could do something else with the top operator, and this is just to simplify the query. What I’m saying now is: I want to select from agg.FirstNameByYearStateWide, and in this case I want to order by the windowing function, putting my partition by and my order by descending into the order by clause.
So I order by those, put my whole windowing function down there, and then just select top one with ties with that order by in there. So let’s go ahead and run this query and get our rows back.
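A sketch of that simplified rewrite follows. The ID column name is hypothetical; the transcript only says it pulls back “the ID column”:

```sql
/* TOP (1) WITH TIES keeps every row tied for the lowest ORDER BY
   value: here, everything that ranks #1 within its partition. */
SELECT TOP (1) WITH TIES
    fnbys.FirstNameByYearStateWideId   /* hypothetical ID column name */
FROM agg.FirstNameByYearStateWide AS fnbys
ORDER BY RANK() OVER (
             PARTITION BY fnbys.ReportYear, fnbys.StateCode, fnbys.Gender
             ORDER BY fnbys.NameCount DESC);
```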
We won’t get all the columns back, we’re just pulling back the ID column. We do get back all 10,903 rows, but looking at our execution plan, this rewrite doesn’t fundamentally change that pattern either.
Now I have a sort operator instead of a filter, because my query puts the windowing function in the order by and then just does top one with ties. So I have a sort and a top operator, but I’m still applying the windowing function to all the rows before I bring them back. So in my case, I want to see if I can push it down.
Well, let’s try a different rewrite - GroupityGroup
This approach was inspired by a Paul White article. He has some great articles. This one is over on SQLblog.com.
And in this case what I’m doing is, I am using a distinct to find my groups.
I have a CTE called GroupityGroup, where I’m finding my distinct combinations of ReportYear, StateCode, and Gender. And then what I do with GroupityGroup is use a cross apply: for every combination of ReportYear, StateCode, and Gender that comes back from GroupityGroup, I want you to go run this query. What this query does is select top one with ties from agg.FirstNameByYearStateWide for that given ReportYear, StateCode, and Gender combo, and then we’ve got a nice simple order by name count descending.
So I’m saying: for every distinct group of ReportYear, StateCode, and Gender, go find the top one row, with ties, with the highest name count. I am pulling back all of the columns from agg.FirstNameByYearStateWide, because I have all those reporting columns. And then I’m also joining to ref.FirstName.
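Here’s a sketch of that GroupityGroup rewrite (column names assumed from the demo, as before):

```sql
WITH GroupityGroup AS (
    /* Every distinct combination of ReportYear, StateCode, and Gender. */
    SELECT DISTINCT fnbys.ReportYear, fnbys.StateCode, fnbys.Gender
    FROM agg.FirstNameByYearStateWide AS fnbys
)
SELECT winners.*, fn.FirstName
FROM GroupityGroup AS gg
CROSS APPLY (
    /* For each group, the top row (or rows, allowing ties) by NameCount. */
    SELECT TOP (1) WITH TIES fnbys.*
    FROM agg.FirstNameByYearStateWide AS fnbys
    WHERE fnbys.ReportYear = gg.ReportYear
      AND fnbys.StateCode  = gg.StateCode
      AND fnbys.Gender     = gg.Gender
    ORDER BY fnbys.NameCount DESC
) AS winners
JOIN ref.FirstName AS fn ON winners.FirstNameId = fn.FirstNameId;
```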
So let’s see how our GroupityGroup approach goes, with its top one with ties plus order by pattern. Well, this is not the fastest horse in the stable.
But let’s look at what happened with our plan. We may learn something interesting. We have an elapsed time of over 10 seconds, not the worst we’ve seen, right? And there is something interesting here: it chose to use our columnstore index, feeding into a hash match aggregate operator, and then it goes into a nested loop operator.
Under that nested loop, we have another nested loop. And what we’re doing at the bottom of this is a key lookup into the clustered index to pull back all those reporting columns. Well, how many times did it actually have to execute this guy? Let’s scroll through here and see.
Our number of executions here was 21,715. So we ended up executing that a lot: in this whole block of our plan, we have to do all of these lookups, pull all the data out through that nested loop join, and bring it back.
Well, what if I wanted to simplify this plan? Or see if simplifying it might help?
We can test just the heart of the query. What if I check the GroupityGroup approach, with the same GroupityGroup here, distinct ReportYear, StateCode, and Gender? What I’ve changed is that instead of pulling back all those reporting columns in the middle of the cross apply, every time I tell it to go in there, I just tell it to find the ID of the row, or rows, because we allow ties with that top one, ordered by name count descending.
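A sketch of that trimmed-down “heart of the query” test (again, the ID column name is hypothetical):

```sql
WITH GroupityGroup AS (
    SELECT DISTINCT fnbys.ReportYear, fnbys.StateCode, fnbys.Gender
    FROM agg.FirstNameByYearStateWide AS fnbys
)
SELECT winners.FirstNameByYearStateWideId
FROM GroupityGroup AS gg
CROSS APPLY (
    /* Only the ID(s) of the winning row(s), none of the wide
       reporting columns. */
    SELECT TOP (1) WITH TIES fnbys.FirstNameByYearStateWideId
    FROM agg.FirstNameByYearStateWide AS fnbys
    WHERE fnbys.ReportYear = gg.ReportYear
      AND fnbys.StateCode  = gg.StateCode
      AND fnbys.Gender     = gg.Gender
    ORDER BY fnbys.NameCount DESC
) AS winners;
```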
How long does that take? When I have these really wide reporting tables, or I have to pull back a lot of columns, testing the heart of the query, the part that actually does the calculation, and asking how long that takes on its own, can be really informative.
Well, the heart of this query is really fast. That only took 154 milliseconds.
I’ve got my columnstore index scan feeding into a hash match, and then just a nested loop and an index seek. Okay, well, what if I rewrite you? Instead of pulling back my reporting columns here, I add an extra join and just say: okay, we’ll join back here, again, to agg.FirstNameByYearStateWide to pick up those reporting columns.
So I now have agg.FirstNameByYearStateWide referenced here in GroupityGroup, where I find those distinct combinations. I then also have it in a cross apply, where I’m finding the top row, or rows, because I allow for ties, based on name count descending for that group. And then I’m joining to it again to get my reporting columns.
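Sketched out, the shape looks like this (ID column name still hypothetical):

```sql
WITH GroupityGroup AS (
    SELECT DISTINCT fnbys.ReportYear, fnbys.StateCode, fnbys.Gender
    FROM agg.FirstNameByYearStateWide AS fnbys
)
SELECT fnbys.*, fn.FirstName
FROM GroupityGroup AS gg
CROSS APPLY (
    /* Step 1: find only the winning ID(s) for the group. */
    SELECT TOP (1) WITH TIES w.FirstNameByYearStateWideId
    FROM agg.FirstNameByYearStateWide AS w
    WHERE w.ReportYear = gg.ReportYear
      AND w.StateCode  = gg.StateCode
      AND w.Gender     = gg.Gender
    ORDER BY w.NameCount DESC
) AS winners
/* Step 2: join back to the wide table to pick up the reporting columns. */
JOIN agg.FirstNameByYearStateWide AS fnbys
    ON fnbys.FirstNameByYearStateWideId = winners.FirstNameByYearStateWideId
/* Step 3: join to ref.FirstName for the name itself. */
JOIN ref.FirstName AS fn ON fnbys.FirstNameId = fn.FirstNameId;
```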
I’m also joining to ref.FirstName. So let’s see how this approach works. Well, this approach does work better than the other one. I have a 762 millisecond query time, and I have changed the shape of my plan. I don’t have nested loops inside nested loops anymore. I have one nested loop here that finds the top one with ties based on name count descending.
And then I go back in, pick up all my reporting columns, and then also go to ref.FirstName, to pick up the first name ID.
And this different shape of the plan, which I influenced with my rewrite, does in fact run nice and fast, and comes in under a second. Well, huh. Let’s also take a look, by the way, at the properties of our columnstore index scan.
What is going on in our columnstore index scan?
We do have batch mode. But in this version of the query, even though we’re getting a really fast run time, we don’t actually have any of that magic aggregate predicate pushdown showing there. It’s just leveraging the power of being able to suck large amounts of data quickly out of that columnstore index with batch mode.
Well, what if I don’t let it use columnstore? I’ve got my same query now, but I’ve added a hint to say: how do you do without the nonclustered columnstore index? I run my query, and I’ve got a 1.6 second run time, and looking at my plan, I’ve got an index scan feeding into a stream aggregate operator.
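The transcript doesn’t show the exact hint syntax, but the documented query hint for steering the optimizer away from nonclustered columnstore is IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX. A trimmed-down sketch, just to show where it goes:

```sql
/* The hint goes at the end of the statement, like any query hint. */
SELECT DISTINCT fnbys.ReportYear, fnbys.StateCode, fnbys.Gender
FROM agg.FirstNameByYearStateWide AS fnbys
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);
```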
Not too shabby for my admittedly quite convoluted rewrite.
Well, even though I do know, at least, that I am getting the right number of rows, does it look like the data’s actually correct?
Let’s take our rewritten query here. It does make me laugh. I’m dumping the results into a temp table named distinct apply rewrite. So let’s select the whole query and dump all the results into the distinct apply rewrite temp table. What I already did before is dump the results from my original 13 second query into a temp table named original query. I’ll show you a view of it: it’s got the 10,903 rows. Now I can do some comparisons.
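The comparison checks might look something like this. This is a sketch using EXCEPT; the transcript describes a “matching rows” check, which could equally be written with NOT EXISTS, and it assumes both result sets were already captured with SELECT … INTO:

```sql
/* Any rows in the original results with no exact match in the rewrite? */
SELECT * FROM #OriginalQuery
EXCEPT
SELECT * FROM #DistinctApplyRewrite;

/* And the reverse: any rows in the rewrite with no match in the original? */
SELECT * FROM #DistinctApplyRewrite
EXCEPT
SELECT * FROM #OriginalQuery;
```

Zero rows back from both directions means the two result sets contain the same rows.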
Show me if there are any rows in original query that don’t have matching rows in distinct apply rewrite. No rows qualify. And now, show me if there are any rows in distinct apply rewrite that don’t have matching rows in original query. No rows returned. So it is looking like, complicated though it is, there is more than one way to rewrite this guy. And the bigger message here is: hints give you a sort of blunt tool to change the way optimization happens.
If you take the time to do in-depth query rewrites, you can fine-tune and control optimization in different areas with much more precision. In both cases, you are really elbowing the optimizer and saying, hey, do it this way, right? I’m influencing it both ways. There are absolutely benefits and risks to each approach, and I always do have to test carefully, but there is more than one way to boss the optimizer around.