Mmmmm, dynamic sql
Variations on the theme…
You don’t have to use Dynamic SQL to “branch” your code – you could do something very similar by creating more stored procedures, and calling them from the main procedure. The benefits and downsides are quite similar: you have a higher level of complexity to manage.
Transcript
Let’s explore some solutions with Dynamic SQL.
Some folks have an allergy to Dynamic SQL, some people love it. I am definitely in the ‘love it’ camp, but I do completely admit that the quality of your Dynamic SQL will directly influence how much you like Dynamic SQL.
It’s a very flexible tool, and like a lot of really flexible tools, that means it can be great or it can be a real bear, depending on how well it’s documented, how clearly it’s written, and whether any effort has been put into making it easier to debug if it gets really complex and you run into problems.
It can be absolutely a problem to deal with, but if you are responsible with it, it’s not always that bad.
The first Dynamic SQL procedure we’re going to look at starts out pretty simple.
I’m going to go ahead and create it first.
What we’re doing is: we are declaring a couple of local variables. Then we are setting our @FirstNameId by querying ref.FirstName, and then generating a Dynamic SQL query which plugs in that @FirstNameId value. We’re executing that with sp_executesql as parameterized Dynamic SQL. Because I parameterized this and used the @FirstNameId parameter instead of plugging it in as a literal value, my Dynamic SQL will be able to reuse execution plans.
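Here is roughly what that pattern looks like. This is a minimal sketch, not the course code: the procedure name, the agg.FirstNameByYear table, and the exact query shape are assumptions for illustration; ref.FirstName, @FirstNameId, the 1966-to-2015 range, and the parameterized sp_executesql call are the pieces described in the demo.

```sql
CREATE OR ALTER PROCEDURE dbo.NameCountByGender_DSQL
    @FirstName varchar(255),
    @Gender char(1)
AS
BEGIN
    DECLARE @FirstNameId int,
            @dsql nvarchar(max);

    /* Look up the surrogate key for the name */
    SELECT @FirstNameId = FirstNameId
    FROM ref.FirstName
    WHERE FirstName = @FirstName;

    SET @dsql = N'
        SELECT Gender, SUM(NameCount) AS SumNameCount
        FROM agg.FirstNameByYear
        WHERE FirstNameId = @FirstNameId
            AND Gender = @Gender
            AND ReportYear BETWEEN 1966 AND 2015
        GROUP BY Gender;';

    /* Parameterized: @FirstNameId is passed as a typed parameter,
       not concatenated in as a literal, so the plan can be reused */
    EXEC sp_executesql @dsql,
        N'@FirstNameId int, @Gender char(1)',
        @FirstNameId = @FirstNameId,
        @Gender = @Gender;
END;
GO
```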
If I didn’t parameterize this and I just concatenated in the value for @FirstNameId, then every time I ran this I would get a different plan for each FirstNameId.
Then if I ask, how has this performed over the last 10 runs? Well, I’d have 10 different execution plans in my cache, none of them reusing one another. That’s really hard to monitor.
If I get into a habit of doing this in all of my code, I have no reused plans, they’re having to be purged out of memory, and I’ve got this big mess.
So that isn’t a good practice.
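If you want to see that mess for yourself, one way is to peek at the plan cache. A sketch, assuming you adjust the LIKE filter to match your own query text:

```sql
/* Count how many single-use copies of "the same" query are cached.
   The LIKE filter text is a placeholder -- match it to your query. */
SELECT cp.usecounts, cp.size_in_bytes, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%FirstNameByYear%'
ORDER BY cp.usecounts;
```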
Instead, I have used parameterized Dynamic SQL.
There are some exceptions to this: in specific situations you don’t want to parameterize your Dynamic SQL, so I’m not saying parameterizing is 100% always the best thing. It’s just often best to parameterize your Dynamic SQL.
I’m going to clean out my buffer pool for my whole instance, my private test instance, and let’s go ahead and run this for Matthew.
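The buffer pool cleanout is presumably the usual trick, which looks like this. Only run it on a private test instance:

```sql
/* Flush dirty pages to disk, then drop all clean pages from the
   data cache, so the next reads come from disk. Test instances only! */
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
```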
It figures out Matthew’s FirstNameId, plugs that into the Dynamic SQL, and we do get a plan that is optimized for Matthew.
By passing that FirstNameId into the Dynamic SQL and saying here’s what the FirstNameId is, we were able to create a parameterized, reusable plan that was able to use the histogram.
What if it doesn’t run first for Matthew?
We are allowing execution plan reuse. With sp_recompile, I’m going to say: okay, the next time you run this procedure… actually, since I put in the table name here, the next time any query accesses this table, any query or procedure or whatever, it needs to get a fresh plan.
By the way, {sp_recompile} does take out a high-level lock against that table. I’m in a totally private test environment so that doesn’t matter, but if you ever were to run this command against a table in a busy production environment you could potentially cause a big blocking chain. So be aware of that. I am going to then do my buffer cache trick. And then I’m going to run this for Fay.
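For reference, that command is just one line; the table name here is a placeholder for whichever table the procedure queries:

```sql
/* Marks all plans referencing the table for recompile on next use.
   Takes a high-level lock on the table -- beware on busy systems. */
EXEC sp_recompile N'agg.FirstNameByYear';
```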
I’m letting it cache the execution plan for Fay.
Now Fay is super fast when she gets her own plan, because it’s optimized for her FirstNameId.
She gets a nice little nested loop plan, and her actual number of rows is 508. Let’s look at the key lookup here and just verify: yes, her estimate was around 3,000, which is a good ballpark for her 508 rows. Then I clear out my buffer pool, my data cache, but not my execution plan cache.
My execution plans are still there, now when I run this for Matthew he’s going to reuse Fay’s plan, because I parameterized the Dynamic SQL and allowed plan reuse.
Depending on who runs first, I’m either going to get a “little” plan in cache that has nested loops, or I’m going to get a “big” plan in cache that has a clustered index scan.
And when the little plan is in cache and this runs for Matthew, sure enough I get the 11-second run with uneven performance.
There are good things about this version of the Dynamic SQL solution.
But there are bad things about it too, because now I’ve got this parameter-sniffing situation where, depending on whose plan gets into cache first and who it’s running for, I’m going to have different performance.
We can improve on this, and do better, to get more consistent performance with our Dynamic SQL.
We could go back to recompile and combine it into our Dynamic SQL.
In this procedure, we’re using our same Dynamic SQL pattern as last time, but I have added into my Dynamic SQL a recompile hint, and said: optimize this query for the FirstNameId that you’re passing in. This recompile hint means: don’t reuse anyone else’s plan, and don’t let anyone else reuse my plan. Let’s clear out our execution plan cache and let’s run this for Fay. Fay is super fast. She gets her really nice little nested loop plan, that’s terrific. Clear out our cache, and run this for Matthew.
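The only change from the first sketch is inside the Dynamic SQL string itself, something like this (same assumed table and columns as before). The "clear out our execution plan cache" step is presumably DBCC FREEPROCCACHE, which again belongs only on a private test instance.

```sql
/* Same parameterized pattern, but with OPTION (RECOMPILE) on the
   statement: compile fresh for the sniffed @FirstNameId every run,
   and never share the plan with anyone else */
SET @dsql = N'
    SELECT Gender, SUM(NameCount) AS SumNameCount
    FROM agg.FirstNameByYear
    WHERE FirstNameId = @FirstNameId
        AND Gender = @Gender
        AND ReportYear BETWEEN 1966 AND 2015
    GROUP BY Gender
    OPTION (RECOMPILE);';
```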
And Matthew, hey it’s compiling fresh for his FirstNameId. He gets his four-second run, which is fast for Matthew, and he got his clustered index scan with that estimate of 1.45 million rows.
This is why recompile hints can sneak into so much production code. Because everybody gets their own plan.
With certain ways that we write our SQL, sometimes that can give us great performance. But once we start letting this into more and more of our code, it becomes very difficult to monitor, because the plans aren’t staying in our cache for long. If we have Query Store enabled on SQL Server 2016 and higher, that gets easier, but we’re still burning CPU for all those compiles.
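Turning Query Store on, if you’re on 2016 or higher, is a one-liner; the database name here is a placeholder:

```sql
/* Query Store keeps runtime stats and plans in the database itself,
   so recompiled plans still leave a trail you can monitor */
ALTER DATABASE BabbyNames SET QUERY_STORE = ON;
```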
Eventually this habit catches up with us, so I avoid this if I can have a better way.
And there is a better way!
We have another option for Dynamic SQL, which is to branch our Dynamic SQL.
We still use parameterized Dynamic SQL, but branch it off and coerce SQL Server into giving us a “big” plan and a “small” plan.
I’m going to go ahead and compile this stored procedure. This one is NameCountByGender_DSQL_OPTIMIZEFOR. And in this, I’ve added a variable at the top.
I’ve added a local variable named @TotalNameCount. My first query now doesn’t just figure out what the FirstNameId is, it also figures out what the TotalNameCount is.
In our case, TotalNameCount is just a ballpark estimate. In ref.FirstName, TotalNameCount represents the total names across all the time we have history for, which is from the 1880s to 2015. Our query is looking at a subset of that data, from 1966 to 2015. But this ballpark is good enough.
In some cases you might not have a column like this to help you. In some cases you might just have a couple of things that are outliers. And you might just say, for these specific values, I want to do this, for other names I want to do a second query.
There’s different ways that you might branch this, we just happen to have a TotalNameCount column. What I’m saying here is, when the TotalNameCount is over one million, I want to set the Dynamic SQL to this query, which has got an OPTIMIZE FOR hint in there. This OPTIMIZE FOR query hint tells SQL Server, I want you to optimize this query plan for a specific FirstNameId.
In this case, I picked Matthew as the representative of the popular names. If there aren’t more than a million of these, I want you to use this different query syntax, which has an OPTIMIZE FOR hint in it that says: optimize this for Fay’s FirstNameId. She’s the representative of the more unique names.
We then are still using the parameterized Dynamic SQL, so we can have execution plan reuse. This means I’ll have two different execution plans at most, they’ll be able to be reused, so I won’t burn a ton of CPU compiling them, I won’t pollute my execution plan cache.
But I can still have a plan for “I’ve got a lot of names,” and a plan for “I don’t have a lot of names.”
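Put together, the branched pattern looks something like this sketch. The procedure name NameCountByGender_DSQL_OPTIMIZEFOR, the one-million-name threshold, and the OPTIMIZE FOR hints come from the demo; the table, the columns, and the literal FirstNameIds standing in for Matthew and Fay are placeholders I’ve made up.

```sql
CREATE OR ALTER PROCEDURE dbo.NameCountByGender_DSQL_OPTIMIZEFOR
    @FirstName varchar(255),
    @Gender char(1)
AS
BEGIN
    DECLARE @FirstNameId int,
            @TotalNameCount bigint,
            @dsql nvarchar(max);

    /* One lookup returns both the key and the ballpark total */
    SELECT @FirstNameId = FirstNameId,
           @TotalNameCount = TotalNameCount
    FROM ref.FirstName
    WHERE FirstName = @FirstName;

    IF @TotalNameCount > 1000000
        /* Popular names: optimize for the "Matthew" representative.
           28073 is a made-up placeholder FirstNameId. */
        SET @dsql = N'
            SELECT Gender, SUM(NameCount) AS SumNameCount
            FROM agg.FirstNameByYear
            WHERE FirstNameId = @FirstNameId
                AND Gender = @Gender
                AND ReportYear BETWEEN 1966 AND 2015
            GROUP BY Gender
            OPTION (OPTIMIZE FOR (@FirstNameId = 28073));';
    ELSE
        /* Rarer names: optimize for the "Fay" representative.
           10456 is a made-up placeholder FirstNameId. */
        SET @dsql = N'
            SELECT Gender, SUM(NameCount) AS SumNameCount
            FROM agg.FirstNameByYear
            WHERE FirstNameId = @FirstNameId
                AND Gender = @Gender
                AND ReportYear BETWEEN 1966 AND 2015
            GROUP BY Gender
            OPTION (OPTIMIZE FOR (@FirstNameId = 10456));';

    /* Still parameterized: at most two plans, each reusable */
    EXEC sp_executesql @dsql,
        N'@FirstNameId int, @Gender char(1)',
        @FirstNameId = @FirstNameId,
        @Gender = @Gender;
END;
GO
```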
They are not tied in to any specific index. I haven’t forced scanning, I haven’t forced a specific index, so if indexes are improved on our table they might both adapt and start essentially doing the same thing with a better index and just be two execution plans, but that’s not too terrible, right?
Let’s make sure I created this; I talked so much I couldn’t remember. We’ll clear out our data cache. And we’ll go ahead and run this first for Fay.
Now Fay comes back really fast. There’s the first query that is figuring out the TotalNameCount. I don’t have the perfect index for that, so it got a little more complicated. Then we have a nice, nested loop query, that’s the small plan for Fay, which works great for her. If I click on this little dot dot dot next to my DSQL, I can even see that this is the Dynamic SQL that was optimized for Fay for the less frequent names.
I’m going to clear out my cache again and I’m going to run this for Matthew. Now Matthew should hit that branch where there’s more than a million names for Matthew. I didn’t force a recompile on this, but it hit that branched code, and sure enough the second query for Matthew is doing– let’s see if we can get a little more real estate on the screen– it’s doing that clustered index scan. My plans take up too much room {on the screen}.
It’s doing that clustered index scan and we have the estimated number of rows, the familiar 1.45 million. Clicking on the three dots for the query I can see that it used the branch of the Dynamic SQL that optimizes for the more popular names.
This can be really flexible.
The main downside of this is…
Dynamic SQL can make troubleshooting and maintaining code trickier. We’ve added complexity.
Also, in some cases, you may not have a reliable branching criterion like we have.
It may be that you write a database and you’ve got different clients using different versions of this database, and they have really different data distribution and different values in these tables. In that case, doing this Dynamic SQL fix may not be meaningful at all. Because we get into trouble if we optimize for a FirstNameId that doesn’t exist.
There’s certain scenarios where this might be a great fix, and then other scenarios where this particular one is actually not that good
In those other scenarios, maybe a FORCESCAN hint is the best choice, or maybe actually changing the index is the best choice, right?
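For completeness, a FORCESCAN table hint on our assumed query shape would look like this:

```sql
/* Force a scan so every execution gets the "big" plan shape,
   regardless of which FirstNameId is passed in */
SELECT Gender, SUM(NameCount) AS SumNameCount
FROM agg.FirstNameByYear WITH (FORCESCAN)
WHERE FirstNameId = @FirstNameId
    AND Gender = @Gender
GROUP BY Gender;
```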
We’ve got different constraints on different environments.
I do like this Dynamic SQL a lot, but if I’m going to go with Dynamic SQL I may need to add some complexity to it to make sure that I’m not compiling things too often, I’m getting the right amount of plan reuse for my scenario, and also consistent performance for all of the different ways that the query might run. So this can be a great solution.
For this particular problem, I don’t think there is a one-size-fits-all solution, which is why I like this scenario so much!