If it’s in tempdb, it’s probably weird
This gets pretty complicated when you dig into it
I only give a brief overview of the weirdness of statistics, temp tables, and object caching in this video.
If you want to dig into more of the details: start with Paul White’s article here: https://sqlperformance.com/2017/05/sql-performance/sql-server-temporary-object-caching,
If you’re OK with the summary for now, that’s fine too! The main thing to remember is that temp tables don’t have pass to free performance and that they have a high level of complexity – so they’re not a magic replacement for local variables.
Transcript
Here’s a fun one. This one I think is really creative. It’s so creative that I’ve named it NameCountByGender_Creative.
Just because it’s creative doesn’t mean it’s rare
I have found this in the real world and I have had people email me about this. It creeps up at the funniest of times.
I mentioned that using a local variable has our anonymity problem unless we use that recompile hint, and there’s all sorts of reasons we might avoid recompile.
The creative mind might say: what if I don’t use a local variable? What if I use a temp table?
In this case, we aren’t declaring a local variable, we are using a little query to grab that FirstNameId and we’re throwing it into a temp table named #FirstName. Then we are joining to our temp table.
We are specifically using a temp table because temp tables do support column statistics, and statistics like that statistic on FirstNameId can be meaningful to our query. I’m going to create our creative procedure and make sure to nuke my clean buffers on my whole instance,
We’re testing from cold cache.
Now I’m going to go ahead and run this for Matthew. We’re running our creative procedure for Matthew and it doesn’t take super long. We finished in six seconds there.
Looking at our execution plan, as expected, we have a first query populating our temp table. then we have a second query down here that is seemingly doing the right thing. Let’s look at it, actually. First, let’s look at how long it took. The elapsed time on there is close to seven seconds. It did use parallelism, and it did do a clustered index scan against the table. It didn’t do the nested loop lookup for Matthew.
How many rows did it think it was going to get? Well, the estimated number of rows, it was able to use the statistics and the estimated number of rows is 1.45 million.
So, why am I saying this is creative and weird?
Well, it’s weirder than it might seem. What many of us assume about temp tables is that, as this procedure is run across different sessions, I ran this now, someone else runs it for a different first name in five minutes…
We assume that their temp table is private to them. But temp tables are special in SQL Server. SQL Server doesn’t want to keep reallocating and reallocating temp tables, so it has ways that it can not reuse the data in the temp table, but not have to allocate a whole new temp table.
It can use cached temp tables.
I’m not clearing out even my execution plans. I’m just clearing out my buffer pool, my data cache, and my database, so I’m running from cold cache again.
Now, I’m going to run the same creative stored procedure for Fay. Fay’s stored procedure, we might expect for her to get the nested loop plan, but she doesn’t finish in zero seconds. When she gets the nested loop plan, she finishes really fast. Instead, yes, it does run the first query again to find her FirstNameId, but in this second query here, we still have a clustered index scan.
If it had looked at statistics on the temp table for Fay’s FirstNameId, it would have figured out, hey, she doesn’t have a lot of names. We’ve seen that in action before. The number of rows expected for the FirstNameId in this case is the one after the hash match and this is where we’ve got, hey, it thinks we’re still dealing with the 1.45 millions rows for Matthew.
This is because of these oddities of temp tables
If SQL Server is doing something like this and it’s able to reuse the cached temp table, unless you kind of go crazy and do a lot of hints, it actually can reuse the allocated temp table object and there are statistics on that object that it reuses.
It is looking at the statistics for the plan that was created when I ran this for Matthew and I’ll prove it by running it the opposite way.
I’m going to clear out my buffer pool again. So, our data cache, our clean buffers have been dropped, and then, I’m going to run sp_recompile against NameCountByGender_Creative.
I am saying the next time you run this stored procedure, you have to generate a new plan. I’m going to run it for Fay and we’re going to leave her plan in cache. I said we have to compile on the next run a fresh plan. Here we go.
Fay runs and, wow, she’s fast. Zero seconds. Good job, Fay. Looking at our execution plan, here is our first query populating the temp table for Fay and here, down here, is the plan that ran using the temp table. It decided to do a nested loop. The estimated of rows is not 1.45 million. It estimated almost 3,000, which is, for Fay, it’s a ballpark estimate. It’s close enough to 500. Within range, that’s a good estimate there.
So, once we forced a recompile and I said, oh, you have to come up wit a new plan, in this case, that did generate a new plan.
Now, I’m going to clear the buffer pool again, but I didn’t force a recompile on the plan. I’m going to run this for Matthew. Matthew is reusing the execution plan created for Fay and repopulating the temp table, it’s going to reuse those statistics on the temp table and it, once again, this time, reused the plan, the plan cached for Fay was the nested loop plan with the 3,000 estimate in there.
It’s still got plan reuse issues
I avoid this pattern when possible because it’s really weird to troubleshoot.
What if this was running on two sessions concurrently? Well, then it wouldn’t be reusing the allocated object with the statistics. It’s just too unpredictable.
To be clear, I’m not saying that I don’t use temp tables. I do use temp tables when I need them, and I can make them consistent and they help performance. What I’m saying I don’t do is:
I don’t use a temp table as a fake variable
That’s what this pattern is that I’m showing you here: “oh, well, variables are anonymous, so I will just turn a temp table into a local variable.”
No, that has its own baggage, as well! And that is the pattern that I would avoid.
I think we have better ways to solve this. This creative way just seems like a good way at first.