Kendra Little | March 24, 2024
Nested loop join operators tend to look quite innocent in an estimated query execution plan. But life ain’t always so simple.
Microsoft documentation explains:
A nested loops join is particularly effective if the outer input is small and the inner input is preindexed and large. In many small transactions, such as those affecting only a small set of rows, index nested loops joins are superior to both merge joins and hash joins. In large queries, however, nested loops joins are often not the optimal choice.
Unfortunately, we can run into problems easily.
Problem 1: Execution plan reuse gone bad
One problem is due to execution plan reuse in the situation known as “bad parameter sniffing”. If I cache an execution plan that uses a nested loop to work on a small set of rows, then I reuse that plan to work on a large set of rows where a “bigger” join operator would be more appropriate, things can get pretty slow.
Problem 2: Non-optimized nested loops suck
A related problem that I keep running into lately is non-optimized Nested Loops. This is basically a special case of bad parameter sniffing, but it can be particularly hard to identify.
In this case, you may have a fast plan with a nested loop operator, and a “slow” plan that also has a nested loop operator. The two plans can look pretty darn identical. You can only see the difference when you look at the properties of the nested loop operator to check if it contains “Optimized = true”.
So, what’s an optimized nested loop?
Craig Freedman of the SQL Server optimizer team explains that an optimized nested loop uses best-effort sorting, and that:
…an optimized nested loops join may be thought of as a “safety net” for those cases where SQL Server chooses a nested loops join but would have done better to have chosen an alternative plan such as a full scan or a nested loops join with an explicit sort.
Aside: I once got to meet Craig Freedman in person and I was overcome like a child meeting an astronaut. I squealed loudly and was kind of speechless and it was super awkward. He was very gracious. It’s hard to meet your heroes.
Anyways, I can attest that, just like the metrics in that blog post, non-optimized nested loop plans can be hella slow.
Unfortunately, while there is a query hint, DISABLE_OPTIMIZED_NESTED_LOOP
, there is no query hint to dictate that nested loop operators SHOULD be optimized. I am not clear that there would be a big downside to a hint like this — who doesn’t want a safety net? I am going to put in a feature request to have YES_PLEASE_OPTIMIZE_MY_NESTED_LOOP
as a query hint and ideally a database option to prefer optimized nested loops.
Workarounds for problems with nested loops
I’ll write more about this all in a blog post eventually, but when I’ve got a performance problem with nested loops, I generally ask the following questions first:
- Can I just get rid of the nested loop? If the issue is a nested loop supporting a key lookup is the problems, sometimes modifying an index so it covers the query will eliminate the nested loop problem entirely.
- Do I need to allow plan reuse? Can’t have bad query plan reuse if I’m compiling fresh every time. See Erik' Darling’s post, Is RECOMPILE Really That Bad For SQL Server Query Performance?.