Query Hash Values are Meaningless in SQL Server: They May be Reset to be the Same Value as the Query Plan Hash

By Kendra Little on November 24, 2024

This is the worst bug I’ve found in SQL Server to date. Previously, my top find was SQL Server Online Index Rebuild sometimes happens offline without warning. This one has taken top slot because it makes my life more difficult on a daily basis.

Background: SQL Server generates a query_hash for each query. This is stored in sys.query_store_query and it’s one of the primary ways you can identify what a query is across different Query Stores, or even the same Query Store over time, as surrogate query_id values get reset if Query Store is cleared or data ages on. The query_hash is a “Zobrist hash over the shape of the individual query, based on the bound (input) logical query tree. Query hints aren’t included as part of the hash.” (Source)

Except that’s wrong. These days it’s NOT always a hash of the query. If you force query plans, or if you use something like Automatic Plan Correction which forces plans, frequently the query_hash value will be bizarrely overwritten by a query_plan_hash value. So sometimes the query_hash ends up being the “MD5 hash of the individual plan.” (Source) This occurs first in the query_plan XML, then it weirdly invades the Dynamic Management Views in Query Store for all query_hash values for that query_id.

This presents a big challenge identifying queries in Query Store, or writing reliable diagnotic tooling and automation. I’ve reported this, but Microsoft hasn’t acknowledged that it’s a bug yet. At this point they are still asking me if there’s a pattern in how my TSQL is written that causes the query_hash value to be the same as the query_plan_hash.

Dear reader, if my TSQL was that powerful I would be doing MUCH wilder things than stunts with query_hash values.

This post shows a repro of the problem and just how unreliable the query_hash has become in SQL Server 2022 and Azure SQL Managed Instance. It may happen in additional versions, I haven’t tested more broadly than this.

Set up our environment to reproduce this

I’m using a copy of the StackOverflow2013 database to reproduce this. I’m currently running Developer Edition with @@version Microsoft SQL Server 2022 (RTM-CU15-GDR) (KB5046862) - 16.0.4155.4 (X64).

To get started, I reset basic configuration, make sure Query Store is enabled, clear Query Store, set database compat level to 160, and make sure Automatic Plan Correction is off.

Warning: Automatic Plan Correction automatically forces query plans and it causes this bug to spread rapidly. Especially avoid on relying on query_hash to identify queries if you have that feature enabled. However, the bug occurs with plan forcing even if you’re manually forcing plans, so I’m reproducing that in this post.

USE [master]
GO
exec sp_configure 'show advanced options', 1;
GO
RECONFIGURE
exec sp_configure 'max degree of parallelism', 8;
GO
exec sp_configure 'cost threshold for parallelism', 50;
GO
RECONFIGURE
GO
ALTER DATABASE [StackOverflow2013] SET QUERY_STORE = ON
GO
ALTER DATABASE [StackOverflow2013] SET QUERY_STORE (
    OPERATION_MODE = READ_WRITE, 
    MAX_STORAGE_SIZE_MB = 4000,
    CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 10))
GO
ALTER DATABASE [StackOverflow2013] SET COMPATIBILITY_LEVEL = 160
GO

USE StackOverflow2013;
GO
ALTER DATABASE CURRENT SET QUERY_STORE CLEAR;
GO
ALTER DATABASE CURRENT
SET AUTOMATIC_TUNING ( FORCE_LAST_GOOD_PLAN = OFF );
GO

Create indexes and procedure for a parameter sniffing situation

Next, I create two indexes and a stored procedure. This is not meant to be beautiful code, it’s code I wrote for a demo on problems that can arise because Automatic Plan Correction doesn’t exclude temp tables, but it works to reproduce this issue as well.

EXEC dbo.DropIndexes;
GO

CREATE NONCLUSTERED INDEX ix_Posts_OwnerUserId_INCLUDES
on dbo.Posts (OwnerUserId) INCLUDE (Id);
GO
CREATE NONCLUSTERED INDEX ix_Users_DisplayName_INCLUDES
on dbo.Users (DisplayName) INCLUDE (Id);
GO

CREATE OR ALTER PROCEDURE dbo.PostsForDisplayname
    @DisplayName NVARCHAR(40)
AS
BEGIN
    SET XACT_ABORT, NOCOUNT ON;
    SET @DisplayName=@DisplayName + '%'

    SELECT u.Id as UserId
    INTO #users
    FROM dbo.Users as u
    WHERE u.DisplayName like @DisplayName;

    /* This query selects from the #users temp table
    That means it can use statistics on the temp table to recompile if the table
    has a lot more data in it than in the cached plan */
    SELECT top 10 p.*
    INTO #bin
    FROM dbo.Posts as p 
    JOIN #users as u on p.OwnerUserId = u.UserId
    ORDER BY Score DESC;

END
GO

Run the procedure and get two plans in cache for the second query

We run the following SQL:

exec dbo.PostsForDisplayname 'Clothilde';
GO 
exec dbo.PostsForDisplayname 'Jon';
GO

There are no posts by users named ‘Clothilde%’, but there are a LOT of posts by users named ‘Jon%’, so a recompile naturally occurs due to statistics on the temporary table.

Run the following query to review the hashes from both the Query Store DMVs and extract the same values from the query plan XML:

SELECT 
    q.query_id,  
    p.plan_id,
    qt.query_sql_text, 
    CAST(p.query_plan AS XML) AS QueryPlan,
    q.query_hash as QueryHashFromSysQueryStoreQuery, 
    p.query_plan_hash as QueryPlanHashFromSysQueryStorePlan, 
    CAST(p.query_plan AS XML).value('declare namespace sp="http://schemas.microsoft.com/sqlserver/2004/07/showplan"; (/sp:ShowPlanXML/sp:BatchSequence/sp:Batch/sp:Statements/sp:StmtSimple/@QueryHash)[1]', 'varchar(64)') AS QueryHashExtractedFromPlanXML,
    CAST(p.query_plan AS XML).value('declare namespace sp="http://schemas.microsoft.com/sqlserver/2004/07/showplan"; (/sp:ShowPlanXML/sp:BatchSequence/sp:Batch/sp:Statements/sp:StmtSimple/@QueryPlanHash)[1]', 'varchar(64)') AS QueryPlanHashExtractedFromPlanXML,
    CAST(p.query_plan AS XML).value('declare namespace sp="http://schemas.microsoft.com/sqlserver/2004/07/showplan"; (/sp:ShowPlanXML/sp:BatchSequence/sp:Batch/sp:Statements/sp:StmtSimple/sp:QueryPlan/@UsePlan)[1]', 'int') AS IsUsePlanExtractedFromPlanXML
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS qt 
    ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan AS p 
    ON q.query_id = p.query_id
WHERE q.object_id = OBJECT_ID('dbo.PostsForDisplayname', 'P');
GO

We care about the second query in the procedure here, which has been assigned query_id 7. Query_id 7 has two different query plans, which have plan_ids 9 and 10. Here are the hash values we currently have (extracted from the image):

Query Id Plan Id Query Hash sys.query_store_query Query Plan Hash sys.query_store_plan Query Hash in Plan XML Query Plan Hash in Plan XML
7 9 0xD55EEC4C7A81E644 0x44A186BFC52AA053 0xD55EEC4C7A81E644 0x44A186BFC52AA053
7 10 0xD55EEC4C7A81E644 0x56049465CF9FFB95 0xD55EEC4C7A81E644 0x56049465CF9FFB95

This all looks normal and good:

  • Query_id 7 has a query_hash that is the same in sys.query_store_query as it is in the query plans.
  • The query_plan_hash is a different value because it’s a hash of the query execution plan, not the query.
  • The query_hash values in sys.query_store_plan match the values in the query plan XML.

Force a query plan, then rerun the stored procedure for a new value

Next, we’re going to force the “small” query plan for this– the one we ran for Clothilde which had very few rows in the temp table.

After forcing that plan, we run the procedure for the value J, which is going to use that forced plan but will find a whole lotta rows. Run that twice:

exec sys.sp_query_store_force_plan @query_id=7, @plan_id=9;
GO

exec dbo.PostsForDisplayname 'J';
GO 2

Something’s gone weird in the query_hash value in our execution plan

Rerun the query above retrieving the query_hash and query_plan_hash values from the DMVs and the query plans.

We have a new query plan, and something is starting to look weird:

Here are the values for query_id 7 extracted from the image:

Query Id Plan Id Query Hash sys.query_store_query Query Plan Hash sys.query_store_plan Query Hash in Plan XML Query Plan Hash in Plan XML Is Use Plan Hint in Plan?
7 9 0xD55EEC4C7A81E644 0x44A186BFC52AA053 0xD55EEC4C7A81E644 0x44A186BFC52AA053
7 10 0xD55EEC4C7A81E644 0x56049465CF9FFB95 0xD55EEC4C7A81E644 0x56049465CF9FFB95
7 11 0xD55EEC4C7A81E644 0x61807D991A7B6914 0x61807D991A7B6914 0x61807D991A7B6914 Yes

SQL Server saw that we had forced a Query Plan. It also recognized that the row estimates in the forced query plan were way lower than the estimates for the currently executing query, so it created a morally equivalent plan which has the same shape as the forced plan, but larger row estimates.

That’s cool, but this weird bug showed up, too:

  • The “morally equivalent plan” XML has the wrong query_hash value in it. It’s stored the query_plan_hash as the query_hash, too!
  • The query_plan_hash in the execution plan for plan_id = 11 does not match the query_hash in sys.query_store_query.

Here’s what this looks like when viewed in the query plan properties in SSMS:

It gets worse: further executions make the query_hash incorrect for every row with this query_id in sys.query_store_query

Now run the stored procedure for another value:

exec dbo.PostsForDisplayname 'Kendra'
GO 

Rerunning the same query to check query_hash values in the dynamic management views shows that we have a new “morally equivalent plan” added.

And the Query Store DMVs have now become infected with that messed up query_hash value.

I’ve extracted the data for query_id 7 to text, which is:

Query Id Plan Id Query Hash sys.query_store_query Query Plan Hash sys.query_store_plan Query Hash in Plan XML Query Plan Hash in Plan XML Is Use Plan Hint in Plan?
7 9 0x61807D991A7B6914 0x44A186BFC52AA053 0xD55EEC4C7A81E644 0x44A186BFC52AA053
7 10 0x61807D991A7B6914 0x56049465CF9FFB95 0xD55EEC4C7A81E644 0x56049465CF9FFB95
7 11 0x61807D991A7B6914 0x61807D991A7B6914 0x61807D991A7B6914 0x61807D991A7B6914 Yes
7 12 0x61807D991A7B6914 0x868D1CD7976DCFF8 0x868D1CD7976DCFF8 0x868D1CD7976DCFF8 Yes

Now we are in a state where:

  • One “morally equivalent” query plan has erroneously set its query_hash value to the query_plan_hash value in the plan XML
  • Another “morally eqivalent” plan being generated has caused the query_hash value in sys.query_store_query to be bizarrely updated to the value of the query_plan_hash for one of the plans.
  • We have three query plans where the query_hash value in the plan XML does not match the query_hash value in sys.query_store_query
  • If we have any tooling that has recorded the previous query_hash value of 0xD55EEC4C7A81E644 for this query, it (or you) will no longer find that in sys.query_store_query.

Update: I’m not the first to find this bug, and I probably won’t be the last

I was inspired to write this post after Haripriya Naidu asked at PASS Data Community Summit if anyone else had seen this behavior, and I had – I’d just been so burnt out on filing bugs with Microsoft that I hadn’t found the energy to document and report it. (It’s a lousy customer experience.) But the fact that Haripriya asked made me feel like it was worth it.

Paul White has written a great post digging into this, SQL Server Forced Plans Overwite the Query Hash. Paul shows a much simpler way to reproduce this than I have here, simply with a use plan hint. (I do think the repro in this post still adds some value by showing how the issue spreads in Query Store.)

Paul writes:

This bug affects SQL Server 2008 to 2022 CU16 inclusive.

He clarifies in a note:

Only guided plan search is affected. Supplying individual query hints (except USE PLAN of course) in a plan guide (or via an equivalent feature) does not provoke this misbehaviour.

Paul also references that this problem appeared in posts by SQL Sasquatch, aka @sql_handle back in 2019.

Tomáš Zíka also posted about this.

I have chatted with a Product Managers at Microsoft in a LinkedIn DM about this who works with the teams in this area. They were receptive to hearing about this issue– and they even started looking into it during their weekend. I do expect that working through issues like this can take some time for a product like SQL Server, and I’m happy that the first steps of finding the right people to communicate the issue to and raising awareness of the bug within the user community seem to be going well.

What this means if you force query plans

I’m hoping Microsoft recognizes this a bug and fixes the issue, because this makes using Query Store much harder for administrators. I think, most critically, it makes learning to use Query Store harder.

First of all, it makes no sense to me that the hashes of two different things (query text and a query plan) would regularly be the same value. Holy hash collision, Batman. It just is confusing.

But this also means:

  • You can’t reliably use the query_hash value from an execution plan that you find with a tool like sp_WhoIsActive or from a monitoring tool to find the query in sys.query_store_query.
    • Using the query_plan_hash and looking it up in sys.query_store_plan should be more reliable, but you have to know to do that.
  • If you have a single tenant architecture (multiple copies of the same database schema in production), you can’t reliably find the same query across different query stores using the query_hash value.
    • If query plans have been forced in some places, the query_hash could be set to who knows what, and you’ll incorrectly think the query isn’t running there.
    • You can’t use query_plan_hash for this, either, because possibly the query has totally different plans.
    • last_compile_batch_sql_handle in sys.query_store_query is potentially the best bet at this point. But that’s for the entire batch. Maybe statement_sql_handle in sys.query_store_query_text is the best bet, because that value is also in query plan xml.
  • The same thing also goes for queries that you’ve saved off in the past and want to examine now, but Query Store data has changed.