Opinions expressed on this site are solely those of Kendra Little of Catalyze SQL, LLC. Content policy: Short excerpts of blog posts (3 sentences) may be republished, but longer excerpts and artwork cannot be shared without explicit permission.
on February 16, 2018
I got a question last week from a very smart fellow:
How can I tell when statistics were last updated for a heap?
Before I could email him back, I soon got another email in which he answered his own question.
I’m not saying that he’s smart because he answered his own question. I’m saying he’s smart because I’ve met him before, and he’s a very insightful person and a great problem solver!
In his later email, he said he was embarrassed about the question. I guess it felt a bit basic. But it’s not a dumb question at all.
I think of this as a “framework” type question
When I learn things, I usually first accumulate a bunch of facts that I associate together loosely. When I figure out a question like this, the answer often tends to snap all those facts into a stronger framework than I had before, and I get to a deeper understanding.
That last little fact that puts your framework together often feels glaringly obvious afterward. But it certainly didn’t before!
We think of statistics as being for a table in SQL Server
Statistics are little descriptive pieces of information that help the optimizer estimate how data is distributed.
Let’s say I’m SELECTing StateCode from dbo.Address WHERE AddressId = 420101.
If dbo.Address has a unique clustered index on AddressId, it will have a statistic associated with the clustered index that helps it understand what AddressIds are present. It also understands that AddressIds are unique, and therefore there’s at most only one row for AddressId = 420101.
But what if dbo.Address was a heap?
A heap is a table without a clustered index.
If dbo.Address if dbo.Address was a heap, there wouldn’t be a nice unique clustered index with a related statistic to give it this information.
Instead, a few things could happen when this query ran:
- If there is a nonclustered index that leads on the AddressId column, it will have a statistic associated with it. If the nonclustered index is unique, then that uniqueness will be known as well.
- If there is an existing column statistic, information about the cardinality and data ranges can be used from that
- If no index or column statistic exists and the database allows column statistics to be automatically created (which is a default setting), then a little column statistic can be created.
But notice, no “heap” statistic will be created.
There are two types of statistics: index statistics and column statistics
Although we tend of thinking as statistics as being for a table (and I tend to have this thought framework as well), this isn’t really the case.
For indexes, statistics describe the data in the key columns of that index. The leading column of the index is particularly well described, as it gets a lot of detail captured about it in a part of the statistic called the histogram.
Column statistics describe individual columns.
What would a statistic be if it described a heap? A heap doesn’t have key columns. There is no “leading” column defined, it’s simply a table structure without an index.
So: when were statistics last updated for a heap?
For a heap, you can look at individual column statistics as well as index statistics and examine which each of them was last updated, with a query like this:
SELECT
stat.auto_created,
stat.name as stats_name,
sp.last_updated,
STUFF((SELECT ', ' + cols.name
FROM sys.stats_columns AS statcols
JOIN sys.columns AS cols ON
statcols.column_id=cols.column_id
AND statcols.object_id=cols.object_id
WHERE statcols.stats_id = stat.stats_id and
statcols.object_id=stat.object_id
ORDER BY statcols.stats_column_id
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)'), 1, 2, '') as stat_cols,
stat.filter_definition,
stat.is_temporary,
stat.no_recompute,
sp.modification_counter,
sp.rows,
sp.rows_sampled
FROM sys.stats as stat
CROSS APPLY sys.dm_db_stats_properties (stat.object_id, stat.stats_id) AS sp
JOIN sys.objects as so on
stat.object_id=so.object_id
JOIN sys.schemas as sc on
so.schema_id=sc.schema_id
WHERE
sc.name= 'dbo'
and so.name='Test'
ORDER BY 1, 2;
GO
Here’s what the first few columns of the results look like for my heap named dbo.Test:
If you’d like to build the dbo.Test heap and play around with it, get code to do so in this Gist.