on December 8, 2016
Short answer: the SQL Server optimizer will know that the table was truncated, but statistics might not update when you expect.
For the long answer, let’s walk through an example using the WideWorldImporters sample database.
I’ll be using Trace Flag 3604 and 2363 to get SQL Server to print information about how it optimized my query out to the messages tab. (Thanks to Paul White for blogging about this trace flag.)
First, a fresh restore of WideWorldImporters
USE master;
GO
IF DB_ID('WideWorldImporters') IS NOT NULL
ALTER DATABASE WideWorldImporters SET OFFLINE WITH ROLLBACK IMMEDIATE
RESTORE DATABASE WideWorldImporters FROM DISK=
'S:MSSQLBackupWideWorldImporters-Full.bak'
WITH REPLACE
GO
USE WideWorldImporters;
GO
Before we do anything, what do the statistics look like on Sales.OrderLines?
Here’s the query that I’m using to inspect the statistics:
SELECT
sp.last_updated,
stat.name as stats_name,
STUFF((SELECT ', ' + cols.name
FROM sys.stats_columns AS statcols
JOIN sys.columns AS cols ON
statcols.column_id=cols.column_id
AND statcols.object_id=cols.object_id
WHERE statcols.stats_id = stat.stats_id and
statcols.object_id=stat.object_id
ORDER BY statcols.stats_column_id
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)'), 1, 2, '') as stat_cols,
sp.modification_counter,
sp.rows,
sp.rows_sampled
FROM sys.stats as stat
CROSS APPLY sys.dm_db_stats_properties (stat.object_id, stat.stats_id) AS sp
JOIN sys.objects as so on
stat.object_id=so.object_id
JOIN sys.schemas as sc on
so.schema_id=sc.schema_id
WHERE
sc.name= 'Sales'
and so.name='OrderLines'
ORDER BY 1 DESC
GO
Statistics were last updated on June 2, 2016. We’ll be mostly looking at the statistic on Quantity throughout the example, so I’ve highlighted it:
Let’s run a query that loads the statistic on Quantity
Before we truncate the table, let’s take a peek into how SQL Server optimizes a query that cares about rows in Sales.OrderLines with Quantity > 10. I’m using trace flags 3604 and 2363 to make SQL Server print information about how it used statistics to optimize this to my messages tab.
SELECT *
FROM Sales.OrderLines
WHERE Quantity > 10
OPTION
(
QUERYTRACEON 3604,
QUERYTRACEON 2363,
RECOMPILE
)
GO
Here’s the info on the messages tab:
Begin selectivity computation
Input tree:
LogOp_Select
CStCollBaseTable(ID=1, CARD=231412 TBL: Sales.OrderLines)
ScaOp_Comp x_cmpGt
ScaOp_Identifier QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=10)
Plan for computation:
CSelCalcColumnInInterval
Column: QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity
Loaded histogram for column QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity from stats with id 7
Selectivity: 0.44231
Stats collection generated:
CStCollFilter(ID=2, CARD=102356)
CStCollBaseTable(ID=1, CARD=231412 TBL: Sales.OrderLines)
End selectivity computation
Estimating distinct count in utility function
Input stats collection:
CStCollFilter(ID=2, CARD=102356)
CStCollBaseTable(ID=1, CARD=231412 TBL: Sales.OrderLines)
Columns to distinct on:QCOL: [WideWorldImporters].[Sales].[OrderLines].OrderLineID
Plan for computation:
CDVCPlanUniqueKey
Result of computation: 102356
(102035 row(s) affected)
Highlights: one of the first thing SQL thinks about is the number of rows in the table
Right at the beginning, we see: “CStCollBaseTable(ID=1, CARD=231412 TBL: Sales.OrderLines)”
That ‘CARD’ number is the optimizer thinking about how many rows are in this table. If you glance back up at the table statistics, the most recent statistic to be updated was on the ‘LastEditedWhen’ column. When that statistic was updated, there were 231,412 rows in the table.
SQL Server decides that it wants detail on the Quantity column to figure out how to run this query, so we see that it loads that statistic up to use: “Loaded histogram for column QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity from stats with id 7”
Let’s truncate this table
I wipe out all the rows with this command:
TRUNCATE TABLE Sales.OrderLines;
GO
Now, I wouldn’t expect truncating the table to automatically update the statistics.
SQL Server updates statistics when they’re used to optimize a query – so if nobody queries this table for six months, I wouldn’t expect the stats to update for six months.
Let’s re-run our query, trace flags and all:
SELECT *
FROM Sales.OrderLines
WHERE Quantity > 10
OPTION
(
QUERYTRACEON 3604,
QUERYTRACEON 2363,
RECOMPILE
)
GO
The messages tab has less info this time- it’s much more concise!
Begin selectivity computation
Input tree:
LogOp_Select
CStCollBaseTable(ID=1, CARD=1 TBL: Sales.OrderLines)
ScaOp_Comp x_cmpGt
ScaOp_Identifier QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=10)
Plan for computation:
CSelCalcFixedFilter (0.3)
Selectivity: 0.3
Stats collection generated:
CStCollFilter(ID=2, CARD=1)
CStCollBaseTable(ID=1, CARD=1 TBL: Sales.OrderLines)
End selectivity computation
(0 row(s) affected)
SQL Server knows that we blew away all those rows
This time we see “CARD=1 TBL: Sales.OrderLines”
SQL Server doesn’t like to estimate 0 for empty tables. It likes to estimate 1. It knows this table is empty.
With this information, it chooses a different plan for computation. The plan doesn’t require looking at the quantity column this time– we don’t have any lines about that at all.
But the statistics don’t look any different
You might expect to see that the statistic on Quantity had updated. I expected it, before I ran through this demo.
But SQL Server never actually had to load up the statistic on Quantity for the query above. So it didn’t bother to update the statistic. It didn’t need to, because it knows that the table is empty, and this doesn’t show up in our column or index specific statistics.
To verify, I just rerun my metadata query above, and things look the same:
What if the table has exactly one row?
Let’s insert one and find out:
INSERT INTO [Sales].[OrderLines] (OrderLineID, OrderID, StockItemID, Description, PackageTypeID, Quantity, UnitPrice, TaxRate, PickedQuantity, PickingCompletedWhen, LastEditedBy, LastEditedWhen)
VALUES (1, 45, 164, '32 mm Double sided bubble wrap 50m', 7, 50, 112.00, 15.000, 50, '2013-01-02 11:00:00.0000000', 4, '2013-01-02 11:00:00.0000000')
GO
Now we run our familiar query, with all its merry trace flags:
SELECT *
FROM Sales.OrderLines
WHERE Quantity > 10
OPTION
(
QUERYTRACEON 3604,
QUERYTRACEON 2363,
RECOMPILE
)
GO
And here’s what SQL Server has to say about optimizing that…
Begin selectivity computation
Begin selectivity computation
Input tree:
LogOp_Select
CStCollBaseTable(ID=1, CARD=1 TBL: Sales.OrderLines)
ScaOp_Comp x_cmpGt
ScaOp_Identifier QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity
ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=10)
Plan for computation:
CSelCalcColumnInInterval
Column: QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity
Loaded histogram for column QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity from stats with id 7
Selectivity: 1
Stats collection generated:
CStCollFilter(ID=2, CARD=1)
CStCollBaseTable(ID=1, CARD=1 TBL: Sales.OrderLines)
End selectivity computation
(1 row(s) affected)
One row is enough to use our column statistic
Looking at the beginning, CARD=1 for Sales.OrderLines, just like it did after we truncated the table. But SQL Server does something different this time, indicating that it now knows that the table isn’t really empty.
It goes back to the CSelCalcColumnInInterval plan to optimize. And it loads up the column stat for the Quantity column.
Since this statistic was loaded into memory, it should have auto-updated based on my database settings. Sure enough, it did:
SQL Server knows when you’ve truncated a table
And the fact that the table has been truncated may mean that it doesn’t need to use statistics on the table when optimizing queries. After all, it’s an empty table, so it can take shortcuts!
So don’t get too confused if statistics look way out of date for a truncated table. Instead, ask yourself, “why am I querying a truncated table?” (Related disclaimer: I only tested this on SQL Server 2016.)
Want to learn more about statistics in SQL Server? Start here.