on January 24, 2011
This Post Tells You How To Corrupt a SQL Server Database with a Hex Editor in Gruesome Detail
And that’s all this post tells you. Not how to fix anything, just how to break it.
If you aren’t familiar with corruption, corruption is bad. It is no fun at all on any data, or any server, that you care about.
Where you (possibly) want to do this
You only want to do this on a test database, in a land far far away from your customers, for the purpose of practicing dealing with corruption.
When things go badly, you want to be prepared. This post gives you the tools in a simple, step by step fashion, to create different types of corruption so that you can practice resolving them.
Big Disclaimer: Do not run this in production. Or anywhere near production, or anything important. Ever. Only use this at home, in a dark room, alone, when not connected to your workplace, or anything you’ve ever cared about. If you corrupt the wrong pages in a user database, you may not be able to bring it back online. If you corrupt a system database, you may be reinstalling SQL Server.
References, and thanks to Paul Randal
Everything I’m doing here I learned from Paul Randal’s blog posts. It just took me a little bit to understand how to use the hex editor and make sure I was doing it properly, so I thought I’d put down the steps I used here in detail. If you’d like to go straight to the source:
- Common bad advice around disaster recovery ? Search for “the best way to create a test corrupt database”
- Inside the Storage Engine: Using DBCC PAGE and DBCC IND to find out if page splits ever roll back ? I use this as my reference for output from DBCC IND and DBCC PAGE
- Creating, detaching, re-attaching, and fixing a suspect database ? This shows how to use the hex editor to corrupt the log file. That’s different than what I’m doing in this post, but you should test that also. This post is where I got the recommendation to use the XVI32 Hex editor.
- You can also download Sample corrupt databases to play with from SQLSkills if you prefer not to run with scissors.
Steps to corrupt your database
First, get your hex editor
Download XVI32 by Christian Maas. No installer is necessary: download the zip file, then unzip all files to a directory and run XVI32.exe
Create a database to corrupt
For our adventure, our database is named CorruptMe. We’ll create a single table, insert some data, and create a clustered index and nonclustered index on it.
(Note: Data generation technique found on Stack Overflow, attributed to Itzik Ben-Gan.)
USE master;
IF db_id('CorruptMe') IS NOT NULL
BEGIN
ALTER DATABASE CorruptMe SET SINGLE_USER WITH ROLLBACK IMMEDIATE
DROP DATABASE CorruptMe
END
CREATE DATABASE CorruptMe;
GO
--Make sure we're using CHECKSUM as our page verify option
--I'll talk about other settings in a later post.
ALTER DATABASE CorruptMe SET PAGE_VERIFY CHECKSUM;
USE CorruptMe;
--Insert some dead birdies
CREATE TABLE dbo.DeadBirdies (
birdId INT NOT NULL ,
birdName NVARCHAR(256) NOT NULL,
rowCreatedDate DATETIME2(0) NOT NULL )
;WITH
Pass0 AS (SELECT 1 AS C UNION ALL SELECT 1),
Pass1 AS (SELECT 1 AS C FROM Pass0 AS A, Pass0 AS B),
Pass2 AS (SELECT 1 AS C FROM Pass1 AS A, Pass1 AS B),
Pass3 AS (SELECT 1 AS C FROM Pass2 AS A, Pass2 AS B),
Pass4 AS (SELECT 1 AS C FROM Pass3 AS A, Pass3 AS B),
Pass5 AS (SELECT 1 AS C FROM Pass4 AS A, Pass4 AS B),
Tally AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS NUMBER FROM Pass5)
INSERT dbo.DeadBirdies (birdId, birdName, rowCreatedDate)
SELECT NUMBER AS birdId ,
'Tweetie' AS birdName ,
DATEADD(mi, NUMBER, '2000-01-01')
FROM Tally
WHERE NUMBER <= 500000
--Cluster on BirdId.
CREATE UNIQUE CLUSTERED INDEX cxBirdsBirdId ON dbo.DeadBirdies(BirdId)
--Create a nonclustered index on BirdName
CREATE NONCLUSTERED INDEX ncBirds ON dbo.DeadBirdies(BirdName)
GO
Find the pages for the nonclustered index with DBCC IND
Now we can take a look at the pages our table and nonclustered index got created on. I wanted to specifically corrupt a page in the nonclustered index on the DeadBirdies table. Of course if you wanted the clustered index, you could use index Id 1.
DBCC IND ('CorruptMe', 'DeadBirdies', 2)
I want to pick a data page for this nonclustered index, so I pick a PagePID where PageType=2. (The reference I use for DBCC IND is here.)
I pick PagePID 2784.
Note: If you’re following along, you may get a different PagePID if you use a different default fill factor.
Optional: Check out the page with DBCC PAGE
If you’d like to take a look at the page you’re about to corrupt, you can do so with the following command.
--Turn on a trace flag to have the output of DBCC PAGE return in management studio
--Otherwise it goes to the error log
DBCC TRACEON (3604);
GO
DBCC PAGE('CorruptMe', 1,2784,3);
Set the database offline
You must take your victim database offline to render it fully helpless accessible to your hex editor.
USE master; ALTER DATABASE CorruptMe SET OFFLINE;
Also, get the name of your physical data file which you’ll open in your hex editor. Copy this to your clipboard.
SELECT physical_name FROM sys.master_files WHERE name='CorruptMe';
Figure out the starting offset of the page you want to corrupt
You do this simply by multiplying the page ID (PagePid) by 8192 (the number of bytes on a page).
SELECT 2784*8192 AS [My Offset]
It’s the Moment We’ve Been Waiting For: Trash That Page
Fire up your hex editor: run XVI32.exe.
Depending on your operating system, you may want to run this with elevated privileges / right click and “run as administrator”.
Open the database file by using File ? Open, and then the data file name you copied to the clipboard. (If you didn’t set the database offline, you’ll get an error that it’s in use. If you got an error that you don’t have permissions to view the file, make sure you do have permissions and that you ran XVI32.exe with elevated privileges.)
Go to the page you want to corrupt by using Address ? GoTo (or Ctrl + G), then paste in your Offset Value. You want to search for this as a decimal.
XVI43.exe will take to right to the beginning of that page.
You can see the ASCII representation of the data in the right pane. For our example, you should be able to see the word ‘Tweetie’ represented.
I like to put the cursor in the right pane at the beginning of the word ‘Tweetie’. XVI32.exe will automatically move the cursor in the left pane, to the appropriate location.
You can corrupt the data by editing in the right pane or left pane.
For my example, I am replacing the ASCII ‘T’ in the first occurrence of the word ‘Tweetie’ with an ‘S’. You can edit more, but a little tiny corruption goes a long way.
Save the file, and you’re done!
Admire Your Own Corruption
First, bring your database back online. If you correctly edited pages in the data, this should work just fine.
Note: If you corrupted critical system tables early in the database, this may not work! If so, go back to the steps above to identify a good page offset.
ALTER DATABASE CorruptMe SET ONLINE;
You can see the corruption in a couple of different ways. If you have checksums enabled on the database, you can see the corruption by reading the page with the data on it.
Since I corrupted a page in a nonclustered index in my example, I need to make sure I use that index. So I can see it with this query:
Use CorruptMe;
GO
SELECT birdName FROM dbo.deadBirdies;
GO
That returns this big scary error, which confirms I did indeed corrupt page 2784:
Msg 824, Level 24, State 2, Line 1 SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0xb633a8e1; actual: 0xaeb39361). It occurred during a read of page (1:2784) in database ID 18 at offset 0x000000015c0000 in file ‘D:\BlahBlahBlah\CorruptMe.mdf’. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
You can also see this by running a CHECKDB or CHECKTABLE command.
DBCC CHECKDB('CorruptMe')
GO
An excerpt from its output:
Msg 8928, Level 16, State 1, Line 1
Object ID 2105058535, index ID 2, partition ID 72057594038910976, alloc unit ID 72057594039828480 (type In-row data): Page (1:2784) could not be processed. See other errors for details.
Msg 8939, Level 16, State 98, Line 1
Table error: Object ID 2105058535, index ID 2, partition ID 72057594038910976, alloc unit ID 72057594039828480 (type In-row data), page (1:2784). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 12716041 and -4.
Now Sit Back and Laugh Maniacally. And Then Fix It.
So, the whole point of this was probably to test something.
So take a moment to enjoy the fact that FOR ONCE you don’t have to panic when you see these errors, because it’s all part of your master plan.
Then go out and fix the corruption, and run your tests.