Opinions expressed on this site are solely those of Kendra Little of Catalyze SQL, LLC. Content policy: Short excerpts of blog posts (3 sentences) may be republished, but longer excerpts and artwork cannot be shared without explicit permission.
on August 11, 2016
You’d love to have a job tuning SQL Servers, but you don’t have an environment to practice in. Here’s how to teach yourself performance tuning and prepare yourself to land and succeed in job interviews.
This is a “listen-able” 20 minute video. Prefer a podcast instead? Find it at kendralittle.com/dearsqldba.
A written version of the discussion with clickable links is just under this video.
Is there a way I can gain SQL performance tuning experience if I don’t have access to a live production environment? I read lots of blogs and attend classes and conferences were I can, but I don’t feel confident.
I know real experience is the best, but I’d like to do whatever I can, and I’d like to get a job tuning performance.
You’re right to ask this question: job interviews focus heavily on experience
It’s tough to get a job without direct experience.
But there’s a bright side with performance tuning: not a lot of people have direct experience.
If you follow what I outline in this post, you’ll be able to talk about what you’ve done to learn and the problems you’ve retro-engineered and solved. That will give you a real advantage in those interviews.
You’re going to need a sample dataset to work with
There are lots of options for sample databases. Anything can work.
With any dataset, you may need to write code to enlarge tables or change the data around to demonstrate specific problems. That’s a normal part of the challenge – it’s really a feature in a way.
Here’s just a few of the sample databases out there:
- Microsoft’s AdventureWorks sample database: http://msftdbprodsamples.codeplex.com/. AdventureWorks is small. To demonstrate slow queries and speed them up, expanding the database is very helpful. Jonathan Kehayias wrote a script to enlarge it.
- Microsoft’s World Wide Importers sample database for SQL Server 2016
- The BabbyNames sample database - large or small options. I’m biased towards this one because I maintain it on GitHub.
- StackOverflow sample database, shared on BitTorrent by Brent Ozar
- SQLSkills sample databases, including their Credit database, SalesDB, and a Baseball Stats database.
- There are also many free data downloads at Data.gov, if you want to build your own!
If you have enough space to keep multiple of these databases on your instance, there’s no reason to only use one of them as a learner.
If you’re planning to take your experience and teach a class, you may want to focus on just one sample database, though – and also make sure you have the rights to share it with students. (Switching around between databases in a class can be confusing.)
Start writing queries that demonstrate TSQL anti patterns - and make them slow
You know how people say that the best way to learn something is to teach it?
The best way to learn to speed up queries is to write slow ones.
The best way to get a job speeding up queries is to write a blog about the queries you’ve sped up.
The hardest part is going to be writing slow queries properly. You wouldn’t think that it takes talent to write truly crappy TSQL, but it takes me quite a long time to write terrible queries that demonstrate an anti-pattern against a sample dataset.
Two articles will get you started on anti-patterns:
- Grant Fritchey’s Seven Sins against TSQL Performance article on Simple Talk
- Aaron Bertrand’s Bad Habits Revival on the SQL Sentry blog (not all cause performance issues, you’ll learn as you work through the list)
These articles will include sample code. Use that as inspiration.
If you really want to learn performance tuning outside of a production environment, writing your own slow code and then speeding it up is the most effective approach.
For each anti pattern you create, understand the execution plan and how to measure the query
For each slow query you write, test different solutions and compare them. To do this well, you’ll need to:
- Research operators in the execution plans when the query is slow and fast
- Learn how to measure performance using tools like STATISTICS TIME and STATISTICS IO
I find that the easiest way to do this is to make lots of notes in my TSQL scripts as I go, to remind myself of the performance at different points in the script.
Use the queries to make an anti-pattern environment
Once you have a bunch of slow queries, you can create an environment of bad queries.
One easy way to do this is to set up SQL Server Agent jobs that run the queries in a loop or on a scheduled basis.
You’ll learn quickly that you do have to meter them out in a way, because just running a ton of stuff in a tight loop is going to completely overwhelm your CPUs.
Some options for running a bunch of queries:
- Michael J Swart wrote a great post on generating concurrent activity that lists out a bunch of tools which can help.
- One note is that the SQL Query Stress tool originally written by Adam Machanic is now maintained on GitHub by Erik, Ejlskov Jensen.
Practice finding the worst queries and diagnosing a solution
Some of your bad queries are going to be worse for your instance than others.
But which are the worst of the worst?
And what’s the most efficient way to fix the top three queries with the least amount of work?
After automating your queries, you can now practice:
- using sp_WhoIsActive to find out what’s slow right now
- using wait statistics to measure bottlenecks on the instance (see the episode on my performance tuning process)
Finally, add in database and server level anti-patterns
You can take this even farther, and challenge yourself to:
- Simulate tempdb contention
- Turn on database auto-shrink, and see if you can identify from the server exactly what it slows down and by how much
- Change server settings related to parallelism and measure how it impacts performance, and how you would detect and tune those settings. (I did an episode on parallelism called Max Degree of Confusion.)
- Lower your memory settings so not all data fits in cache, and measure how that impacts performance
Remember what I said about the blog?
Blogging about this process as you go through it serves a few purposes:
- Writing about it helps you remember things
- Writing about it under your name will act as an online resume
- Bonus: you’re helping others as you go
You’re going to need to be persistent about this project to make it work. And it’s going to take a lot of time.
Blogging as you go is extra work, but if your goal is to get a job, it’s incredibly valuable - because if you do this once a week for a year, that link at the top of your resume is going to be almost as awesome as your confidence about what you’ve learned.