on February 22, 2018
I got a great question about transactions by email recently.
When should you use transactions?
From what I have read online the answer is whenever you can, but I haven’t found a good why answer.
I understand that you should be using them when you want a set of events to succeed or fail. But what if you just have a single statement that doesn’t depend on other statements - why should you use transactions?
We have experienced some errors when we used them and so I was wondering what merits does it have that make it worth the code maintenance, especially since SQL Server implicitly creates transactions.
Part of why this is extra confusing is that there are three types of transactions. First up, let’s clarify a few terms.
Explicit vs. Implicit vs. Autocommit Transactions
1. Explicit transactions
This is the type of transaction where you explicitly start the transaction with BEGIN TRAN or BEGIN TRANSACTION, and later complete the work with either a COMMIT statement or undo it with a ROLLBACK statement.
As our questioner writes, if you have multiple statements that should either succeed or fail as a single unit, BEGIN TRAN allows you to bundle them together.
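Here is a minimal sketch of that pattern. The table dbo.Accounts and its columns are hypothetical names made up for illustration, and the TRY/CATCH wrapper is just one common way to make sure a failure rolls everything back.

```sql
-- Hypothetical table: dbo.Accounts (AccountId int, Balance money)
BEGIN TRY
    BEGIN TRANSACTION;

    -- Both updates succeed together or fail together.
    UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountId = 1;
    UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountId = 2;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Undo any partial work if a transaction is still open.
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;

    -- Re-raise the original error to the caller (SQL Server 2012 and later).
    THROW;
END CATCH;
```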
2. Implicit transactions
Implicit transactions are a bit weird, and I typically only run into them when applications have been written for a different relational database and then ported to SQL Server.
To use implicit transactions, you run the statement: SET IMPLICIT_TRANSACTIONS ON;
This puts you in a mode where running a statement which accesses a table implies that you ran BEGIN TRAN. You must explicitly COMMIT or ROLLBACK when you’re done with a unit of work, whether it’s one or more statements.
I find implicit transactions confusing, and I don’t use them.
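To see what that looks like in practice, here is a small sketch you could try in a test session. dbo.Accounts is a hypothetical table, and the @@TRANCOUNT comment assumes no transaction was already open in the session.

```sql
SET IMPLICIT_TRANSACTIONS ON;

-- No BEGIN TRAN here, but this statement quietly opens a transaction.
UPDATE dbo.Accounts SET Balance = Balance + 1 WHERE AccountId = 1;

-- Should report 1: the UPDATE above started a transaction that is still open.
SELECT @@TRANCOUNT AS OpenTransactions;

-- Nothing is durable until you finish the unit of work yourself.
COMMIT TRANSACTION;

-- Return the session to the default autocommit mode.
SET IMPLICIT_TRANSACTIONS OFF;
```

If you forget the COMMIT, the transaction stays open and keeps holding its locks, which is a big part of why this mode surprises people.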
3. Autocommit transactions
If you do not enable implicit transactions, and you don’t start an explicit transaction, you are in the default “autocommit” mode.
This mode means that individual statements are automatically committed or rolled back as whole units. You can’t end up in a place where only half your statement is committed.
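A quick way to convince yourself of this is a single INSERT that fails partway through. This sketch uses a throwaway temp table and assumes default session settings (XACT_ABORT OFF), so the batch keeps running after the error.

```sql
CREATE TABLE #Demo (Id int NOT NULL PRIMARY KEY);

-- One statement, three rows. The duplicate key on the third row
-- makes the whole statement fail with a primary key violation.
INSERT INTO #Demo (Id) VALUES (1), (2), (1);

-- Returns 0: the first two rows were rolled back along with the third.
SELECT COUNT(*) AS RowsInTable FROM #Demo;

DROP TABLE #Demo;
```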
Our question is really about whether there are unseen problems with this default mode of autocommit for single-statement units of work.
So, to rephrase the question…
Are there times when we should use an explicit transaction for single statements, instead of relying on autocommit?
Generally, you’re fine staying in the default autocommit mode for singleton statements that are entirely their own unit of work.
I’ve run into a few instances where creating a larger unit of work for modifications helped performance, however! Example: are you doing a lot of tiny modifications?
Sometimes you get better performance bundling multiple statements into a transaction
Way back when I was a Junior DBA, there was a project to modify a lot of data in our OLTP database. We had a big change coming up which required a lot of new data to be present, and some data to be modified for customers.
We couldn’t have much downtime for the change, so a plan was hatched to get the data all set up in production while the system was live, then to cut over to the new feature as the actual change. This was tested out in staging, and everything was great, but slow.
The staging environment was generally slow, so that didn’t mean much by itself.
But by the time we got to deploy the change to production, it was slow as well. Luckily, we had a plan to make it faster.
- Indexes had been deployed to help make the modifications as fast as possible
- The scripts to make the data changes were set up to be able to be safely stopped at any time and restarted, in case of blocking problems or any concern about performance
- The scripts had a @BatchSize parameter, which allowed the DBAs to determine how many modification statements were run inside a single transaction
- The scripts had a @WaitDuration parameter, which allowed the DBAs to determine how many milliseconds to wait between batches
Each modification statement was a single unit of work, but the script ran faster when multiple modification statements were committed at once! @BatchSize = 1 was not awesome; it was very slow.
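I don’t have the original scripts, but the general shape might look something like the sketch below. Everything except the @BatchSize and @WaitDuration names is an assumption for illustration: the work queue table dbo.PendingChanges, the target table dbo.Customers, and their columns are all hypothetical.

```sql
-- Hypothetical tables:
--   dbo.PendingChanges (ChangeId int, CustomerId int, NewValue int, Applied bit)
--   dbo.Customers      (CustomerId int, SomeValue int)
DECLARE @BatchSize    int = 100;  -- modification statements per transaction
DECLARE @WaitDuration int = 500;  -- milliseconds to wait between batches

-- WAITFOR DELAY wants a time string, so turn the milliseconds into 'hh:mm:ss.mmm'.
DECLARE @Delay varchar(12) =
    CONVERT(varchar(12), DATEADD(MILLISECOND, @WaitDuration, CAST('00:00:00' AS time(3))));

DECLARE @Done int, @ChangeId int, @CustomerId int, @NewValue int;

WHILE EXISTS (SELECT 1 FROM dbo.PendingChanges WHERE Applied = 0)
BEGIN
    SET @Done = 0;

    BEGIN TRANSACTION;

    WHILE @Done < @BatchSize
    BEGIN
        -- Pick the next change that has not been applied yet.
        SELECT TOP (1)
               @ChangeId = ChangeId,
               @CustomerId = CustomerId,
               @NewValue = NewValue
        FROM dbo.PendingChanges
        WHERE Applied = 0
        ORDER BY ChangeId;

        IF @@ROWCOUNT = 0 BREAK;  -- nothing left to do

        -- The small modification itself, plus a flag so a restart skips it.
        UPDATE dbo.Customers SET SomeValue = @NewValue WHERE CustomerId = @CustomerId;
        UPDATE dbo.PendingChanges SET Applied = 1 WHERE ChangeId = @ChangeId;

        SET @Done += 1;
    END;

    -- One commit for the whole batch instead of one per statement.
    COMMIT TRANSACTION;

    -- Give other sessions some room before the next batch.
    WAITFOR DELAY @Delay;
END;
```

Because the Applied flag is committed in the same transaction as the change it tracks, stopping the script mid-batch just rolls back that batch, and a restart picks up where the last commit left off. The trade-off is that a bigger @BatchSize holds locks longer, so the right value depends on how busy the system is.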
This is because lots of tiny little commits can put a lot of pressure on your transaction log, causing a bottleneck
It’s like wanting to read someone a long story, but making a phone call for every single word.
If the person is very busy and can’t talk to only you for a whole hour, to the exclusion of everyone else, then you want to figure out how long they can spare for a call, and how many words you can fit into a single call.
Amit Banerjee does a demo and breaks everything down in a great way in this classic post.
Explicit and autocommit transactions show up differently in some performance counters
I don’t think this is a reason to use explicit transactions, it’s just something to be aware of.
The basics are this:
- Modifications in autocommit mode and select statements in explicit transactions both show up under Transactions/sec and Batch Requests/sec
- Select statements in autocommit mode do not show up under the Transactions/sec counter, but they DO show up under Batch Requests/sec
You can watch a demo of it on my post here.
Generally speaking, I think this is a reason to use the Batch Requests/sec counter over Transactions/sec (not a reason to go adding transactions to all your singleton select statements).
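If you want to compare the two counters yourself without opening Performance Monitor, one option is to sample sys.dm_os_performance_counters twice. This is a rough sketch: the values in that view are cumulative for "per second" counters, so the query takes the difference over a 10 second window (an arbitrary interval I picked), and it needs VIEW SERVER STATE permission.

```sql
DECLARE @Before TABLE (counter_name nvarchar(128) PRIMARY KEY, cntr_value bigint);

-- First sample of the cumulative counter values.
INSERT INTO @Before (counter_name, cntr_value)
SELECT RTRIM(counter_name), cntr_value
FROM sys.dm_os_performance_counters
WHERE (counter_name = 'Batch Requests/sec' AND object_name LIKE '%:SQL Statistics%')
   OR (counter_name = 'Transactions/sec'   AND object_name LIKE '%:Databases%'
       AND instance_name = '_Total');

WAITFOR DELAY '00:00:10';

-- Second sample: the difference divided by the interval is the average rate.
SELECT RTRIM(pc.counter_name) AS counter_name,
       (pc.cntr_value - b.cntr_value) / 10.0 AS avg_per_second
FROM sys.dm_os_performance_counters AS pc
JOIN @Before AS b
    ON b.counter_name = RTRIM(pc.counter_name)
WHERE (pc.counter_name = 'Batch Requests/sec' AND pc.object_name LIKE '%:SQL Statistics%')
   OR (pc.counter_name = 'Transactions/sec'   AND pc.object_name LIKE '%:Databases%'
       AND pc.instance_name = '_Total');
```

Running a burst of autocommit SELECT statements from another session during the wait should move Batch Requests/sec noticeably more than Transactions/sec.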
What about marked transactions?
The idea behind marked transactions sounds pretty cool: you can mark a transaction as a recovery point.
Let’s say I’m releasing a big change that touches multiple databases. I could use marked transactions to allow me to restore those databases all to the mark, so that I could perhaps restore them to right before the change was deployed, or right after it was complete (depending on which I used).
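As a sketch of how that might look: the database names, the dbo.ReleaseLog table, the mark name, and the backup paths below are all hypothetical, and the databases are assumed to be in the full recovery model with log backups being taken.

```sql
-- Mark the start of the release in every database this transaction updates.
BEGIN TRANSACTION ReleaseStart WITH MARK 'Before big release';

    -- The mark only lands in the log of databases the transaction modifies,
    -- so touch a row in each database you want to be able to restore to the mark.
    UPDATE SalesDb.dbo.ReleaseLog   SET LastRelease = SYSDATETIME() WHERE ReleaseLogId = 1;
    UPDATE BillingDb.dbo.ReleaseLog SET LastRelease = SYSDATETIME() WHERE ReleaseLogId = 1;

COMMIT TRANSACTION ReleaseStart;

-- Later, if needed: restore one of the databases to just before the mark.
-- Repeat for each database so they all line up at the same point.
RESTORE DATABASE SalesDb
    FROM DISK = N'X:\Backups\SalesDb_full.bak'
    WITH NORECOVERY, REPLACE;

RESTORE LOG SalesDb
    FROM DISK = N'X:\Backups\SalesDb_log.trn'
    WITH STOPBEFOREMARK = 'ReleaseStart', RECOVERY;
```

Using STOPATMARK instead of STOPBEFOREMARK includes the marked transaction itself in the restore, which is the "right before" versus "right after" choice.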
The reason I’m not crazy about using these “just in case,” or for very frequent use, is that there is some overhead:
- They use space in your transaction logs in each database they touch
- They use space in msdb, and if you use a ton of these, you may need to do some maintenance there
- There is some complexity around using this with multiple instances (details here)
I’m not against marked transactions if they meet a real need and the team has the time to monitor their overhead. For large releases that touch multiple databases, SAN snapshots are often more convenient (although obviously they are not tied to a transaction).