By Kendra Little on August 9, 2023
Database migration scripts are a popular and effective way to check database code into version control.
In this post I describe the most common features of popular migration script runners for database code deployment, along with the top gotchas that folks hit when versioning their database code with migrations.
Benefits of migration scripts for Database DevOps
Migration scripts enable checking database code into version control in a human readable format. This provides a centralized, standard way to record all changes.
Migration scripts are also useful to:
- Implement Continuous Integration processes for database code, thereby automating validation of changes to the code.
- Automate change deployment in both development and production environments.
Together, these benefits bring database code into line with processes and standards for application code.
Common tools used with a migrations approach
Database migration scripts for change deployment are used by tools like Flyway and Liquibase. Some companies write custom implementations of migration script runners.
If you are evaluating building or buying one of these toolsets, this post will help you understand essential functionality and which workflows to tackle first.
Essential functionality for database migration scripts
Core functionality to consider when adopting or writing a database migration script runner are versioned migration scripts that can be run in a defined order, a way to track migration script status against a specific target database to run scripts only when needed, baseline scripts, and re-runnable migrations.
What are database “migrations” or “migration scripts”?
A database migration script changes the state of your database, most commonly by adding, modifying, or deleting objects in the database.
Migration scripts can also add or modify data in objects in the database. An example:
- We start with an empty database.
- Our first migration script creates a table for bananas and inserts rows.
CREATE TABLE dbo.bananas (
banana_id INT,
banana_name NVARCHAR(50) UNIQUE,
banana_color NVARCHAR(20),
CONSTRAINT PK_bananas
PRIMARY KEY CLUSTERED (banana_id) );
INSERT dbo.bananas
(banana_id, banana_name, banana_color)
VALUES (1, 'Cavendish', 'Yellow'),
(2, 'Plantain', 'Green'),
(3, 'Red Dacca', 'Red'),
(4, 'Manzano', 'Yellow'),
(5, 'Burro', 'Yellow');
Info: I didn’t include any error handling in this code. Some tooling helps with that automatically, some does not.
The main reason I show this code is: it’s just that simple! Migration scripts can be written in a variety of languages, depending on what the migration script runner supports, but generally these are scripts written in plain SQL that are easy for humans to review.
Migration scripts are numbered and run in order
Our second migration creates a table for banana ratings. This table uses banana_id values that are defined in dbo.bananas.
CREATE TABLE dbo.banana_ratings (
rating_id INT ,
banana_id INT,
rating_value INT,
rating_date DATETIME,
CONSTRAINT FK_BananaRating_Banana
FOREIGN KEY (banana_id)
REFERENCES dbo.bananas (banana_id)
);
The second migration script has a foreign key dependency on the first migration. They need to always run in a specific order.
Migration runners commonly provide a way to number scripts and execute them in the desired order.
Migration script status
The log tracking which migration scripts have been run with their status is often kept in a table in the target database.
The migration script runner generally reads from this table prior to a deployment, and updates the table as the deployment progresses.
Baseline script
What if you need to start with a database with existing objects?
A “baseline” migration script contains all of the definitions of objects in the database, ordered so that dependencies can be met. This is generally contained in a single migration script with a special status to differentiate it from “normal” migrations.
Migration script runners usually only deploy the baseline script to an empty target database. This allows validation of database code for continuous integration.
Re-runnable migrations
Sometimes you want a migration to run every time you deploy changes. This might ensure the target database is configured for the appropriate environment (dev, prod, etc), create objects like stored procedures, or ensure that an object or data is in a specific state.
Migration runners sometimes allow specific naming and ordering conventions for re-runnable migration scripts.
What gets tricky?
There’s a lot that can get complex when you get into the weeds. The top places where folks get confused are related to rollbacks, shared database environments, and dealing with changes in deployment order/numbering challenges.
Rollbacks
Some tools have options for Rollback scripts, but managing them is complex. In production, you usually want every migration recorded as a “roll forward” because it’s a real deployment, and you want your future builds to play out the same sequence.
Also, everyone hates writing rollbacks manually. Everyone.
Shared database environments
Mistakes or necessary changes to scripts are often identified in shared testing or pre-production environments.
If there is no automated, fast way to reset the database (including all its data), then it can be time consuming to validate that the target environment is in the desired state once code is rolled back and scripts are modified.
Numbering while integrating changes from multiple teams
Code is often not deployed to prod in the same order it is completed.
Let’s say Ravi writes migration script 42, which needs a lot of testing. Sandy releases an important hotfix in migration 43. Now Ravi’s script is out of order. Should they renumber it? What does that mean for all environments where it has already been deployed? What does this mean for future “from scratch” builds?
Enhancing a migrations approach with state comparisons and auto-generated code
Database “state” capture and comparison tools can augment migration script approaches with cool functionality. Vendors often provide this as part of their value proposition.
Those writing rollback scripts will want to use generative AI to help draft and validate changes. Both humans and AI can write bad SQL, so be wise how you use both.
The value of getting database code into version control is worth the work
Thinking about the gotchas can be daunting. There is a lot to gain even if some portions of your process remain manual, or if you implement automation in stages. A recap of the benefits of database migrations is worth it at this point.
These will give you the ability for many good things:
- Clear audit trail of development and deployment code
- Lower likelihood of losing work or accidental deployments
- Continuous Integration
- Less or very limited manual work in deployments