Opinions expressed on this site are solely those of Kendra Little of Catalyze SQL, LLC. Content policy: Short excerpts of blog posts (3 sentences) may be republished, but longer excerpts and artwork cannot be shared without explicit permission.
By Kendra Little on May 30, 2024
What’s it like to be a Database Administrator for managed databases in Azure? Sometimes it’s a painful guessing game when a routine, core operation, restoring a database, fails with a most unhelpful error.
In this case, if the restore is run via PowerShell, following Microsoft guidance, the error message is:
Restore-AzSqlInstanceDatabase: Long running operation failed with status ‘Failed’. Additional Info: An unexpected error occured while processing the request. [sic]
Somehow the misspelling of ‘occurred’ stings a bit more. Did anyone review the PR for this code?
Surely there’s a clearer error message if you run the restore in the Azure Portal, right?
Nope. If you attempt to do the same operation in the Portal it will fail, and the message there is:
Error code: ResourceOperationFailure Message: The resource operation completed with terminal provisioning state ‘Failed’.
Ooof. So what went wrong?
I couldn’t find the answers in the SQL Server Error Log
One of the benefits of Azure SQL Managed Instance is that it behaves a lot like a ‘normal’ SQL Server Instance. You can access the SQL Server Error Log.
But even for core operations like a database restore, it doesn’t always contain the info you’d expect. In this case I couldn’t find any info regarding the restore in the log at all.
I couldn’t find the answers on the internet, but I found another customer who also lost hours to this problem
I haven’t found anything in Microsoft documentation so far explaining what can cause The Vaguest Error Ever.
I did find a 2021 blog post from Mika Sutinen, “SQL Server Managed Instance and the most unhelpful error message during a database restore.” Mika hit the same error message as I did, and in Mika’s case it was worse: they were restoring a large database, so the “long running” part of the error implied the failure might be related to the database’s size.
In Mika’s case, the database was partially contained and they needed to run some sp_configure commands to get the restore to work. (See the link above for full details.)
Talk about a needle in a haystack.
But the database I was restoring wasn’t partially contained. I was looking for a different needle.
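If you want to rule out the containment cause yourself, the sketch below is one way to check. It is not taken from Mika's post (see the link above for the actual fix), and it assumes you can query the instance's system views. It only reads metadata: which databases are contained, and whether the instance-level `contained database authentication` option is on.

```sql
-- Minimal sketch: read-only checks, run against the instance in question.

-- Which databases are contained? containment_desc is NONE or PARTIAL.
SELECT name, containment_desc
FROM sys.databases
ORDER BY name;

-- Is contained database authentication enabled on this instance?
-- value_in_use: 0 = off, 1 = on.
SELECT name, value_in_use
FROM sys.configurations
WHERE name = N'contained database authentication';
```

If the source database shows PARTIAL and the target instance shows 0, the scenario from Mika's post is worth a closer look.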
In my case, it was a Transparent Data Encryption problem
When using Transparent Data Encryption (TDE):
“Backup files for databases that have TDE enabled are also encrypted with the DEK. As a result, when you restore these backups, the certificate that protects the DEK must be available.” (Microsoft Docs)
If you hit issues restoring a TDE database in “normal” SQL Server and this isn’t configured correctly, you’ll receive error messages that give you a good clue – something along the lines of:
Cannot find server certificate with thumbprint….
I didn’t have access to any such message when restoring my database. In fact, I didn’t figure it out at all. My colleague Mike figured it out and ended my suffering after a couple of hours. (Thanks, Mike!)
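Since the restore error gives no hint that TDE is involved, it can help to check up front whether the source database is encrypted and what protects its DEK. The query below is a minimal sketch of that check, not the exact steps Mike used. It assumes you have permission to read `sys.dm_database_encryption_keys` on the source instance.

```sql
-- Minimal sketch: list databases that have a database encryption key (DEK),
-- plus the name of the protecting certificate when one is visible in master.
SELECT
    d.name                   AS database_name,
    dek.encryption_state,    -- 3 = encrypted
    dek.encryptor_type,      -- e.g. CERTIFICATE or ASYMMETRIC KEY
    dek.encryptor_thumbprint,
    c.name                   AS certificate_name  -- NULL if no matching certificate in master
FROM sys.dm_database_encryption_keys AS dek
JOIN sys.databases AS d
    ON d.database_id = dek.database_id
LEFT JOIN master.sys.certificates AS c
    ON c.thumbprint = dek.encryptor_thumbprint
ORDER BY d.name;
```

A NULL `certificate_name` doesn't necessarily mean something is broken: the protector may be an asymmetric key, or may simply not be exposed as a certificate on a managed service. The point is to find out that TDE is in play before a restore fails without telling you.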
Better error messages do show up in some places
Azure SQL Managed Instance also has a database cloning functionality that uses an AG-like setup under the hood, called Database Copy.
This process seems to use some sort of database restore to seed the AG. If initiating the copy fails, you get a clearer error message.
This isn’t great as a workaround, though, because there are a lot of limitations on database copy, the biggest of which is:
The source or destination managed instance shouldn’t be configured with a failover group (geo-disaster recovery) setup.
If you have to choose between maintaining Disaster Recovery and troubleshooting a restore failure, most folks are going to choose keeping DR healthy every time.
Can someone please fix this?
I created an “idea” on the Azure feedback site: “Database Restores in the portal and Restore-AzSqlInstanceDatabase should give a helpful error message.”
Please feel free to upvote it, if you are so inclined. I’m not sure I believe that basic stuff like this should require upvotes to be prioritized, but apparently nothing else has gotten it fixed in years.
I do apologize if my tone is a little hopeless. To be honest, I’m still a bit salty about how lousy this user experience is for functionality as core as database restore. I do understand that it’s hard to get technical debt prioritized in software organizations and that there are many competing priorities, but database restore is something we all rely on in our most critical scenarios. Part of having the functionality work is helping the user be successful.
I am lucky that I did not hit this issue in an emergency, but I’m sure there are some poor folks out there who have.