One of my favorite topics in SQL Server is transaction log internals. When I first got started with SQL Server, I thought all I needed to do with the transaction log was to run regular log backups to make sure they don’t grow unexpectedly. Until I realized that there was more to it than I knew. I learned as much as I can about the transaction log – how it works, its impact on both performance, availability and disaster recovery. And because I focus on high availability and disaster recovery architectures, I try my best to explain to my customers how the transaction log plays a very important role in whatever solution that they want to pursue.
When SQLSkills.com released the SQL Server Microsoft Certified Master Readiness Videos on TechNet, the very first video that I watched was the one on Log File Internals and Maintenance (WMV | MP4) It got me really curious about the transaction log behavior and how it continues to evolve throughout the years with newer versions and features being added to the database engine. As I was learning more and more about the transaction log, I realized one very important thing that majority of us SQL Server DBAs fail to recognize as we get too deep in the technology: no database is an island.
Understanding The Application That Uses Your Database
Your database will always be coupled with an application regardless of how much you want it to be. Which means that, in order to become an efficient and effective SQL Server professional, not only should you be a subject matter expert on the database platform but also very much aware of how the applications that use it work. When I was working as a data center engineer in-charge of SQL Server databases, I became the subject matter expert in the team on all of the different applications that connect to the databases that I manage. I spent a lot of time talking to the application vendors and developers – learning about application functionality, deployment strategies, application architecture, etc.
This paid off in the long run because it made things easier when trying to troubleshoot an application issue. It fostered communication and teamwork because every one on the team is not just concerned about their area of responsibility but are also aware of how any piece in the overall application architecture is potentially causing the issue (it’s also a good way to tell the stakeholders that it is probably not a database problem.) And when the clock is ticking because of your defined service level agreements (SLAs,) every minute spent on the wrong causes matters. Instead of me focusing on what I thought was a possible issue on the database, I would assist the network engineers on how the connectivity works from the middle tier to the database.
Searching For The Missing Error 9002 Message
If you do a quick Google search on SQL Server Error 9002, you will find that this error message pertains to the SQL Server transaction log file running out of free space. As a database administrator, your first instinct would be to fix this problem immediately – by taking a transaction log backup, extending the volume where your transaction log file is stored, shrinking the log file, etc. Pretty straight-forward, don’t you think?
Of course, you are assuming that (1) you are monitoring the SQL Server error log for this specific error message and get alerted when it happens or (2) you are looking at the SQL Server error log every minute waiting for it to happen (I prefer having a monitoring tool in place.) The reality is, end users might experience the effects of the transaction log file being full but would have no clue at all about what is happening. And this can get the application administrators deep into the rabbit hole because it doesn’t tell them anything about a potential database problem.
Here’s a video I recorded that shows an example of this. I used a SharePoint site collection with a specific site collection database to demonstrate how the error message displayed to the end user does not tell you anything about a full SQL Server transaction log file. And I’ve seen it happen – SharePoint administrators spending hours as they dig deeper into the SharePoint ULS logs, the Windows Event logs on the web and application servers to try figuring out the root cause. And this is not specific to SharePoint. Any application that uses a SQL Server database can experience the same thing. Which goes back to my earlier point – a good understanding of how the applications work as they connect to the database can make the troubleshooting process a whole lot easier.
Additional Resources
- SQL Server Transaction Log Architecture and Management
- SQL Server Transaction Log Management by Tony Davis and Gail Shaw
- Troubleshoot a Full Transaction Log (SQL Server Error 9002)
- Pluralsight Course: SQL Server: Logging, Recovery, and the Transaction Log
- SQL Server Error 9002
Please note: I reserve the right to delete comments that are offensive or off-topic.