In a previous blog post, I wrote about why it is important to have a methodology in place for monitoring your IT assets. In this blog post, I’ll discuss what a baseline is and why it matters.
Two years ago, I switched my car insurance provider in an attempt to lower my monthly expenses. The new provider had introduced a device that monitors a driver’s behavior and adjusts policy premiums based on it. The device was attached to my car and constantly sends data to the provider’s servers so that different metrics can be tracked.
While this blog post isn’t about Big Data technologies (although the implementation of the device was driven by them), it describes a very important concept. For the first year, I paid a premium that was based on general information such as age and gender. After the first year, my premium was reduced by approximately 10%. Of course, any car insurance customer would be happy to see their payments go down. But the decision to reduce the premium was based on several facts:
- The initial premium payment was higher because there was no basis for it other than general information
- The succeeding premium payments were reduced based on the gathered information
- Future premium payments will still vary based on driving behavior
A Starting Point of Reference
A baseline is defined as a line that is a starting point for measurement purposes. As I mentioned in a previous blog post, monitoring IT assets should start with defining a methodology and identifying what matters to the business. Once those have been identified, we need to define a point of reference for those measurements. When a new IT asset is deployed, there is no basis yet for the numeric values of its metrics; all you have is general information, such as whether a service is up or down or whether it is reachable. There are several reasons why you need to define a baseline:
- You need to know what is normal and what isn’t. It’s easy to say that there is something wrong, be it a slow database or a speeding driver. But unless we know what normal is, there is no empirical basis for calling something abnormal.
- You need to know if there are deviations from normal. Deviations from normal are “normal,” for lack of a better word, because everything is dynamic. But significant deviations, like a stored procedure that took 25 minutes to run instead of 5 or a driver going 160 km/h (100 mph) instead of 100 km/h (62 mph), can be considered a potential problem.
- You need to know if the changes are healthy or not. Changes are inevitable and growth requires change. But not all change is healthy. If the stored procedure that took 5 minutes now takes 25 minutes because of an increase in the amount of data processed, that may mean that the business is growing and profits are increasing. But a driver going 160 km/h (100 mph) instead of 100 km/h (62 mph) may signify DUI and has to be addressed before it gets worse.
Unless we define a baseline, it is a challenge to make data-driven decisions. As database professionals, we should back the majority of our actions with captured data and defined baselines.
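To make the idea concrete, here is a minimal Python sketch of a baseline check. It is only an illustration under my own assumptions: the run times are made-up sample data, the function names are hypothetical, and treating anything more than three standard deviations from the mean as "significant" is just one possible rule, not a recommendation for your system.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize historical measurements as a (mean, standard deviation) pair."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to build a baseline")
    return mean(samples), stdev(samples)


def is_significant_deviation(value, baseline, max_sigmas=3.0):
    """Return True when value is more than max_sigmas standard deviations from the mean."""
    avg, spread = baseline
    if spread == 0:
        # Every historical sample was identical, so any different value stands out.
        return value != avg
    return abs(value - avg) / spread > max_sigmas


# Hypothetical run times, in minutes, for a stored procedure that normally takes about 5.
history = [4.8, 5.1, 5.0, 5.3, 4.9, 5.2, 5.0, 4.7]
baseline = build_baseline(history)

print(is_significant_deviation(5.4, baseline))   # False: an ordinary wobble
print(is_significant_deviation(25.0, baseline))  # True: the 25-minute run
```

Note that the check only tells you that a value is unusual. Whether the change is healthy (more data, more business) or unhealthy is still a judgment you have to make.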
An Abnormal Normal
Like seasons in life, IT assets go through different phases depending on their function. An e-commerce consumer database would be transactionally busy during holiday seasons like Christmas, Black Friday and promotional sales. Just as my driving behavior differs between winter (when I don’t even want to go outside) and summer (when I want to spend as much time outdoors as possible), what is considered normal in one season may not be normal in another. That is why you need to identify those seasons for your IT assets and define what normal means for each of them.
An example of this for database systems is when processing reports for analysis. Daily reports may take a minute or two to process while a monthly report will take longer. So, if you are monitoring a SQL Server Agent job that processes reports, a daily normal execution time won’t be the same as a monthly normal. You shouldn’t be alarmed if a job that takes 5 minutes now takes 25 minutes if you are looking at it at the end of the month.
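One way to picture this is to keep a separate baseline per "season." The Python sketch below is a rough illustration under my own assumptions: the job history is invented, the two labels (`daily` and `month_end`) and the rule "more than twice the period's average is unusual" are hypothetical choices, and a real job schedule may define month-end differently.

```python
import calendar
from datetime import date
from statistics import mean


def period_for(run_date):
    """Label a run 'month_end' on the last day of its month, otherwise 'daily'."""
    last_day = calendar.monthrange(run_date.year, run_date.month)[1]
    return "month_end" if run_date.day == last_day else "daily"


def seasonal_baselines(runs):
    """Average the durations separately for each period label."""
    grouped = {}
    for run_date, minutes in runs:
        grouped.setdefault(period_for(run_date), []).append(minutes)
    return {period: mean(values) for period, values in grouped.items()}


def is_unusual(run_date, minutes, baselines, tolerance=2.0):
    """Flag a run that takes more than `tolerance` times its own period's average."""
    return minutes > tolerance * baselines[period_for(run_date)]


# Hypothetical job history: (run date, duration in minutes).
runs = [
    (date(2024, 1, 30), 5.0),
    (date(2024, 1, 31), 25.0),
    (date(2024, 2, 28), 4.0),
    (date(2024, 2, 29), 27.0),
    (date(2024, 3, 15), 6.0),
]
baselines = seasonal_baselines(runs)

print(is_unusual(date(2024, 3, 31), 25.0, baselines))  # False: normal for month end
print(is_unusual(date(2024, 3, 20), 25.0, baselines))  # True: far too long for a daily run
```

The same 25-minute run is fine on the last day of the month and suspicious in the middle of it, because each run is compared only against the baseline for its own period.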
You. Decide. Now.
I remember when I first saw this Microsoft TechNet article on Establishing a Performance Baseline. I was constantly researching the “absolute values” of the different performance counters that I needed to monitor. If you read the article, some of the recommendations include phrases like “below 50%” and “close to zero.” I kept thinking, “I need an exact value because this is what I will use for my monitoring tool.” Unfortunately, as with any general recommendation, Microsoft does not know exactly what your system does or how it behaves. As IT professionals, we are responsible for defining what normal is for a specific IT asset. 80% CPU utilization might be normal for your system but not for others. My car insurance provider didn’t know what my normal driving behavior was until it started monitoring, and the same is true for your systems. That means establishing a baseline will not happen overnight; it has to be done over a certain period of time. And it won’t happen unless you start NOW. The more data points you collect, the better you will understand your system and the more informed you will be when you perform your tasks.
It also means that neither I, nor Microsoft, nor any monitoring tool vendor can make that decision for you. We’re simply here to give guidance and help you make that decision. The rest is all up to you. The ball is now in your court.