When asked about best practices on running multiple SQL Server failover clustered instances (FCI) on a multi-node Windows Server Failover Cluster (WSFC,) a common response from SQL Server experts might include the following:
- appropriately allocating enough hardware resources on all of the nodes in the WSFC
- appropriately configuring maximum server memory settings per SQL Server FCI
- monitoring SQL Server FCI failover to minimize, if not avoid, having all of the instances to run in a single node
- implementing n + 1 node – where n is the number of SQL Server FCIs
I’ve made these recommendations myself to my customers. But while I totally agree with the recommendations, I don’t instinctively give them out, by default.
And my customers are surprised by what I tell them.
What Musicians Know About Designing Highly Available Systems
As the front of house sound engineer asked me to do a quick sound check, one of the staff commented about how good the piano sound was followed by a barrage of questions – were you even playing? how did you get your keyboard to sound like that? are you using any third-party sound plugins?
As a keyboard and synth player, I spend hours learning how to play songs – learning the chord progressions, the tempo, transitions, etc. But as a musician, that’s part of the expectation. But the way I play the piano or the keyboard goes beyond the technical aspects of the music.
Who will I be playing with in the band? (Team dynamics)
Who’s the bass guitar player? Does the drummer have any background in Latin jazz? Will the rhythm guitar player be OK with quickly transitioning to a different tempo? Who I play with in a band determines how I play – it’s a team effort.
Similarly, deciding whether or not to deploy a single instance or a multi-instance SQL Server FCI should be dictated by who is in the team – does the team have enough skillset to maintain a multi-instance SQL Server FCI – a network specialist, a storage specialist, an Active Directory and DNS specialist and a failover clustering specialist? If the team does have the skillset, does it have enough members to cover people going on vacations? Are they senior engineers or are they junior operations staff?
Are we playing a fixed song list? (Incident Management)
Be it for a church service or a gig, I need to know if the band will be playing a fixed set/song list or open to make changes while playing. A fixed set/song list has a fixed set of expectations – what songs to play, how long to play them, what the transitions are, etc.
When I get hired for consulting engagements, a question that I almost always ask when creating a solution is, “Do you have a standard operating procedure for incident and change management?”
When organizations have a standard operating procedure for incident and change management, the teams have certain expectations about dealing with issues. A SQL Server FCI needs to be brought back online within X number of minutes as described in the recovery objectives (RPO/RTO) and service level agreements (SLAs.) Imagine how incident management would look like for a multi-instance SQL Server FCI. If a ticket is opened for a SQL Server FCI, will the ticket be handled by junior operations staff or will it be automatically escalated to senior engineers?
How much time do I have? (Cost)
Sometimes, I get asked to step up and play the keyboard for church if the originally scheduled musician becomes unavailable. This has also happened to me in the past playing for bands as a session musician. They tell me who the members of the band are, what the songs are and when we will be playing. If I know the songs and know the band members, I can jump in and play even on very short notice. But if I don’t know the songs enough to play them by heart, it would take a lot of time for me to learn all of them. And if I don’t have enough time, I’ll figure out a way to simplify how and what I play.
Multi-instance SQL Server FCIs have cost implications. You can run multi-instance SQL Server FCIs using Standard Edition on a 2-node WSFC. Heck, you can even do it on a 4-node (or more) WSFC so long as you keep the Possible Owners of the individual SQL Server FCI to two (2.) Be creative enough to keep them all on one node and you can run on a single SQL Server per-core license for all of them. If the memory requirement of an instance goes beyond the limits of Standard Edition – 64 GB for SQL Server 2012/2014 and 128 GB for SQL Server 2016 – you have no choice to upgrade to Enterprise Edition. As the number of instances increase, so does the cost – licensing, operations, monitoring, maintenance, etc.
Time represents different cost perspectives, depending on how the bean counters interpret them on the balanced sheet – consulting and administration hours when something goes wrong, revenue loss when the mission critical database running on the SQL Server FCI is offline for a certain period of time, training cost, etc. It’s really a matter of how the organization wants to make the investment.
Team Dynamics, Incident Management and Cost
Smaller organizations usually do not have a large team, do not have a formalized incident management process and very limited budget. It’s no surprise to see single instance, two-node SQL Server FCIs in smaller organizations and multi-instance SQL Server FCIs in a multi-node WSFC in larger organizations.
So, before we SQL Server professionals get fixated on the technical justifications of whether to deploy single instance or multi-instance SQL Server FCIs, we need to ask these questions first.
I can guarantee that your boss or your customers will remember – and thank – you for it.
True, smaller organizations don’t have a formalized incident management process and has limited budget.
Agreed. Its these non-technical reasons that need to drive the decision on whether or not to implement a single-instance or a multi-instance SQL Server failover clustered instance. While most business stakeholders talk about total cost of ownership (TCO), I often refer to it as total cost of operation. Owning it is one thing, maintaining it is another. We need to strike a balance between reducing TCO (ownership) and TCO (operations).
Thanks for reading my blog, Dave.