Last week, I have had the wonderful opportunity to deliver a webinar for StarWind Software on providing high availability with SQL Server failover clustering and cluster shared volumes. I had a great time delivering the presentation and answering some questions regarding SQL Server high availability and how StarWind Software can help meet that goal while reducing total cost of ownership. The webinar recording is available below.
We didn’t have time to cover all of the questions that were raised during the webinar so I reached out to StarWind Software and asked if I can write a follow-up blog post answering some of the questions raised during the webinar. I also asked those who managed to attend to submit follow-up questions thru the forum so I can add them in this blog post. I only focused on SQL Server-related questions and let Max Kolomyeytsev answer questions related to StarWind Virtual SAN.
Q: What is the difference between SQL Failover Cluster and SQL Always on Failover Cluster Instance(not AlwaysOn Availability Group?)
A: To be perfectly honest, these are just marketing terms. AlwaysOn is a marketing term that includes SQL Server high availability and disaster recovery features such as Failover Clustered Instance and Availability Group. I’ve stopped using the term AlwaysOn when I realized that more and more customers and SQL Server professionals are getting more confused with the usage. I just stick with either SQL Server failover clustered instance and SQL Server Availability Group. Cluster MVP (and one of my failover clustering superheroes) Allan Hirt blogged about why this is becoming a bit of a confusion for customers.
Q: If we use this method (referring to the use of cluster shared volumes with SQL Server failover clustered instances,) can we avoid restarting the SQL Server service?
A: Cluster shared volumes were introduced in Windows Server 2008 R2 specifically for the Hyper-V role. SQL Server 2014 supported the use of cluster shared volumes for failover clustered instances. Cluster shared volumes provides a true shared storage in that all of the nodes in the cluster have concurrent access to the shared disks. The cluster can decide to route I/O workload to any of the nodes in the cluster should there be issues with the logical path to the shared storage. This increases storage resiliency and increases uptime for the SQL Server resource since it is no longer directly dependent on waiting for the shared disk to come online on the target node. The SQL Server resource is still dependent on the shared disk in a way that the database files – master, tempdb, model, msdb, user databases, etc. – are still stored on the shared disks. However, failover of the SQL Server resource is no longer dependent on the shared storage, making it quicker to failover to any of the nodes in the cluster. However, this does not mean that you no longer have to restart the SQL Server service. The clustered resource failover in itself is a restart process, stopping it on the node that is currently running it and starting it on the target node.
Q: Actually I am using SQL Server 2008, can I use StarWind to migrate to SQL Server 2014?
A: StarWind is just a storage piece of the overall SQL Server architecture. There are a lot of considerations when migrating from an older version to a newer version – hardware, operating system, storage, application connectivity, HA/DR, replication, performance, compatibility, etc. When you upgrade an older version of SQL Server to a newer version, the database files get upgraded as well. This means that the version numbers will be incremented depending on the target version. One of my main considerations when upgrading databases is the criticality of the application using it. If they are currently running on a failover cluster, I would instantly assume that high availability is a priority and so minimizing downtime during the upgrade process is the goal. I almost always recommend doing a side-by-side upgrade and provision additional servers running the target version of the operating system and SQL Server for the upgrade. That way, I can perform the preliminary steps to upgrade the databases and tests with the application without impacting production workload. When the team is ready to cutover to the new production environment, we automate all of the process to reduce downtime as much as possible.
I’m doing a webinar with DH2i on exactly this topic. If you’re interested to learn more, sign up for the webinar to know more about how to reduce, if not eliminate, the pains and challenges of upgrade and migration to a newer version of SQL Server using DxEnterprise. Check out this site for more details.
Q: Are there any clustering advantages to using Windows Server 2012 R2 over Windows Server 2012? Same for SQL Server 2014 vs SQL Server 2012.
A: Absolutely. The introduction of dynamic witness in Windows Server 2012 R2, dynamic node weight, LowerQuorumPriorityNodeID property, etc. are just some of the features that make running a SQL Server failover clustered instance on Windows Server 2012 R2 really worth it. This makes deploying and managing a failover cluster easier and maintain high availability goals. Quorum behavior is different between Windows Server 2008, Windows Server 2012 and Windows Server 2012 R2 and that affects SQL Server workloads running on top of a failover cluster. The biggest challenge that I see in the field is IT professionals unaware of these different behaviors when running either Availability Groups or Failover Clustered Instances. In my recently completed virtual class on Windows Server Failover Clustering for SQL Server DBAs, I spent a lot of time explaining these differences because I want administrators to understand what is happening under the covers and how those different behaviors affect their database availability.
As far as failover clustered instances are concerned, there really isn’t much of a difference between SQL Server 2012 and SQL Server 2014 other than support for cluster shared volumes. The big advantages I see comes from combining the different features with failover clustered instances. For example, for databases experiencing latch contention, the In-Memory OLTP feature in SQL Server 2014 running on a failover clustered instance provides both high availability and performance optimization. The Resource Governor for I/O workloads makes it easier to predict performance of a failover clustered instance should a failover happens on a multi-instance cluster having all of the SQL Server instances running on a single node. These are just some of the new features in SQL Server 2014 that, when running on a failover clustered instance, make compelling reasons to upgrade.
Q: Are there any plan by MS to make (AlwaysOn) Availability Groups available to Standard Edition?
A: I answered this question during the webinar. The SQL Server MVP community has been asking ever since this feature was introduced. As a former Windows MVP who has worked with small and medium sized businesses, I understand that the main challenge with adoption to the latest version of SQL Server is licensing cost. I would recommend making the same request to Microsoft and ask that Availability Group be made available on Standard Edition even on a limited number of replicas, say two.
Q: Which is better: SQL Server running as a failover clustered instance or SQL Server running on a virtual machine where the virtual hosts already provide high availability?
A: That really depends on the business requirement. If the database is deemed mission-critical, it makes sense to run it as a failover clustered instance – be it as a physical cluster or a guest cluster inside a virtualization platform. The physical host provides high availability for the virtual machine but it does not provide high availability for the databases. You still need to install service packs, cumulative updates, security patches, etc. which may or may not require a server reboot.
Q: How will the configuration look like in case of SQL Server failover clustered instances running in VMs – iSCSI to both machines or shared VHD will be preferred?
A: This also depends on the underlying virtualization platform. Microsoft Hyper-V and VMWare offer different options for providing the virtual hard disks to the VMs. You need to balance the cost of implementation with the complexity of the solution and the performance requirement. SQL Server requires a lot of I/O and memory resources. If you use an iSCSI target like StarWind Virtual SAN, you can combine it with the different storage options like SSDs or tiered storage to provide I/O performance for your SQL Server databases. It’s the same thing when using shared VHDX. But if you need to take snapshots of the virtual machines, a shared VHDX prevents you from doing that while iSCSI does not. I personally use shared VHDX for testing my failover clusters just because of the simplicity. I can automate creation and provisioning of Hyper-V virtual machines using shared VHDX with Windows PowerShell.
Q: On virtual machines, what would be the MAXDOP & Cost Parrellism Threshold should be if 2 CPUs?
A: I also answered this question on the webinar. The MAXDOP and cost threshold for parallelism values is not something that can be easily assigned without knowing the existing workload. The only time I make a solid recommendation for using a specific MAXDOP value for the SQL Server instance is when you are running it for SharePoint databases (I’ve written a PowerShell script that checks this configuration for SharePoint databases because it is documented and supported.)
But for any other workloads, analyze what your performance bottleneck is. Instead of just making CPU-settings at the instance- or query-level, understand what the virtualization platform is doing to your SQL Server CPU. You don’t want to be making configuration changes on your SQL Server instance just because somebody on the Internet said so. And parallelism in itself is not a problem. Review the top wait types occuring on the system and see if you are experiencing performance issues caused by parallelism. The CXPACKET wait type is typically associated with parallelism but not necessarily a problem. Maybe it could be a symptom of something else like queries that do a lot of table scans, inefficient indexes, outdated statistics, etc. There is no one-size-fits-all solution for MAXDOP and cost threshold for parallelism. Know your workload, analyze your top waits to identify if parallelism is indeed your main issue.
Q: Are the eight replicas in SQL Server 2014 Standard or Enterprise Edition?
A: I answered this question during the webinar but I misunderstood it completely. I thought the question was about the number of replicas between SQL Server 2012 and SQL Server 2014 and whether you can run it in synchronous or asynchronous replication. Since Availability Groups is an Enterprise Edition feature, Standard Edition is out of the question (you can still request the feature to be included in Standard Edition for future versions.) In SQL Server 2012, you can have four (4) secondary replicas and one (1) primary replica for a total of five (5) replicas. In SQL Server 2014, you can have up to eight (8) secondary replicas and one (1) primary replica for a total of nine (9) replicas.
Q: What is the minimum operating system requirement for deploying SQL Server Availability Groups?
A: SQL Server Availability Groups are supported and works on Windows Server 2008 and higher. At this point in time, I would strongly recommend running it on Windows Server 2012 R2 if it’s a new deployment.
Q: What if the local storage is a NAS at two sites?
A: (Max Kolomyeytsev) StarWind is not (yet) able to handle a NAS as storage but we are working in that direction. In a few months, you will be able to use a NAS as your storage to virtualize with StarWind. So you can effectively replicate from one NAS to the other.
Q: What are the requirements for round-trip delay between two StarWind HA servers?
A: (Max Kolomyeytsev) The basic requirement is to have up to 3 ms between the StarWind HA servers. If they are located in different sites or locations, StarWind supports up to 5 ms. And you can always find the latest information about the system requirements on the StarWind Software website.
These are all the questions submitted during and after the webinar. If you have additional questions related to the webinar, feel free to post them in the comments section and I will update the blog post to include them in. Again, thanks for the wonderful opportunity to be of service to the community.
Please note: I reserve the right to delete comments that are offensive or off-topic.