Introduction
In
our existing SQL Server Always On configuration, we had a two-node setup, with each node hosted in a separate data center, and a single witness located in the secondary (DR) data center. Due to our application's design requirements,
the availability group was configured for manual failover.
However,
this architecture introduced a critical limitation: when the DR DC went offline, we lost quorum, and the Always On configuration
became unusable. As a result, we were forced to disable and delete the
availability group, and then recreate it from scratch, including
re-adding all databases—an error-prone and time-consuming process.
To
overcome this limitation and ensure high availability across both data centers,
we implemented a more resilient architecture: two file-share
witnesses—one in each data center—combined with a DNS CNAME record to
abstract the witness name. This solution provides redundancy for the quorum
configuration and allows the availability group to survive the failure of
either data center without needing to rebuild the setup.
In these two blogs, I will walk you through how we built a more resilient Always On architecture. In Part 1 (SQL Server Always On Across Two Data centers with Dual Witnesses and DNS CNAME), I’ll explain the setup of dual file share witnesses across two data centers, along with the use of a DNS CNAME to manage quorum effectively. In Part 2 (SQL Server Always On Across Two Datacenters: Manual and Automatic Failover Testing), I’ll demonstrate how we tested failover in both manual and automatic modes—simulating complete failure of either the primary or secondary data center—and show how this architecture ensures availability without needing to rebuild the configuration.
1. Prerequisite
- I have an
existing Always On Availability Group named "testag" configured
across two nodes: SQLNODE1 and SQLNODE2.
- A DNS
Server is already available on the network.
1.1. I Created two virtual machines (VMs) with the following specifications:
- Name:
WitnessSite1, WitnessSite2
- Operating
System: Windows
- RAM: 4 GB
- CPU cores:
2
2. Creating witness1 and witness2
3. Create a shared file on each node in the cluster.
3.1. Example: creating a shared folder on witness1.
Go
to the sharing tab and click advanced sharing
Click on the 'Permissions' button.
Grant
Full Control permissions, then click apply and OK
If
you recheck the properties of this folder, you should see the following.
4. Create the CName in the DNS server
To
provide flexibility and fault tolerance for the file share witness in our
Always On configuration, we use a CNAME (Canonical Name) record in DNS. In
Windows environments, a CNAME allows you to create an alias that points to
another hostname, making it easier to switch the witness between data centers
without changing SQL Server configurations.
To create CName in the DNS Server (for my test it is Windows Server2016), Go to the DNS server and open the Server Manager >> Tools >> DNS
Create
DNS records for WitnessSite1 and WitnessSite2 in the DNS forward zone.
Create the CNAME record in the DNS forward lookup zone By right-clicking on an empty area and select 'New Alias (CNAME)'.
Double-click
on the DNS server name.
Select
the primary witness which here is WitnessSite1 and click apply
Now,
the alias has been created.
Below you can see my final DNS configuration.
To test if the alias is working correctly, go to SQLNODE1 or SQLNODE2 and nslookup the alias. Using the nslookup is a suitable method for testing DNS resolution, including alias resolution.
Another
test is to attempt to access the shared folder using its alias name on SQLNODE1
or SQLNODE2.
5. Change the File Share Witness for availability group to the witness (the CName that we created in step 4)
5.1. Remove the Existing File Share Witness
Now,
remove the current file share witness before adding a new one.
NOTE:
we can temporarily remove the old witness without affecting the availability
group's online operation.
5.1.1- Using Failover Cluster Manager on SQLNODE1 remove the old witness temporarily:
Open
Failover Cluster Manager >> Right-click on your cluster name.
>>select More Actions >> Click Configure Cluster Quorum Settings. Then
Select
the quorum witness
Select
Do not configure a quorum witness (temporarily remove the old witness).
Click
Next and Finish.
5.1.2- Checking Availability Group status after removing old
witness:
This is the status of the availability group on the SQLNODE1 (SQLNODE1 status after temporarily removing the old witness).
This
is the status of the availability group on the SQLNODE2.
This
is the Availability group status in the Failover cluster after temporarily
removing the old witness.
This
is the Availability group status in the SSMS after temporarily removing the old
witness.
Therefore,
we can temporarily remove the old witness without affecting the availability
group's online operation.
6. Create the New File Share Witness using the CName created in step 4.
6.1. Using Failover Cluster Manager on SQLNODE1 to configure
\\WitnessSite1\clusterfileshared as quorum witness for availability group:
Open
Failover Cluster Manager >> Right-click on your cluster name.
>>select More Actions >> Click Configure Cluster Quorum
Settings. Then
Select
the quorum witness
In the File Share Path, enter \\witness\clusterfileshared. This path utilizes the CNAME 'witness', which resolves to the IP address WitnessSite1.
Click Next and Finish.
Now, the witness in the Failover Cluster Manager for our cluster has been configured as
\\witness\clusterfileshared.
To verify the outcome of this change, check the contents of the
following folders on SQLNODE1 or SQLNODE2:
- \\WitnessSite1\clusterfileshared
- \\WitnessSite2\clusterfileshared
- \\witness\clusterfileshared
With
this setup—two file share witnesses distributed across both data centers and a
DNS CNAME for flexibility—we’ve created a more resilient SQL Server Always On
environment that can better withstand datacenter-level failures. This approach
eliminates the need to rebuild the availability group when one site goes down,
saving time and reducing risk during outages.
In Part 2 (SQL Server Always On Across Two Datacenters: Manual and Automatic Failover Testing) of this blog series, I’ll demonstrate how we tested this configuration by simulating real failover scenarios, including the loss of the primary and secondary datacenters. I’ll cover both manual and automatic failover modes, and show how the system responds under pressure to ensure continuity and high availability.
No comments:
Post a Comment