High Availability and Disaster Recovery Using Distributed Availability Groups (DAGs) in MS SQL Server Always On
Introduction: When Uptime Becomes a National Priority
Mission-critical applications, such as government portals, healthcare systems, and banking platforms, must operate continuously while preserving data integrity. Microsoft SQL Server’s Always On Availability Groups (AGs), combined with Distributed Availability Groups (DAGs), provide an enterprise-grade architecture for High Availability (HA) and Disaster Recovery (DR) across geographically distributed data centers.
The Need for HADR in Mission-Critical Systems
Modern e-Governance initiatives demand 24×7 operational continuity and compliance with SLA-based uptime guarantees. Distributed Availability Groups ensure that essential digital services remain resilient even during hardware, software, or regional failures.
Continuous Availability
- Systems remain operational without manual intervention.
- Automatic failover ensures uninterrupted citizen service delivery.
Geographically Distributed DR
- Replicates databases across multiple data centers (e.g., Delhi NDCSP and NIU Hyderabad).
- Ensures business continuity and compliance with DR mandates.
Minimized Data Loss
- Synchronous replication → zero or near-zero data loss within the primary site.
- Asynchronous replication → safe DR replication with minimal performance impact.
Load Balancing & Reporting
- Secondary replicas handle read-only workloads, backups, and analytics, improving performance and scalability.
SLA, RTO, and RPO Considerations
| Metric | Description | Typical Implementation Using DAGs |
|---|---|---|
| SLA (Service Level Agreement) | Commitment to system uptime | 99.95%–99.99% uptime for mission-critical systems |
| RTO (Recovery Time Objective) | Max tolerable downtime before restoration | Minutes; automatic failover for local AGs and manual failover for remote DR AGs |
| RPO (Recovery Point Objective) | Max tolerable data loss | Zero or near-zero within the primary site (synchronous replication); seconds to minutes for the remote DR site (asynchronous replication) |
In short, synchronous replication gives zero data loss within the primary site, while asynchronous replication to the remote DR site trails the primary by a small lag.
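To make the SLA figures in the table concrete, the uptime percentages can be converted into a yearly downtime budget. The following is a minimal sketch of that arithmetic; the function name `downtime_budget_minutes` is illustrative, and a 365-day year is assumed for simplicity.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year


def downtime_budget_minutes(uptime_percent: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime (in minutes) that an uptime SLA permits over a period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_minutes * (100 - uptime_percent) / 100


# The two SLA levels quoted in the table above.
for sla in (99.95, 99.99):
    minutes = downtime_budget_minutes(sla)
    print(f"{sla}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

With these inputs the budget comes to roughly 263 minutes (about 4.4 hours) per year at 99.95% and roughly 53 minutes per year at 99.99%. Every planned or unplanned failover, including the manual DR failover, has to fit inside that allowance.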
Why Distributed Availability Groups (DAGs)?
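Distributed Availability Groups extend the Always On architecture across multiple clusters and regions by linking two independent AGs, each of which keeps its own replicas and listener, so that local HA and cross-site DR can be configured and operated separately.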
Key Benefits:
- Cross-AG Replication: Connects independent AGs across regions.
- Disaster Recovery Across Sites: Enables asynchronous replication without affecting HA performance.
- Seamless Failover: Automatic failover within the primary site; manual across remote clusters.
- Scalability: Supports expansion across multiple SQL instances and geographies.
- Operational Flexibility: Simplifies upgrades, patching, and multi-site DR testing.

Fig. 1: Distributed Availability Group (DAG) linking two independent Availability Groups across sites
Architecture Overview
Primary Site (Delhi)
- Primary AG with synchronous replicas for local HA.
- Automatic failover ensures uninterrupted service.
Secondary Site (Hyderabad)
- Secondary AG configured asynchronously for DR.
- Manual failover during large-scale outages.
Distributed Availability Group (DAG)
- Links both AGs into a unified architecture.
- Provides near-zero RPO and minute-level RTO.
Example: SBM(G) Portal — DAG implemented between NDCSP Delhi and NIU Hyderabad, ensuring business continuity for critical sanitation mission data.
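As an illustration of how the two AGs are linked, the sketch below renders the T-SQL `CREATE AVAILABILITY GROUP ... WITH (DISTRIBUTED)` statement that would be run on the primary replica of the primary AG. It is a sketch under stated assumptions, not the configuration used for the portal: the DAG name, AG names, and listener URLs are hypothetical placeholders, and the asynchronous, manual-failover, automatic-seeding settings simply mirror the DR design described above.

```python
import re

# Conservative patterns so that values can be embedded in the T-SQL text safely.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_URL = re.compile(r"tcp://[A-Za-z0-9.-]+:\d{1,5}")


def _member_clause(ag_name: str, listener_url: str) -> str:
    """Render the clause for one member AG of the distributed AG."""
    if not _NAME.fullmatch(ag_name):
        raise ValueError(f"unsupported AG name: {ag_name!r}")
    if not _URL.fullmatch(listener_url):
        raise ValueError(f"unsupported listener URL: {listener_url!r}")
    return (
        f"      '{ag_name}' WITH\n"
        "      (\n"
        f"         LISTENER_URL = '{listener_url}',\n"
        "         AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,\n"
        "         FAILOVER_MODE = MANUAL,\n"
        "         SEEDING_MODE = AUTOMATIC\n"
        "      )"
    )


def render_create_dag(dag_name: str,
                      primary_ag: str, primary_url: str,
                      secondary_ag: str, secondary_url: str) -> str:
    """Build the CREATE statement for a distributed AG spanning two existing AGs."""
    if not _NAME.fullmatch(dag_name):
        raise ValueError(f"unsupported DAG name: {dag_name!r}")
    members = ",\n".join([
        _member_clause(primary_ag, primary_url),
        _member_clause(secondary_ag, secondary_url),
    ])
    return (
        f"CREATE AVAILABILITY GROUP [{dag_name}]\n"
        "   WITH (DISTRIBUTED)\n"
        "   AVAILABILITY GROUP ON\n"
        f"{members};"
    )


# Hypothetical names and listener URLs; port 5022 is the AG endpoint port.
print(render_create_dag(
    "dag_portal",
    "ag_delhi", "tcp://ag-delhi-lsn.example.internal:5022",
    "ag_hyderabad", "tcp://ag-hyd-lsn.example.internal:5022",
))
```

The statement only creates the DAG on the primary side. The secondary AG still has to join it from its own primary replica with `ALTER AVAILABILITY GROUP ... JOIN AVAILABILITY GROUP ON`, using the same member definitions.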
Implementation Prerequisites
Infrastructure Requirements
- SQL Server: Enterprise Edition 2016 or later
- OS: Windows Server 2016 or later
- Network: Low latency (<5 ms for synchronous), high bandwidth
- Domain Trust: Same or trusted domains
- Firewall Ports: TCP 5022 open for AG communication
SQL Configuration
- Enable Always On in SQL Server Configuration Manager
- Create and validate individual AGs before forming the DAG
- Configure endpoints and database health checks
- Use unique AG Listener names
- Set backup preferences for read replicas
Security & Authentication
- Use certificates if cross-domain trust is absent
- Ensure service accounts have appropriate replication permissions
- Validate CONNECT permissions to endpoints
Storage & Data
- Full recovery model databases
- Identical schema and collation across replicas
- Sufficient disk space for log growth
Network & DNS
- Static IPs and resolvable DNS names across sites
- Latency monitoring enabled for replication lag
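One simple way to act on replication-lag monitoring is to compare the most recent commit time seen on the primary with the one hardened on the DR secondary and check the difference against the RPO target. The sketch below assumes the two timestamps have already been collected (for example from the `last_commit_time` column of `sys.dm_hadr_database_replica_states` on each side); the function names and the 60-second target are illustrative assumptions, and the result is only an approximation of potential data loss.

```python
from datetime import datetime, timedelta


def replication_lag(primary_last_commit: datetime,
                    secondary_last_commit: datetime) -> timedelta:
    """Estimate how far the secondary trails the primary, never below zero."""
    lag = primary_last_commit - secondary_last_commit
    return max(lag, timedelta(0))


def rpo_breached(lag: timedelta, rpo_target: timedelta) -> bool:
    """Return True when the estimated lag exceeds the RPO target."""
    return lag > rpo_target


# Made-up sample values: the DR secondary trails the primary by 45 seconds.
primary_commit = datetime(2024, 1, 1, 12, 0, 45)
secondary_commit = datetime(2024, 1, 1, 12, 0, 0)
lag = replication_lag(primary_commit, secondary_commit)
print(f"estimated lag: {lag.total_seconds():.0f}s, "
      f"RPO breached: {rpo_breached(lag, timedelta(seconds=60))}")
```

Both timestamps need to come from clocks that are synchronized across sites, otherwise the estimate is skewed by the clock difference.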
Testing & Validation
- Perform local failover validation
- Verify endpoint connectivity
- Test backup/restore consistency
Firewall and Domain Replication Ports
For Active Directory and AG communication, ensure the following ports are open:
- TCP/UDP 389 (LDAP), 636 (LDAPS), 88 (Kerberos), 53 (DNS)
- TCP 5022 (SQL AG Endpoint), 135 (RPC), 445 (SMB)
- Dynamic ports 49152–65535 for RPC/DCOM communications
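Before forming the DAG, it can help to confirm from each site that the TCP ports listed above are actually reachable through the firewalls. The following is a minimal sketch using only the Python standard library; the helper names are illustrative, and it tests TCP connectivity only, so the UDP side of LDAP, Kerberos, and DNS needs a separate check.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_ports(host: str, ports: list[int], timeout: float = 3.0) -> dict[int, bool]:
    """Check several TCP ports on one host and map each port to its result."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

For example, `check_ports("sql-dr.example.internal", [5022, 135, 445])` (a hypothetical host name) returns a dictionary such as `{5022: True, 135: True, 445: False}`, which makes a blocked port easy to spot before AG endpoint validation fails.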
Key Advantages for Government Systems
| Advantage | Description |
|---|---|
| High Availability | Continuous operations with local automatic failover |
| Disaster Recovery | Remote replicas in DAG ensure business continuity |
| Data Protection | Near-zero loss with synchronous and asynchronous replication |
| Load Distribution | Reporting and backups handled by secondary replicas |
| Scalability | Multi-region DAGs expand as infrastructure grows |
| Compliance | Meets government SLA, RPO, and RTO standards |
Key Learnings and Way Forward
- DAGs provide a unified, fault-tolerant backbone for national digital systems.
- NIC’s implementation demonstrates enterprise-class resilience for public services.
- Future expansion may include hybrid cloud DAGs and AI-based monitoring for predictive failure detection.
- Continuous DR drills and network resilience testing are essential for operational assurance.
Conclusion
The deployment of Distributed Availability Groups (DAGs) under SQL Server Always On marks a significant step in strengthening India’s Digital Public Infrastructure (DPI). It enables highly available, low-latency, and fault-tolerant data systems across multiple NIC data centers, ensuring that critical citizen-facing platforms like SBM(G) remain secure, scalable, and continuously available.