
    High Availability and Disaster Recovery Using Distributed Availability Groups (DAGs) in MS SQL Server Always On

    Publish Date: October 17, 2025

    Introduction: When Uptime Becomes a National Priority

    Mission-critical applications, such as government portals, healthcare systems, and banking platforms, must operate continuously with data integrity and minimal downtime. Microsoft SQL Server Always On Availability Groups (AGs), combined with Distributed Availability Groups (DAGs), provide an enterprise-grade architecture for High Availability (HA) and Disaster Recovery (DR) across geographically distributed data centers.

    The Need for HADR in Mission-Critical Systems

    Modern e-Governance initiatives demand 24×7 operational continuity and compliance with SLA-based uptime guarantees. Distributed Availability Groups help keep essential digital services available through hardware, software, or regional failures.

    Continuous Availability

    • Local failures are handled without manual intervention.
    • Automatic failover within the local AG keeps interruptions to citizen services brief.

    Geographically Distributed DR

    • Replicates databases across multiple data centers (e.g., Delhi NDCSP and NIU Hyderabad).
    • Ensures business continuity and compliance with DR mandates.

    Minimized Data Loss

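    • Synchronous replication → zero data loss within the primary site when failing over to a synchronized replica.
    • Asynchronous replication → DR copy at the remote site with minimal impact on primary performance, at the risk of losing the most recent transactions if the primary site fails.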

    Load Balancing & Reporting

    • Secondary replicas handle read-only workloads, backups, and analytics, improving performance and scalability.

    SLA, RTO, and RPO Considerations

    Metric | Description | Typical Implementation Using DAGs
    SLA (Service Level Agreement) | Commitment to system uptime | 99.95%–99.99% uptime for mission-critical systems
    RTO (Recovery Time Objective) | Maximum tolerable downtime before restoration | Minutes; automatic failover for local AGs and manual failover for remote DR AGs
    RPO (Recovery Point Objective) | Maximum tolerable data loss | Zero within the primary site (synchronous replication); seconds to minutes for the remote DR site (asynchronous replication)
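    To make the SLA row concrete, the short Python sketch below converts an uptime percentage into an allowed-downtime budget. It assumes a 365-day year and a 30-day month; the function and constant names are illustrative, and real SLA contracts may define the measurement window and maintenance exclusions differently.

```python
# Convert an uptime SLA percentage into an allowed-downtime budget.
# Assumption: a 365-day year and a 30-day month; real SLAs may define the
# measurement window and excluded maintenance differently.
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes
MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200 minutes


def downtime_budget_minutes(sla_percent: float, period_minutes: int) -> float:
    """Return the maximum downtime (minutes) an uptime SLA allows in a period."""
    if not 0 < sla_percent <= 100:
        raise ValueError("sla_percent must be in (0, 100]")
    return period_minutes * (100 - sla_percent) / 100


if __name__ == "__main__":
    for sla in (99.95, 99.99):
        per_year = downtime_budget_minutes(sla, MINUTES_PER_YEAR)
        per_month = downtime_budget_minutes(sla, MINUTES_PER_MONTH)
        print(f"{sla}% uptime: {per_year:.1f} min/year, {per_month:.1f} min/month")
```

    For example, 99.95% allows about 262.8 minutes (roughly 4.4 hours) of downtime per year, while 99.99% allows only about 52.6 minutes. A single manual cross-site failover that takes tens of minutes could therefore use up much of a 99.99% budget, which is why local automatic failover has to carry most of the availability load.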

    Synchronous replication → Zero loss within primary site

    Asynchronous replication → Minimal lag for remote DR site
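    The RPO and RTO of a remote replica can be estimated from its replication queues. The Python sketch below applies the simplified formulas from Microsoft's Always On performance-monitoring guidance: estimated data loss ≈ log send queue ÷ log generation rate, and estimated failover time ≈ detection time + failover overhead + redo queue ÷ redo rate. The ReplicaSample type is hypothetical. Its fields are assumed to be filled from sys.dm_hadr_database_replica_states (log_send_queue_size, redo_queue_size, redo_rate) and from the primary's Log Bytes Flushed/sec counter, and the detection and overhead defaults are placeholders, not measured values.

```python
from dataclasses import dataclass


@dataclass
class ReplicaSample:
    """Hypothetical snapshot of one remote replica (sizes in KB, rates in KB/s).

    Assumed sources: log_send_queue_size, redo_queue_size and redo_rate from
    sys.dm_hadr_database_replica_states; log generation from the primary's
    "Log Bytes Flushed/sec" counter converted to KB/s.
    """
    log_send_queue_kb: float
    log_generation_kb_per_s: float
    redo_queue_kb: float
    redo_rate_kb_per_s: float


def _queue_seconds(queue_kb: float, rate_kb_per_s: float) -> float:
    """Express a queue as seconds at a given rate (inf if the rate is zero)."""
    if queue_kb <= 0:
        return 0.0
    if rate_kb_per_s <= 0:
        return float("inf")
    return queue_kb / rate_kb_per_s


def estimated_data_loss_s(s: ReplicaSample) -> float:
    """RPO exposure: unsent log divided by the rate at which log is generated."""
    return _queue_seconds(s.log_send_queue_kb, s.log_generation_kb_per_s)


def estimated_failover_s(s: ReplicaSample, detection_s: float = 10.0,
                         overhead_s: float = 30.0) -> float:
    """RTO estimate: detection + failover overhead + time to redo queued log.

    detection_s and overhead_s are placeholder assumptions; for a manual
    cross-site DAG failover, detection_s should include operator response time.
    """
    redo_s = _queue_seconds(s.redo_queue_kb, s.redo_rate_kb_per_s)
    return detection_s + overhead_s + redo_s


if __name__ == "__main__":
    sample = ReplicaSample(log_send_queue_kb=2048, log_generation_kb_per_s=512,
                           redo_queue_kb=10240, redo_rate_kb_per_s=1024)
    print(estimated_data_loss_s(sample))  # 4.0
    print(estimated_failover_s(sample))   # 50.0
```

    Sampling these values periodically and comparing them with the RPO and RTO targets gives an early warning when the asynchronous link to the DR site starts to fall behind.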

    Why Distributed Availability Groups (DAGs)?

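    Distributed Availability Groups extend the Always On architecture across two independent Availability Groups, each typically on its own Windows Server Failover Cluster, which can sit in different regions. This adds flexibility and resilience beyond what a single AG provides.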

    Key Benefits:

    • Cross-AG Replication: Connects independent AGs across regions.
    • Disaster Recovery Across Sites: Enables asynchronous replication without affecting HA performance.
    • Controlled Failover: Automatic failover within the primary site's AG; failover between the two AGs of the DAG is always manual.
    • Scalability: Each DAG links exactly two AGs; further sites can be added by joining an AG to additional DAGs or by chaining DAGs.
    • Operational Flexibility: Simplifies upgrades, patching, and multi-site DR testing.

    Fig. 1: Distributed Availability Group (DAG) linking two independent Availability Groups across sites

    Architecture Overview

    Primary Site (Delhi)

    • Primary AG with synchronous replicas for local HA.
    • Automatic failover keeps service interruptions short.

    Secondary Site (Hyderabad)

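    • Secondary AG receives data asynchronously from Delhi through the DAG; its primary replica (the forwarder) relays the log to the other replicas at the Hyderabad site.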
    • Manual failover during large-scale outages.
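    Because failover between the two AGs of a DAG is always manual, it helps to script the sequence in advance. The Python sketch below assembles the T-SQL for a planned failover following the general sequence in Microsoft's distributed availability group documentation: switch both AGs to synchronous commit, confirm every database is synchronized, demote the global primary, confirm the hardened LSNs match, then fail over on the forwarder. The final return to asynchronous commit is an optional step added here. The DAG and AG names are hypothetical, and the function only builds statements; it does not execute them.

```python
from __future__ import annotations

# Checks synchronization state and hardened LSN for each database in the DAG.
SYNC_CHECK_SQL = (
    "SELECT ag.name, drs.database_id, drs.synchronization_state_desc,\n"
    "       drs.last_hardened_lsn\n"
    "FROM sys.dm_hadr_database_replica_states AS drs\n"
    "JOIN sys.availability_groups AS ag ON drs.group_id = ag.group_id\n"
    "WHERE ag.is_distributed = 1;"
)


def _set_mode(dag: str, ag1: str, ag2: str, mode: str) -> str:
    """Change the availability mode of both member AGs of the DAG."""
    return (
        f"ALTER AVAILABILITY GROUP [{dag}] MODIFY AVAILABILITY GROUP ON\n"
        f"  '{ag1}' WITH (AVAILABILITY_MODE = {mode}),\n"
        f"  '{ag2}' WITH (AVAILABILITY_MODE = {mode});"
    )


def planned_dag_failover_steps(dag: str, global_primary_ag: str,
                               forwarder_ag: str) -> list[tuple[str, str]]:
    """Return (where to run, T-SQL) pairs for a planned DAG failover."""
    both = "global primary and forwarder"
    return [
        # 1. After application writes are stopped, make the cross-site link synchronous.
        (both, _set_mode(dag, global_primary_ag, forwarder_ag, "SYNCHRONOUS_COMMIT")),
        # 2. Repeat until every database reports SYNCHRONIZED.
        (both, SYNC_CHECK_SQL),
        # 3. Demote the current global primary.
        ("global primary", f"ALTER AVAILABILITY GROUP [{dag}] SET (ROLE = SECONDARY);"),
        # 4. Proceed only if last_hardened_lsn matches on both sides.
        (both, SYNC_CHECK_SQL),
        # 5. Promote the forwarder's AG to global primary.
        ("forwarder", f"ALTER AVAILABILITY GROUP [{dag}] FORCE_FAILOVER_ALLOW_DATA_LOSS;"),
        # 6. Optionally return the cross-site link to asynchronous commit.
        (both, _set_mode(dag, global_primary_ag, forwarder_ag, "ASYNCHRONOUS_COMMIT")),
    ]


if __name__ == "__main__":
    # Hypothetical DAG and AG names.
    for where, sql in planned_dag_failover_steps("dag_sbm", "AG_Delhi", "AG_Hyderabad"):
        print(f"-- run on: {where}\n{sql}\n")
```

    In an unplanned outage where the Delhi site is lost, only the FORCE_FAILOVER_ALLOW_DATA_LOSS step on the forwarder applies, and any transactions not yet replicated to Hyderabad are lost.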

    Distributed Availability Group (DAG)

    • Links both AGs into a unified architecture.
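    • Provides zero RPO for local failover; for cross-site failover, RPO depends on asynchronous replication lag (seconds to minutes), and RTO is in minutes because the failover is manual.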
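    The DAG link itself is created with T-SQL on each side. The Python sketch below builds the two statements in the form shown in Microsoft's documentation: CREATE AVAILABILITY GROUP ... WITH (DISTRIBUTED) on the primary replica of the Delhi AG (the global primary), and ALTER AVAILABILITY GROUP ... JOIN on the primary replica of the Hyderabad AG (the forwarder). The DAG name, AG names, and listener host names are hypothetical. LISTENER_URL points at each AG listener's host name with the database mirroring endpoint port (5022 here), not the listener's SQL client port. Asynchronous commit with manual failover matches the cross-site design described in this section.

```python
from __future__ import annotations


def _ag_clause(ag_name: str, listener_url: str) -> str:
    """One member AG of the DAG: asynchronous, manual failover, automatic seeding."""
    return (
        f"  '{ag_name}' WITH (\n"
        f"    LISTENER_URL = '{listener_url}',\n"
        "    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,\n"
        "    FAILOVER_MODE = MANUAL,\n"
        "    SEEDING_MODE = AUTOMATIC\n"
        "  )"
    )


def dag_setup_sql(dag: str, primary_ag: str, primary_url: str,
                  dr_ag: str, dr_url: str) -> tuple[str, str]:
    """Return (statement for the global primary, statement for the forwarder)."""
    members = ",\n".join([_ag_clause(primary_ag, primary_url),
                          _ag_clause(dr_ag, dr_url)])
    create = (f"CREATE AVAILABILITY GROUP [{dag}]\n"
              "  WITH (DISTRIBUTED)\n"
              f"  AVAILABILITY GROUP ON\n{members};")
    join = (f"ALTER AVAILABILITY GROUP [{dag}]\n"
            f"  JOIN AVAILABILITY GROUP ON\n{members};")
    return create, join


if __name__ == "__main__":
    # Hypothetical names and listener host names.
    create_sql, join_sql = dag_setup_sql(
        "dag_sbm",
        "AG_Delhi", "tcp://ag-delhi-lsnr.example.local:5022",
        "AG_Hyderabad", "tcp://ag-hyd-lsnr.example.local:5022",
    )
    print("-- Run on the Delhi AG primary:\n" + create_sql)
    print("-- Run on the Hyderabad AG primary (forwarder):\n" + join_sql)
```

    Both statements list the same two member AGs with the same settings, so generating them from one function keeps the two sides consistent.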

    Example: SBM(G) Portal — DAG implemented between NDCSP Delhi and NIU Hyderabad, ensuring business continuity for critical sanitation mission data.

    Implementation Prerequisites

    Infrastructure Requirements

    • SQL Server: Enterprise Edition 2016 or later
    • OS: Windows Server 2016 or later
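    • Network: Low latency (<5 ms round trip for synchronous-commit replicas) and bandwidth sufficient for peak log generation
    • Domain Trust: Same or trusted domains, or certificate-based endpoint authentication where no trust exists
    • Firewall Ports: TCP 5022 (default database mirroring endpoint port) open for AG communication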
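    Before placing replicas in synchronous commit, it is worth measuring the round-trip time between them. The Python sketch below times several TCP handshakes to a node's endpoint port and compares the median with the 5 ms guideline. Handshake time is only a rough proxy for network round-trip time, not for commit latency under real load, and the function names are illustrative.

```python
import socket
import statistics
import time


def tcp_handshake_ms(host: str, port: int = 5022, samples: int = 5,
                     timeout_s: float = 2.0) -> float:
    """Median time in milliseconds to open a TCP connection to host:port.

    A rough proxy for network round-trip time only; it does not measure
    commit latency. Pass an IP address to keep DNS lookup time out of the result.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout_s):
            pass  # connection established; close it immediately
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def fits_sync_guideline(rtt_ms: float, limit_ms: float = 5.0) -> bool:
    """True if the measured round trip is under the <5 ms guideline."""
    return rtt_ms < limit_ms
```

    For example, calling tcp_handshake_ms with the IP address of a partner node (a hypothetical target; the endpoint port must be open through the firewall) and passing the result to fits_sync_guideline indicates whether that link is a reasonable candidate for synchronous commit. Cross-site links in this design stay asynchronous regardless.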

    SQL Configuration

    • Enable Always On in SQL Server Configuration Manager
    • Create and validate individual AGs before forming the DAG
    • Configure endpoints and database health checks
    • Use unique AG Listener names
    • Set backup preferences for read replicas

    Security & Authentication

    • Use certificates if cross-domain trust is absent
    • Ensure service accounts have appropriate replication permissions
    • Validate CONNECT permissions to endpoints
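    Where no domain trust exists, each instance authenticates its endpoint with a certificate and grants CONNECT to a login mapped to the partner's certificate. The Python sketch below builds that endpoint definition and grant. It assumes the local certificate already exists in master and the partner login was created from the partner instance's certificate; the certificate and login names are hypothetical, and Hadr_endpoint is assumed as the endpoint name.

```python
def hadr_endpoint_sql(cert_name: str, partner_login: str,
                      endpoint: str = "Hadr_endpoint", port: int = 5022) -> str:
    """Build T-SQL for a certificate-authenticated AG endpoint plus CONNECT grant.

    Assumes the local certificate (cert_name) already exists in master and that
    partner_login was created from the partner instance's certificate.
    """
    return (
        f"CREATE ENDPOINT [{endpoint}]\n"
        "  STATE = STARTED\n"
        f"  AS TCP (LISTENER_PORT = {port}, LISTENER_IP = ALL)\n"
        "  FOR DATABASE_MIRRORING (\n"
        "    ROLE = ALL,\n"
        f"    AUTHENTICATION = CERTIFICATE {cert_name},\n"
        "    ENCRYPTION = REQUIRED ALGORITHM AES\n"
        "  );\n"
        f"GRANT CONNECT ON ENDPOINT::[{endpoint}] TO [{partner_login}];"
    )


if __name__ == "__main__":
    # Hypothetical certificate and login names.
    print(hadr_endpoint_sql("delhi_node1_cert", "hyd_node1_login"))
```

    The same pattern is repeated on every replica, each time with its own certificate and a grant for every partner login that must connect to it.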

    Storage & Data

    • Databases in the full recovery model, each with at least one full backup
    • Identical schema and collation across replicas
    • Sufficient disk space for log growth
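    The recovery-model requirement is easy to check before building the AGs. The Python sketch below takes (name, recovery_model_desc) rows, for example from SELECT name, recovery_model_desc FROM sys.databases WHERE database_id > 4, and emits an ALTER DATABASE statement for each database not in FULL recovery. The sample database names are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def recovery_model_fixes(rows: Iterable[tuple[str, str]]) -> list[str]:
    """Return ALTER DATABASE statements for databases not in FULL recovery.

    rows: (name, recovery_model_desc) pairs, e.g. from
    SELECT name, recovery_model_desc FROM sys.databases WHERE database_id > 4;
    (database_id > 4 skips master, tempdb, model and msdb).
    """
    return [
        f"ALTER DATABASE [{name}] SET RECOVERY FULL;"
        for name, model in rows
        if model.upper() != "FULL"
    ]


if __name__ == "__main__":
    # Hypothetical database names.
    sample = [("AppDB", "FULL"), ("StagingDB", "SIMPLE"), ("ArchiveDB", "BULK_LOGGED")]
    for stmt in recovery_model_fixes(sample):
        print(stmt)
```

    After switching a database to FULL, take a full backup before adding it to an AG; until that backup exists, the log chain has not started.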

    Network & DNS

    • Static IPs and resolvable DNS names across sites
    • Latency monitoring enabled for replication lag

    Testing & Validation

    • Perform local failover validation
    • Verify endpoint connectivity
    • Test backup/restore consistency

    Firewall and Domain Replication Ports

    For Active Directory and AG communication, ensure the following ports are open:

    • TCP/UDP 389 (LDAP), 88 (Kerberos), 53 (DNS); TCP 636 (LDAPS)
    • TCP 5022 (SQL AG Endpoint), 135 (RPC), 445 (SMB)
    • Dynamic ports 49152–65535 for RPC/DCOM communications
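    The TCP side of this port list can be spot-checked from each SQL Server node. The Python sketch below attempts one TCP connection per port. Run it against a domain controller for the directory ports and against the partner SQL nodes for 5022, 135, and 445. A False result means the connection failed for any reason (blocked, refused, or no service listening). It narrows the problem down but does not prove a firewall issue. UDP traffic and the on-demand dynamic RPC range are not covered by this check.

```python
from __future__ import annotations

import socket
from typing import Iterable

# TCP ports from the list above. UDP (DNS, Kerberos, LDAP) is not tested here,
# and the dynamic RPC range 49152-65535 is allocated on demand, so probing it
# by connect is not meaningful.
TCP_PORTS = {53: "DNS", 88: "Kerberos", 135: "RPC", 389: "LDAP",
             445: "SMB", 636: "LDAPS", 5022: "AG endpoint"}


def check_tcp_ports(host: str, ports: Iterable[int] = TCP_PORTS,
                    timeout_s: float = 2.0) -> dict[int, bool]:
    """Return {port: True/False} for whether a TCP connection to host succeeds."""
    results: dict[int, bool] = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout_s):
                results[port] = True
        except OSError:  # refused, unreachable, timed out, or name not resolved
            results[port] = False
    return results
```

    For example, check_tcp_ports("dc01.example.local", [53, 88, 389, 636]) against a hypothetical domain controller returns one True/False entry per port, which can be logged as part of each DR drill.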

    Key Advantages for Government Systems

    Advantage | Description
    High Availability | Continuous operations with local automatic failover
    Disaster Recovery | Remote replicas in the DAG ensure business continuity
    Data Protection | Zero data loss locally with synchronous replication; loss at the DR site limited to asynchronous replication lag
    Load Distribution | Reporting and backups handled by secondary replicas
    Scalability | Multi-region DAGs expand as infrastructure grows
    Compliance | Meets government SLA, RPO, and RTO standards

    Key Learnings and Way Forward

    • DAGs provide a unified, fault-tolerant backbone for national digital systems
    • NIC’s implementation demonstrates enterprise-class resilience for public services.
    • Future expansion may include hybrid cloud DAGs and AI-based monitoring for predictive failure detection.
    • Continuous DR drills and network resilience testing are essential for operational assurance.

    Conclusion

    The deployment of Distributed Availability Groups (DAGs) under SQL Server Always On marks a significant step in strengthening India’s Digital Public Infrastructure (DPI). It enables highly available, fault-tolerant data systems across multiple NIC data centers, helping critical citizen-facing platforms like SBM(G) remain secure, scalable, and continuously available.