Wednesday, June 3, 2026

Set up and design a database architecture to meet 99.9%, 99.99%, and 99.999% SLAs

Each additional "nine" in a database Service Level Agreement (SLA) cuts the allowed downtime by a factor of ten, and meeting it requires correspondingly more redundancy and infrastructure complexity.

The three-nine (99.9%), four-nine (99.99%), and five-nine (99.999%) tiers correspond to maximum annual downtimes of 8.76 hours, 52.6 minutes, and 5.26 minutes, respectively. 
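These downtime figures follow directly from the SLA percentage: allowed downtime = (1 - SLA) × hours in a year. Here is a minimal Python sketch of that arithmetic. It assumes a 365-day (8,760-hour) year and an annual measurement window, and the function name is illustrative:

```python
# Downtime budget for an availability SLA.
# Assumption: a 365-day year (8,760 hours); leap years are ignored.
HOURS_PER_YEAR = 365 * 24


def downtime_budget_minutes(sla_percent: float) -> float:
    """Return the maximum downtime per year, in minutes, allowed by an SLA percentage."""
    unavailable_fraction = 1 - sla_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR * 60


for sla in (99.9, 99.99, 99.999):
    minutes = downtime_budget_minutes(sla)
    print(f"{sla}% -> {minutes:.2f} min/year ({minutes / 60:.2f} h/year)")
```

At 99.999% the budget is only about 5.26 minutes, so a single failover that takes a few minutes can consume most of the year's allowance.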

1. Three-Nine (99.9%) SLA Architecture
  • Max Downtime: 8.76 hours/year
  • Architecture Style: Single-Region, Active-Passive 
  • Design:
    • Deploy a primary database instance in one Availability Zone (AZ) to serve read/write traffic.
    • Replicate asynchronously to a single read replica or failover node in a different AZ within the same region. Because replication is asynchronous, transactions committed on the primary but not yet replicated can be lost on failover (RPO > 0).
    • Use automated health checks and DNS failover (e.g., Amazon Route 53) to re-point traffic to the standby when the primary goes offline.
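The health-check-and-failover step can be sketched as a small state machine. This is a hypothetical controller, not the Route 53 API. It assumes the primary is probed at a fixed interval and that traffic moves to the standby only after `failure_threshold` consecutive failed probes, so a single dropped probe does not trigger a failover. The endpoint names are made up for illustration. The probe interval multiplied by the threshold bounds how long detection alone takes, and that time counts against the downtime budget.

```python
class FailoverMonitor:
    """Hypothetical active-passive failover controller (illustrative, not a vendor API)."""

    def __init__(self, primary: str, standby: str, failure_threshold: int = 3) -> None:
        self.primary = primary
        self.standby = standby
        self.failure_threshold = failure_threshold  # consecutive failures before failover
        self.active = primary                       # endpoint that DNS currently points to
        self.consecutive_failures = 0

    def observe(self, primary_healthy: bool) -> str:
        """Record one health-check result and return the endpoint traffic should use."""
        if self.active == self.standby:
            # Already failed over; failing back is treated as a separate, deliberate step.
            return self.active
        if primary_healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                # In a real deployment, this is where the DNS record would be re-pointed
                # and the standby promoted to primary.
                self.active = self.standby
        return self.active


def worst_case_detection_seconds(probe_interval_s: int, failure_threshold: int) -> int:
    """Time from primary failure to the failover decision, ignoring DNS TTL and promotion."""
    return probe_interval_s * failure_threshold


monitor = FailoverMonitor("db-primary.az-a", "db-standby.az-b")
probes = [True, False, False, True, False, False, False]
routes = [monitor.observe(healthy) for healthy in probes]
print(routes[-1])
print(worst_case_detection_seconds(30, 3))
```

The two failed probes in the middle of the list are reset by the next healthy probe, so only the final run of three failures causes the switch. With an assumed 30-second probe interval and a threshold of 3, detection alone can take about 90 seconds. DNS caching (TTL) and replica promotion add further delay on top of that.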
2. Four-Nine (99.99%) SLA Architecture
  • Max Downtime: 52.6 minutes/year 
  • Architecture Style: Single-Region, Multi-AZ Active-Active 
  • Design:
    • Use a cluster-based database (such as Amazon Aurora, Google Cloud Spanner, or a Galera cluster for MySQL) that operates across three separate Availability Zones.
    • Require a write quorum (e.g., acknowledgment from at least 2 of 3 nodes before a commit is confirmed) so that committed data survives the loss of any single node or AZ (RPO = 0).
    • Add connection pooling, automated fast failover, and read scaling through additional replicas.
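The quorum rule can be illustrated with a toy in-memory model. This is a simplified sketch, not how Aurora, Spanner, or Galera actually implement replication. Each `Replica` stands for a node in one AZ, and a write counts as committed only when a majority of replicas acknowledge it. The names `Replica`, `majority`, and `quorum_write` are hypothetical.

```python
from typing import Dict, List


class Replica:
    """Toy stand-in for a database node in one Availability Zone."""

    def __init__(self, az: str) -> None:
        self.az = az
        self.up = True
        self.data: Dict[str, str] = {}

    def apply(self, key: str, value: str) -> bool:
        """Store the write and acknowledge it, or return False if the node is down."""
        if not self.up:
            return False
        self.data[key] = value
        return True


def majority(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1


def quorum_write(replicas: List[Replica], key: str, value: str) -> bool:
    """Return True only if a majority of replicas acknowledged the write.

    Simplification: nodes that acknowledged a failed write keep the value here;
    a real consensus protocol would leave it uncommitted and reconcile it later.
    """
    acks = sum(replica.apply(key, value) for replica in replicas)
    return acks >= majority(len(replicas))


cluster = [Replica(az) for az in ("az-a", "az-b", "az-c")]
print(quorum_write(cluster, "order:42", "paid"))  # True  (3 of 3 acks)
cluster[0].up = False
print(quorum_write(cluster, "order:43", "paid"))  # True  (2 of 3 acks, one AZ lost)
cluster[1].up = False
print(quorum_write(cluster, "order:44", "paid"))  # False (1 of 3 acks)
```

The last call shows the trade-off. With only one AZ left, the cluster refuses writes rather than confirm data that could be lost, favoring RPO = 0 over write availability.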
3. Five-Nine (99.999%) SLA Architecture
  • Max Downtime: 5.26 minutes/year 
  • Architecture Style: Multi-Region, Active-Active / Global Database 
  • Design:
    • Deploy fully distributed clusters spanning multiple geographic regions to protect against a large-scale regional outage.
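    • Replicate across regions either synchronously through a distributed consensus protocol (Raft/Paxos), which adds at least one inter-region round trip to every write, or asynchronously with low latency and automatic conflict resolution, for example using Conflict-Free Replicated Data Types (CRDTs).
    • Front the clusters with a global traffic manager (e.g., latency-based DNS routing) that sends each user to the nearest healthy region and shifts traffic away from a failed region automatically.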
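To make the CRDT option concrete, here is a minimal grow-only counter (G-Counter), one of the simplest CRDTs. Each region increments only its own slot, and merging takes the per-slot maximum. As a result, replicas converge to the same value regardless of the order in which they exchange state. The class and region names are illustrative, and the sketch omits the replication transport and the richer CRDT types that production systems use.

```python
from typing import Dict


class GCounter:
    """Grow-only counter CRDT: one slot per region, merged by taking per-slot maxima."""

    def __init__(self, region: str) -> None:
        self.region = region
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        """Increment this region's own slot; a G-Counter cannot decrement."""
        if amount < 0:
            raise ValueError("G-Counter only supports non-negative increments")
        self.counts[self.region] = self.counts.get(self.region, 0) + amount

    def merge(self, other: "GCounter") -> None:
        """Fold in another replica's state; commutative, associative, and idempotent."""
        for region, count in other.counts.items():
            self.counts[region] = max(self.counts.get(region, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


# Two regions accept writes independently (e.g., during a network partition)...
us = GCounter("us-east")
eu = GCounter("eu-west")
us.increment(3)
eu.increment(2)

# ...then exchange state in either order and converge.
us.merge(eu)
eu.merge(us)
us.merge(eu)  # re-applying the same state changes nothing
print(us.value(), eu.value())  # 5 5
```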

Summary of Requirements:

SLA Target | Allowed Downtime | Infrastructure Requirement | Primary Cost Driver
99.9% | 8.76 hours/year | Single primary + 1 standby in a second AZ | Base infrastructure and compute
99.99% | 52.6 minutes/year | Cluster of 3+ nodes across 3 AZs (active) | High-availability licensing and compute
99.999% | 5.26 minutes/year | Geo-replicated clusters in multiple regions | Cross-region data transfer and multi-region clusters
