Tuesday, January 27, 2026

Single‑Region, Single‑AZ Resiliency: What It Really Means

Single‑Region, Single‑Availability Zone (AZ) deployments are the simplest cloud architecture and also the least fault‑tolerant. They are common in early‑stage environments, cost‑constrained setups, and legacy workloads that haven’t been modernized yet.


🔍 What Is an AZ?

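An Availability Zone is an isolated location within a cloud region, made up of one or more data centers with independent power, cooling, and networking. AWS and Azure call these Availability Zones; GCP calls them zones.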
In a Single‑AZ setup:

  • All compute, storage, networking, and database components reside within one AZ.
  • No failover capability exists outside that AZ.

🧩 What Does “Resiliency” Look Like in a Single‑Region, Single‑AZ Setup?

✔️ You can protect against:

  • Instance failures (VM crash)
  • Application failures
  • Software bugs
  • Local disk corruption
  • Process-level outages

These are typically mitigated through:

  • Auto-restart, auto-healing
  • Load balancing across multiple instances within the same AZ
  • Database failover within the AZ (e.g., primary ↔ standby in same data center)
  • Backup & restore strategies
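
To illustrate load balancing across multiple instances within the same AZ, here is a minimal Python sketch of a round‑robin pool that skips instances marked unhealthy. The `Instance` and `SingleAzPool` names and the instance names are hypothetical, and a real load balancer would set health through its own probes rather than a manually toggled flag.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    healthy: bool = True


class SingleAzPool:
    """Round-robin over the healthy instances of one AZ (illustrative only)."""

    def __init__(self, instances):
        self.instances = list(instances)
        self._next = 0  # index where the next search starts

    def pick(self):
        """Return the next healthy instance, or raise if none is left."""
        count = len(self.instances)
        for offset in range(count):
            index = (self._next + offset) % count
            candidate = self.instances[index]
            if candidate.healthy:
                self._next = (index + 1) % count
                return candidate
        # From the pool's point of view, this is what an AZ outage looks like.
        raise RuntimeError("no healthy instances left in this AZ")


if __name__ == "__main__":
    pool = SingleAzPool(
        [Instance("app-1"), Instance("app-2", healthy=False), Instance("app-3")]
    )
    print([pool.pick().name for _ in range(3)])
```

The pool can route around a crashed instance, but once every instance in the AZ is unhealthy there is nothing left to route to.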

❌ You cannot protect against:

  • AZ‑wide outage
  • AZ‑wide power loss
  • Network isolation of the AZ
  • Fire/flood/physical issues in the AZ
  • Region outage

If the AZ goes down, the entire workload goes down.


🏗️ Typical Resiliency Best Practices in Single‑AZ

1. Redundancy Within the AZ

  • Multiple compute nodes in a single AZ
  • Load balancer distributing traffic
  • Managed DB with synchronous replication (single-AZ failover)

2. Automated Recovery

  • Auto‑scaling groups (ASG)
  • Platform self‑healing (automatic replacement of failed instances)
  • Application crash recovery scripts
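
As a rough illustration of an application crash recovery script, the following Python sketch restarts a failing task with exponential backoff. The function name, the default limits, and the injectable `sleep` parameter are assumptions made for this example; in practice a process supervisor or the platform's auto‑healing usually does this job.

```python
import time


def run_with_restarts(task, max_restarts=3, base_delay=1.0, sleep=time.sleep):
    """Call task(); if it raises, wait with exponential backoff and retry.

    Re-raises the last exception once max_restarts restarts have been used.
    """
    restarts = 0
    while True:
        try:
            return task()
        except Exception:
            if restarts >= max_restarts:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** restarts))
            restarts += 1
```

Capping the number of restarts matters: a process that keeps crashing because the AZ itself is impaired should surface as an incident, not loop forever.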

3. Data Durability

  • Regular backups to cross‑AZ or multi-region storage
    (even if workload is single-AZ, backups must be multi-AZ)
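
Because a Single‑AZ workload recovers from backups, the age of the newest backup is effectively the data you stand to lose. A minimal Python sketch of that check follows; the function names and the one‑hour target used in the usage note are illustrative assumptions, not values from any provider.

```python
from datetime import datetime, timedelta, timezone


def backup_age(last_backup_at, now=None):
    """Time elapsed since the most recent successful backup."""
    now = now or datetime.now(timezone.utc)
    return now - last_backup_at


def meets_rpo(last_backup_at, rpo, now=None):
    """True if restoring the latest backup would lose no more than `rpo`."""
    return backup_age(last_backup_at, now) <= rpo
```

For example, with a hypothetical RPO of `timedelta(hours=1)`, a backup taken 45 minutes ago passes the check and one taken 90 minutes ago fails it.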

4. Monitoring & Alerting

  • Health checks
  • Log aggregation
  • Metric‑driven alerting
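
One common way to turn health checks into alerts is to fire only after several consecutive failures, so a single slow response does not page anyone. The Python sketch below shows that idea; the class name and the default threshold of three are assumptions for illustration.

```python
from collections import deque


class ConsecutiveFailureAlert:
    """Signal an alert once `threshold` health checks in a row have failed."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        # Keep only the most recent `threshold` results.
        self.recent = deque(maxlen=threshold)

    def record(self, ok):
        """Store one health-check result; return True if the alert should fire."""
        self.recent.append(bool(ok))
        window_full = len(self.recent) == self.threshold
        return window_full and not any(self.recent)
```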

5. Incident Runbooks

  • How to restore from backup
  • How to redeploy the entire stack into a new AZ (if needed)
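
Redeploying into a new AZ is far easier when the AZ is a parameter of the deployment instead of something hard‑coded. The Python sketch below models that idea only; `StackConfig` and `retarget` are hypothetical names, and the prefix check relies on AWS‑style zone names such as `ap-south-1a`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StackConfig:
    region: str
    az: str
    app_instances: int


def retarget(config, new_az):
    """Return a copy of the stack definition that points at another AZ."""
    # Assumes AWS-style naming, where a zone name starts with its region name.
    if not new_az.startswith(config.region):
        raise ValueError(f"{new_az} is not in region {config.region}")
    return replace(config, az=new_az)
```

With a definition like `StackConfig("ap-south-1", "ap-south-1a", 3)`, the runbook step becomes "call `retarget(config, "ap-south-1b")` and redeploy", with everything else unchanged.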

⚠️ Key Risks You Must Communicate to Stakeholders

A Single‑AZ design has:

  • No AZ fault tolerance
  • No standby site for disaster recovery; recovery depends on restoring from backups
  • Higher RTO and RPO than Multi‑AZ or Multi‑Region designs
  • No protection against data center‑level disruptions

It’s usually acceptable only for:

  • Dev/Test environments
  • Non‑critical services
  • Cost‑optimized workloads
  • Legacy apps not yet modernized

But not for mission‑critical systems.


🎯 As a Database Architect: What Should You Ensure?

Minimum DB resiliency even in a Single‑AZ:

  • Synchronous replica in same AZ
  • Automated failover
  • Continuous backups to multi‑AZ storage
  • PITR (Point-in-time Recovery)
  • Automated recovery workflows
  • Tested restore procedures
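
Point‑in‑time recovery generally works by restoring the newest base backup taken at or before the target time and then replaying archived logs up to that time. The Python sketch below captures only the planning step; `plan_pitr` and its parameters are hypothetical names, and the actual restore commands depend on the database engine.

```python
from datetime import datetime


def plan_pitr(base_backups, logs_archived_until, target):
    """Pick the base backup to restore for a recovery to `target`.

    base_backups: completion times of the available base backups.
    logs_archived_until: latest point covered by the archived logs.
    """
    if target > logs_archived_until:
        raise ValueError("target is later than the archived logs cover")
    usable = [taken_at for taken_at in base_backups if taken_at <= target]
    if not usable:
        raise ValueError("no base backup at or before the target time")
    # The newest usable backup minimizes how much log has to be replayed.
    return max(usable)
```

A tested restore procedure would run this kind of check regularly, so that a gap in backups or log archiving is found before an incident and not during one.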



✅ 1. Architecture Diagram (ASCII – Single Region, Single AZ)

                ┌──────────────────────────────────────────────┐
                │     Cloud Region (e.g., AWS ap-south-1)      │
                ├──────────────────────────────────────────────┤
                │                                              │
                │      Availability Zone (e.g., ap-south-1a)   │
                │      ─────────────────────────────────────   │
                │                                              │
                │   ┌──────────────┐     ┌──────────────┐      │
                │   │ Load Balancer│ --> │ App Servers  │      │
                │   └──────────────┘     └──────────────┘      │
                │                   \      /                   │
                │                    \    /                    │
                │                  ┌──────────────┐            │
                │                  │ Database     │            │
                │                  │ Primary +    │            │
                │                  │ Standby (same│            │
                │                  │ AZ)          │            │
                │                  └──────────────┘            │
                │                                              │
                │     Backups → Multi‑AZ Object Storage        │
                └──────────────────────────────────────────────┘

The comparison matrix in the next section complements this diagram.


✅ 2. Comparison: Single‑AZ vs Multi‑AZ vs Multi‑Region

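Dimension       | Single‑AZ                | Multi‑AZ             | Multi‑Region
----------------+--------------------------+----------------------+-----------------------
Regions         | 1                        | 1                    | 2+
AZs Used        | 1                        | 2–3                  | 2–6
Fault Tolerance | None                     | Survives AZ outage   | Survives region outage
Cost            | Low                      | Moderate (2–3x)      | High (4x–10x)
Complexity      | Simple                   | Moderate             | High
RTO             | 2–24 hrs (restore-based) | Minutes              | Seconds–Minutes
RPO             | Minutes–Hours            | Seconds              | 0–Seconds
Risks           | AZ failure               | Region-level failure | Cross-region disasters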

✅ 3. RTO/RPO Matrix

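Architecture                  | Typical RTO  | Typical RPO            | Notes
------------------------------+--------------+------------------------+---------------------------
Single‑AZ                     | 4–24 hours   | 15 min – several hours | Restore from backup
Multi‑AZ                      | 1–5 minutes  | 0–5 seconds            | Synchronous replication
Multi‑Region (Active-Passive) | 5–60 minutes | < 1 minute             | Asynchronous replication
Multi‑Region (Active-Active)  | Seconds      | Zero                   | Conflict-free architectures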
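
To show how such a matrix can be used, the Python sketch below filters architectures by a required RTO, using the upper end of each "Typical RTO" range listed above. The table gives only "Seconds" for Active‑Active, so the 60‑second figure is an assumption, and all of the numbers are rough planning values, not guarantees.

```python
from datetime import timedelta

# Upper ends of the "Typical RTO" ranges from the matrix.
# The 60-second value for Active-Active is an assumed stand-in for "Seconds".
WORST_CASE_RTO = {
    "Single-AZ": timedelta(hours=24),
    "Multi-AZ": timedelta(minutes=5),
    "Multi-Region (Active-Passive)": timedelta(minutes=60),
    "Multi-Region (Active-Active)": timedelta(seconds=60),
}


def architectures_meeting_rto(required_rto):
    """List architectures whose worst-case typical RTO fits the requirement."""
    return [
        name
        for name, worst_case in WORST_CASE_RTO.items()
        if worst_case <= required_rto
    ]
```

With a 10‑minute requirement, only Multi‑AZ and Active‑Active remain; Single‑AZ appears only once the business accepts up to a day of downtime.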

✅ 4. Cloud-Specific Examples

AWS

  • Compute: EC2 in Auto Scaling Group (single AZ)
  • Database: RDS Single-AZ deployment
  • Backup: S3 (data stored across multiple AZs by default) and S3 Glacier for archival; cross‑region replication is optional
  • Networking: Single AZ subnets
  • Risks: AZ failure → complete outage

Azure

  • Compute: VM Scale Set pinned to a single availability zone
  • Database: Azure SQL Database without zone redundancy
  • Storage: GRS recommended for durability
  • Risks: Zone outage = full downtime

GCP

  • Compute: Managed Instance Group (single zone)
  • Database: Cloud SQL with zonal availability (no standby in another zone)
  • Storage: Multi‑regional storage optional
  • Risks: Same — no protection beyond local zone

✅ 5. Database Resiliency Patterns (Per Engine)

Oracle

  • Data Guard (single-AZ synchronous)
  • RMAN backups → multi‑AZ storage
  • Flashback + PITR

PostgreSQL

  • Streaming replication (sync within AZ)
  • WAL archiving to multi-region buckets
  • Patroni/pg_auto_failover for node-level protection

SQL Server

  • Always On Availability Groups (single-AZ)
  • Log shipping → cross-region DR
  • Automated failover only within AZ

MySQL

  • InnoDB ReplicaSet or Group Replication
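  • Backups via mysqldump, plus GTID‑based replication to another region
  • Aurora with a single DB instance has low compute resiliency, even though Aurora storage is replicated across multiple AZs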

✅ 6. Complete Architecture Document (Concise)

Single‑Region, Single‑AZ Resiliency Architecture

This architecture is designed for workloads that prioritize simplicity and cost efficiency over regional or AZ‑level fault tolerance.

Components

  • Compute instances deployed in a single Availability Zone
  • Database with synchronous intra‑AZ replica
  • Load balancers within the same AZ
  • Backups stored in multi‑AZ object storage
  • Centralized monitoring (CloudWatch / Azure Monitor / Google Cloud Monitoring)

Fault Domains

  • Handles: instance crash, OS failure, application errors
  • Does NOT handle: AZ failure, region failure, physical disasters

Operational Controls

  • Backup policy (daily full backups, hourly log shipping)
  • Restore testing every quarter
  • Health monitoring & alerting
  • Deployment automation (IaC)

When to Use

  • Dev/Test environments
  • Non-critical internal tools
  • Proof-of-concept systems
  • Low-traffic legacy apps

Not Recommended For

  • Customer-facing applications
  • Transactional systems (finance, retail)
  • Workloads with high‑availability targets (99.9%+)
  • Compliance-bound workloads
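
The 99.9% figure is easier to judge as a downtime budget. The Python sketch below does that arithmetic, assuming a 365‑day year; the function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_budget_hours(availability_pct):
    """Hours of downtime per year allowed by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def budget_consumed(outage_hours, availability_pct):
    """Fraction of the yearly downtime budget used by a single outage."""
    return outage_hours / downtime_budget_hours(availability_pct)
```

A 99.9% target allows about 8.76 hours of downtime per year. A single restore at the low end of the Single‑AZ RTO range (4 hours) uses roughly 46% of that budget, and one at the high end (24 hours) exceeds it by a factor of about 2.7.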




