Single‑Region, Single‑Availability Zone (AZ) deployments are the simplest cloud architecture but also the least fault‑tolerant. They are common in early‑stage environments, cost‑constrained setups, or legacy workloads that haven’t been modernized yet.
🔍 What Is an AZ?
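An Availability Zone (AZ) is one or more physically separate data centers within a cloud region, with independent power, cooling, and networking. AWS and Azure call them Availability Zones; GCP calls them zones.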
In a Single‑AZ setup:
- All compute, storage, networking, and database components reside within one AZ.
- No failover capability exists outside that AZ.
🧩 What Does “Resiliency” Look Like in a Single‑Region, Single‑AZ Setup?
✔️ You can protect against:
- Instance failures (VM crash)
- Application failures
- Software bugs
- Local disk corruption
- Process-level outages
These are typically mitigated through:
- Auto-restart, auto-healing
- Load balancing across multiple instances within the same AZ
- Database failover within the AZ (e.g., primary ↔ standby in the same AZ)
- Backup & restore strategies
❌ You cannot protect against:
- AZ‑wide outage
- AZ‑wide power loss
- Network isolation of the AZ
- Fire, flood, or other physical damage to the AZ's facilities
- Region outage
If the AZ goes down, the entire workload goes down.
🏗️ Typical Resiliency Best Practices in Single‑AZ
1. Redundancy Within the AZ
- Multiple compute nodes in a single AZ
- Load balancer distributing traffic
- Database with a synchronous standby in the same AZ (local failover)
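To make "load balancing across multiple instances" concrete, here is a minimal sketch of a health-aware round-robin balancer. It is a hypothetical model (the class, instance names, and health map are illustrative, not any provider's load balancer API). It shows what in-AZ redundancy buys: a single crashed VM is skipped, but if every instance fails together, as in an AZ outage, there is nowhere left to route traffic.

```python
from itertools import cycle


class InAzLoadBalancer:
    """Round-robin over app instances in one AZ, skipping unhealthy ones.

    Hypothetical sketch: instance names and the health map are illustrative,
    not any cloud provider's load balancer API.
    """

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = {name: True for name in self.instances}
        self._ring = cycle(self.instances)

    def mark(self, name, healthy):
        # Normally driven by health checks; set manually here.
        self.healthy[name] = healthy

    def next_target(self):
        # Visit each instance at most once per call so we never loop forever.
        for _ in range(len(self.instances)):
            candidate = next(self._ring)
            if self.healthy[candidate]:
                return candidate
        # Every instance is down: inside a single AZ there is nowhere else to send traffic.
        raise RuntimeError("no healthy instances left in this AZ")


lb = InAzLoadBalancer(["app-1", "app-2", "app-3"])
lb.mark("app-2", False)  # simulate a crashed VM
print([lb.next_target() for _ in range(4)])  # ['app-1', 'app-3', 'app-1', 'app-3']
```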
2. Automated Recovery
- Auto‑scaling groups (ASG)
- Platform self-healing (automatic instance recovery or replacement)
- Application crash recovery scripts
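An application crash recovery script usually boils down to "restart with backoff, then escalate". The sketch below assumes a hypothetical `start_process` callable that runs the app and returns its exit code; in practice a process manager (for example systemd's `Restart=on-failure`) or the auto-scaling group would play this role.

```python
import time


def supervise(start_process, max_restarts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Re-run a crashing process with capped exponential backoff.

    start_process is a hypothetical callable that runs the app until it exits
    and returns the exit code. Returns how many restarts were needed.
    """
    restarts = 0
    while True:
        exit_code = start_process()
        if exit_code == 0:
            return restarts  # clean exit: stop supervising
        if restarts >= max_restarts:
            # Restarting is not fixing it; hand over to a human (incident runbook).
            raise RuntimeError(f"gave up after {restarts} restarts (last exit code {exit_code})")
        delay = min(max_delay, base_delay * 2 ** restarts)  # 1s, 2s, 4s, ... capped
        sleep(delay)
        restarts += 1


# Simulated app that crashes twice and then exits cleanly.
exit_codes = iter([1, 1, 0])
delays = []
print(supervise(lambda: next(exit_codes), sleep=delays.append), delays)  # 2 [1.0, 2.0]
```

Injecting `sleep` keeps the logic testable without real waiting; the cap on restarts prevents a crash loop from masking a bug that no amount of restarting will fix.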
3. Data Durability
- Regular backups to cross‑AZ or multi‑region storage (even if the workload is single‑AZ, its backups must live outside that AZ)
4. Monitoring & Alerting
- Health checks
- Log aggregation
- Metric‑driven alerting
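Health checks are usually paired with thresholds so that one slow probe does not page anyone. The sketch below is a hypothetical monitor, similar in spirit to the healthy/unhealthy threshold counts on load balancer health checks; the threshold values are illustrative assumptions.

```python
class HealthMonitor:
    """Raise an alert only after N consecutive failed health checks.

    Hypothetical sketch; threshold values are illustrative assumptions.
    """

    def __init__(self, failure_threshold=3, recovery_threshold=2):
        self.failure_threshold = failure_threshold
        self.recovery_threshold = recovery_threshold
        self.consecutive_failures = 0
        self.consecutive_successes = 0
        self.alerting = False

    def record(self, check_passed):
        """Feed one health-check result; return 'ALERT', 'RESOLVED', or None."""
        if check_passed:
            self.consecutive_successes += 1
            self.consecutive_failures = 0
            if self.alerting and self.consecutive_successes >= self.recovery_threshold:
                self.alerting = False
                return "RESOLVED"
        else:
            self.consecutive_failures += 1
            self.consecutive_successes = 0
            if not self.alerting and self.consecutive_failures >= self.failure_threshold:
                self.alerting = True
                return "ALERT"
        return None


monitor = HealthMonitor(failure_threshold=3, recovery_threshold=2)
checks = [True, False, False, True, False, False, False, False, True, True]
print([monitor.record(ok) for ok in checks])
# [None, None, None, None, None, None, 'ALERT', None, None, 'RESOLVED']
```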
5. Incident Runbooks
- How to restore from backup
- How to redeploy the entire stack into a new AZ (if needed)
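Because recovery from an AZ loss means redeploying and restoring, the runbook's RTO is mostly arithmetic: provisioning time, plus data size divided by restore throughput, plus log replay and validation. The function below is a rough planning sketch; every input (and the 30‑minute validation default) is a hypothetical assumption that should be replaced with numbers measured in restore drills.

```python
def estimate_restore_rto_hours(data_gb, restore_mb_per_s, provision_minutes,
                               log_replay_gb=0.0, replay_mb_per_s=None,
                               validation_minutes=30):
    """Rough RTO in hours for redeploying into a new AZ and restoring from backup.

    Every input is a planning assumption; replace them with figures measured
    during restore drills.
    """
    replay_mb_per_s = replay_mb_per_s or restore_mb_per_s
    provision_s = provision_minutes * 60                  # IaC redeploy of the stack
    restore_s = data_gb * 1024 / restore_mb_per_s         # copy backup onto new volumes
    replay_s = log_replay_gb * 1024 / replay_mb_per_s     # roll forward archived logs
    validation_s = validation_minutes * 60                # smoke tests before reopening traffic
    return (provision_s + restore_s + replay_s + validation_s) / 3600


# Hypothetical inputs: 500 GB database, 100 MB/s restore, 30 min to redeploy, 20 GB of logs.
print(round(estimate_restore_rto_hours(500, 100, 30, log_replay_gb=20), 2))  # 2.48
```

Even with these fairly optimistic inputs the result is measured in hours, which is why the matrices below put Single‑AZ RTO in the hours range.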
⚠️ Key Risks You Must Communicate to Stakeholders
A Single‑AZ design has:
- No AZ fault tolerance
- No disaster recovery capability beyond restoring from backup
- Higher RTO and RPO than Multi‑AZ or Multi‑Region designs
- No protection against data center‑level disruptions
It’s usually acceptable only for:
- Dev/Test environments
- Non‑critical services
- Cost‑optimized workloads
- Legacy apps not yet modernized
It is not appropriate for mission‑critical systems.
🎯 As a Database Architect: What Should You Ensure?
Minimum DB resiliency even in a Single‑AZ:
- Synchronous replica in same AZ
- Automated failover
- Continuous backups to multi‑AZ storage
- PITR (Point-in-time Recovery)
- Automated recovery workflows
- Tested restore procedures
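The checklist above can be turned into a simple automated gate. This is a minimal sketch assuming a hypothetical flat config dict; the key names are illustrative and would need to be mapped to whatever your IaC or inventory actually exposes.

```python
# Hypothetical config keys; map them to whatever your IaC or inventory actually exposes.
REQUIRED_CONTROLS = {
    "sync_replica_same_az": "Synchronous replica in same AZ",
    "automated_failover": "Automated failover",
    "backups_multi_az": "Continuous backups to multi-AZ storage",
    "pitr_enabled": "PITR (point-in-time recovery)",
    "automated_recovery": "Automated recovery workflows",
    "restore_tested": "Tested restore procedures",
}


def missing_controls(db_config):
    """List the minimum-resiliency items a database config does not yet satisfy."""
    return [label for key, label in REQUIRED_CONTROLS.items() if not db_config.get(key, False)]


orders_db = {"sync_replica_same_az": True, "automated_failover": True,
             "backups_multi_az": True, "pitr_enabled": False}
for gap in missing_controls(orders_db):
    print("MISSING:", gap)
```

Treating an absent key as "not satisfied" is deliberate: a control nobody has recorded should not silently pass.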
✅ 1. Architecture Diagram (ASCII: Single Region, Single AZ)
┌────────────────────────────────────────────────┐
│ Cloud Region (e.g., AWS ap-south-1)            │
├────────────────────────────────────────────────┤
│                                                │
│  Availability Zone (e.g., ap-south-1a)         │
│  ─────────────────────────────────────         │
│                                                │
│  ┌───────────────┐      ┌───────────────┐      │
│  │ Load Balancer │ ───> │  App Servers  │      │
│  └───────────────┘      └───────┬───────┘      │
│                                 │              │
│                                 v              │
│                      ┌─────────────────────┐   │
│                      │ Database            │   │
│                      │ Primary + Standby   │   │
│                      │ (same AZ)           │   │
│                      └─────────────────────┘   │
│                                                │
│  Backups ──> Multi‑AZ Object Storage           │
└────────────────────────────────────────────────┘
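The diagram is a serial chain (load balancer, then app tier, then database) wrapped inside one AZ. A small availability calculation shows why redundancy inside the AZ has a ceiling: redundant replicas combine as 1 − Π(1 − aᵢ), serial components multiply, and the AZ itself is one more serial factor that no in‑AZ redundancy can remove. The per-component figures are hypothetical, and the parallel formula assumes independent failures, which is optimistic when replicas share the same power and network.

```python
from math import prod


def serial(*availabilities):
    """Components that must all be up: availabilities multiply."""
    return prod(availabilities)


def parallel(*availabilities):
    """Redundant replicas: the tier is down only if every replica is down.

    Assumes independent failures, which is optimistic inside one AZ.
    """
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical per-component availabilities, for illustration only.
az_availability = 0.999
load_balancer = 0.9999
app_tier = parallel(0.99, 0.99)    # two app servers
db_tier = parallel(0.995, 0.995)   # primary + same-AZ standby

inside_az = serial(load_balancer, app_tier, db_tier)
end_to_end = serial(az_availability, inside_az)
print(f"inside the AZ: {inside_az:.6f}")   # 0.999775
print(f"end to end:    {end_to_end:.6f}")  # 0.998775
```

Adding a third app server or a second standby nudges `inside_az` closer to 1, but `end_to_end` can never exceed `az_availability`.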
✅ 2. Comparison: Single‑AZ vs Multi‑AZ vs Multi‑Region
| Dimension | Single‑AZ | Multi‑AZ | Multi‑Region |
|---|---|---|---|
| Regions | 1 | 1 | 2+ |
| AZs Used | 1 | 2–3 | 2–6 |
| Fault Tolerance | Instance-level only | Survives AZ outage | Survives region outage |
| Relative Cost | Low (baseline) | Moderate (~2–3x) | High (~4–10x) |
| Complexity | Simple | Moderate | High |
| RTO | 2–24 hours (restore-based) | Minutes | Seconds–Minutes |
| RPO | Minutes–Hours | Seconds | 0–Seconds |
| Unmitigated Risk | AZ failure | Region-level failure | Multi-region disasters |
✅ 3. RTO/RPO Matrix
| Architecture | Typical RTO | Typical RPO | Notes |
|---|---|---|---|
| Single‑AZ | 4–24 hours | 15 min – several hours | Restore from backup |
| Multi‑AZ | 1–5 minutes | 0–5 seconds | Synchronous replication |
| Multi‑Region (Active-Passive) | 5–60 minutes | < 1 minute | Asynchronous cross-region replication |
| Multi‑Region (Active-Active) | Seconds | Zero RPO | Conflict-free architectures; true zero RPO needs synchronous cross-region commits |
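One way to use this matrix is to pick the cheapest architecture whose worst case still meets the business's RTO and RPO targets. The sketch below encodes the upper end of each range from the table; pinning "several hours" to 6 hours and "seconds" to 60 seconds, and ranking Active-Active as costlier than Active-Passive, are assumptions made for illustration.

```python
# (name, worst-case RTO s, worst-case RPO s, survives region outage), cheapest first.
# Upper bounds are taken from the matrix above; "several hours" is pinned to 6 h
# and "seconds" to 60 s as illustrative assumptions.
ARCHITECTURES = [
    ("Single-AZ", 24 * 3600, 6 * 3600, False),
    ("Multi-AZ", 5 * 60, 5, False),
    ("Multi-Region (Active-Passive)", 60 * 60, 60, True),
    ("Multi-Region (Active-Active)", 60, 0, True),
]


def cheapest_fit(target_rto_s, target_rpo_s, must_survive_region_outage=False):
    """Return the first (cheapest) architecture whose worst case meets both targets."""
    for name, rto_s, rpo_s, survives_region in ARCHITECTURES:
        if must_survive_region_outage and not survives_region:
            continue
        if rto_s <= target_rto_s and rpo_s <= target_rpo_s:
            return name
    return None  # nothing in the matrix meets these targets


print(cheapest_fit(48 * 3600, 12 * 3600))  # Single-AZ
print(cheapest_fit(8 * 3600, 3600))        # Multi-AZ
print(cheapest_fit(2 * 3600, 300, must_survive_region_outage=True))  # Multi-Region (Active-Passive)
```

Comparing against worst-case rather than typical values is the conservative choice: a Single‑AZ design only qualifies when the business can tolerate its slowest realistic restore.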
✅ 4. Cloud-Specific Examples
AWS
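- Compute: EC2 in an Auto Scaling group (subnets in a single AZ)
- Database: RDS Single‑AZ deployment (no standby instance; recovery relies on instance replacement and backups)
- Backup: S3 (Standard and Glacier storage classes store data across multiple AZs); add Cross-Region Replication for multi-region copies
- Networking: Subnets in a single AZ
- Risks: AZ failure → complete outage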
Azure
- Compute: Virtual Machine Scale Set pinned to a single availability zone
- Database: Azure SQL Database without zone redundancy
- Storage: GRS (geo‑redundant storage) recommended for backup durability
- Risks: Zone outage = full downtime
GCP
- Compute: Zonal Managed Instance Group
- Database: Cloud SQL zonal instance (no high-availability configuration)
- Storage: Cloud Storage dual-region or multi-region buckets optional for backups
- Risks: Same as above; no protection beyond the local zone
✅ 5. Database Resiliency Patterns (Per Engine)
Oracle
- Data Guard with the standby in the same AZ (SYNC redo transport)
- RMAN backups → multi‑AZ storage
- Flashback + PITR
PostgreSQL
- Synchronous streaming replication to a standby in the same AZ
- WAL archiving to multi-region buckets
- Patroni or pg_auto_failover for automated node-level failover
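WAL archiving is what makes PITR possible: restore the newest base backup taken before the target time, then replay archived WAL up to that time. The sketch below covers only the planning step, against a hypothetical backup catalog of completion timestamps. Using completion time is a simplification (PostgreSQL requires recovery to pass the backup's end point before the target); in PostgreSQL 12 and later the restore itself is driven by settings such as `restore_command` and `recovery_target_time` plus a `recovery.signal` file.

```python
from datetime import datetime


def plan_pitr(base_backup_times, wal_archived_until, target_time):
    """Choose the base backup to restore before replaying WAL up to target_time.

    base_backup_times: completion times from a hypothetical backup catalog.
    wal_archived_until: newest point covered by the WAL archive.
    """
    if target_time > wal_archived_until:
        # WAL after this point never reached the archive: that data is unrecoverable.
        raise ValueError("target time is beyond the archived WAL")
    candidates = [t for t in base_backup_times if t <= target_time]
    if not candidates:
        raise ValueError("no base backup finished before the target time")
    # The newest eligible backup minimizes the amount of WAL to replay.
    return max(candidates)


backups = [datetime(2024, 5, 1, 2, 0), datetime(2024, 5, 2, 2, 0), datetime(2024, 5, 3, 2, 0)]
print(plan_pitr(backups, datetime(2024, 5, 3, 14, 0), datetime(2024, 5, 2, 9, 30)))
# 2024-05-02 02:00:00
```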
SQL Server
- Always On Availability Groups (replicas within one AZ)
- Log shipping → cross-region DR
- Automatic failover only within the AZ
MySQL
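- InnoDB Cluster (Group Replication) for automatic failover, or InnoDB ReplicaSet (asynchronous replication, manual failover)
- Backups via mysqldump (preserving GTID state) copied to cross-region storage
- Aurora with a single DB instance: storage is replicated across three AZs, but compute lives in one AZ, so resiliency remains low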
✅ 6. Complete Architecture Document (Concise)
Single‑Region, Single‑AZ Resiliency Architecture
This architecture is designed for workloads that prioritize simplicity and cost efficiency over regional or AZ‑level fault tolerance.
Components
- Compute instances deployed in a single Availability Zone
- Database with synchronous intra‑AZ replica
- Load balancers within the same AZ
- Backups stored in multi‑AZ object storage
- Centralized monitoring (CloudWatch / Azure Monitor / Google Cloud Monitoring)
Fault Domains
- Handles: instance crash, OS failure, application errors
- Does NOT handle: AZ failure, region failure, physical disasters
Operational Controls
- Backup policy (daily full backups plus hourly log shipping)
- Restore testing every quarter
- Health monitoring & alerting
- Deployment automation (IaC)
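The backup policy and restore-testing cadence above translate directly into numbers you can report to stakeholders. In this sketch the worst-case RPO is the shortest capture interval plus the time it takes a backup or log segment to reach multi‑AZ storage; the 5‑minute upload lag and the 92‑day "quarter" are assumptions for illustration.

```python
from datetime import date, timedelta


def worst_case_rpo(full_backup_every, log_ship_every=None, transfer_lag=timedelta(0)):
    """Upper bound on committed data lost if the AZ disappears.

    Anything written after the last recovery point that reached multi-AZ
    storage is gone, so the bound is the shortest capture interval plus the
    time a backup or log segment takes to land there.
    """
    intervals = [full_backup_every]
    if log_ship_every is not None:
        intervals.append(log_ship_every)
    return min(intervals) + transfer_lag


def restore_test_overdue(last_tested, today, max_age=timedelta(days=92)):
    """True if the last successful restore drill is older than about one quarter."""
    return today - last_tested > max_age


# Daily full backups plus hourly log shipping, assuming ~5 minutes to upload each log segment.
print(worst_case_rpo(timedelta(days=1), timedelta(hours=1), timedelta(minutes=5)))  # 1:05:00
print(restore_test_overdue(date(2024, 1, 10), date(2024, 5, 1)))  # True
```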
When to Use
- Dev/Test environments
- Non-critical internal tools
- Proof-of-concept systems
- Low-traffic legacy apps
Not Recommended For
- Customer-facing applications
- Transactional systems (finance, retail)
- Workloads with high availability targets (99.9%+)
- Compliance-bound workloads