1. Redo SCN vs Checkpoint SCN (hidden control mechanism)
Oracle tracks multiple SCNs internally, but two matter most here:
Absolute rule
Checkpoint SCN ≤ Redo SCN
👉 Oracle never lets DBWR advance the checkpoint SCN beyond the redo SCN.
Why this matters
- DBWR can write dirty blocks
- But only blocks whose redo SCN ≤ redo flushed SCN
- If not → LGWR is forced first
This is the real internal enforcement of WAL.
2. Commit path: why LGWR is synchronous and DBWR is not
What happens on COMMIT (timeline)
Client session
|
|-- generate COMMIT redo record
|
|-- post LGWR
|
|-- write redo buffer to redo log
|
|-- fsync / IO completion
|
|<- commit success returned
Key point:
- Commit waits only for LGWR
- Commit does NOT wait for DBWR
Why DBWR is excluded
If commit waited for DBWR:
- Random I/O
- Unpredictable latency
- Massive scalability collapse
Oracle instead guarantees:
“If redo exists, we can always reproduce the data later”
3. Dirty buffer protection (the invisible gate)
Every dirty buffer in the buffer cache carries metadata:
- Buffer SCN
- Required redo SCN
Before DBWR writes a block:
- DBWR checks required redo SCN
- If redo is not on disk:
- DBWR signals LGWR
- DBWR waits
- Only after redo flush completes:
- DBWR writes the block
This mechanism is sometimes called a:
Redo write barrier
It is completely internal — DBAs only see effects, not the barrier itself.
4. Checkpoints: who really coordinates LGWR vs DBWR
Checkpoint is not just DBWR writing blocks.
During checkpoint:
- CKPT:
- Signals LGWR to flush redo up to checkpoint SCN
- Signals DBWR to write dirty buffers up to same SCN
- LGWR always goes first
- DBWR follows
Sequence:
CKPT
├─ LGWR → redo flush
├─ DBWR → dirty buffers
└─ CKPT → datafile headers updated
Why datafile headers last:
- They record checkpoint SCN
- Must reflect redo that is already durable
5. RAC: same rule, more writers
In RAC, the rule becomes even stricter.
Extra components:
- Redo threads per instance
- LMS / GCS (global cache service)
- Shared disk
Rule still holds:
No instance can let DBWR write a block unless redo for that block is durable in that instance’s redo thread.
Additional complexity:
- Block shipping between instances
- Commit SCN must be globally consistent
- LGWR on own instance must flush before block can be shared as clean
This is why:
- Log latency kills RAC performance
- DBWR waits often hide behind global cache waits
6. What happens when LGWR is slow
Symptom 1: log file sync waits
- Users waiting on commit
- LGWR is bottleneck
Symptom 2: DBWR stalls
- Dirty buffers accumulate
- Buffer cache pressure increases
- “Free buffer waits” may appear
Why DBWR suffers
Because:
- DBWR cannot bypass LGWR
- Data blocks are hostage to redo durability
Root causes usually are:
- Slow redo disks
- Small redo logs
- Excessive commits
- Misconfigured commit batching
7. Why Oracle chose redo-first instead of data-first
Let’s compare alternatives.
Option A: Write data first ❌
- Breaks crash recovery
- Corrupt datafiles possible
- Unrecoverable inconsistencies
Option B: Write redo first ✅
- Sequential I/O
- Small writes
- Perfect recovery model
- Scales across CPU cores
This design decision is why Oracle can commit millions of TPS without writing datafiles synchronously.
8. Recovery proves LGWR’s primacy
After crash:
- Oracle mounts database
- Reads datafiles (possibly stale)
- Reads redo logs
- Applies redo → roll forward
- Applies undo → roll back
Notice:
Datafiles are not trusted
Redo logs are trusted
That trust only exists because LGWR always wrote first.
9. Short mental model (architect‑level)
- Redo = truth
- Datafiles = cache
- LGWR = durability gatekeeper
- DBWR = background janitor
10. One‑line summary (perfect for interviews)
LGWR must write before DBWR because Oracle’s crash recovery depends on redo being the authoritative source of committed changes; DBWR is explicitly prevented from writing data blocks unless corresponding redo is already durable, enforcing Write‑Ahead Logging.
No comments:
Post a Comment