Tuesday, May 26, 2026

Approach to correlate AWR + iostat to deep drive and troubleshoot oracle database performance issue

✅ 1. Objective

👉 Correlate:

OS layer → iostat (disk bottleneck)
DB layer → AWR (SQL causing I/O)

👉 Goal: Identify which SQL caused the spike on dm- disks*

🧠 ✅ 2. Correlation Logic (Core Concept)

Disk spike (iostat)
      ↓
Time window
      ↓
AWR snapshot
      ↓
Top IO SQL
      ↓
Execution plan
      ↓
Root cause

⏱️ ✅ 3. STEP 1: Identify Spike Time from iostat

Example:

dm-69 → 100% util
await → 97 ms

👉 Note:

Exact time window (e.g.)

10:02 AM – 10:10 AM

📊 ✅ 4. STEP 2: Find Matching AWR Snapshot

SELECT snap_id, begin_interval_time, end_interval_time
FROM dba_hist_snapshot
ORDER BY snap_id DESC;

👉 Pick snapshots covering spike:

Snap 101 → 10:00
Snap 102 → 10:10

🔥 ✅ 5. STEP 3: Identify Top SQL by I/O

✅ Query 1: Top Disk Read SQL

executions_delta,
disk_reads_delta,
buffer_gets_delta,
elapsed_time_delta/1000000 elapsed_sec,
ROUND(disk_reads_delta/DECODE(executions_delta,0,1,executions_delta)) reads_per_exec
FROM dba_hist_sqlstat
WHERE snap_id BETWEEN :snap1 AND :snap2
ORDER BY disk_reads_delta DESC
WHERE ROWNUM <= 10;

✅ Query 2: Top Read Throughput

disk_reads_delta,
buffer_gets_delta,
rows_processed_delta,
elapsed_time_delta/1000000 elapsed_sec
FROM dba_hist_sqlstat
WHERE snap_id BETWEEN :snap1 AND :snap2
ORDER BY disk_reads_delta DESC
WHERE ROWNUM <= 10;

✅ Query 3: Full Table Scan Candidates

SELECT sql_id,
disk_reads_delta,
executions_delta,
ROUND(disk_reads_delta/DECODE(executions_delta,0,1,executions_delta)) reads_per_exec
FROM dba_hist_sqlstat
WHERE snap_id BETWEEN :snap1 AND :snap2
AND disk_reads_delta > 100000
ORDER BY disk_reads_delta DESC;

🔍 ✅ 6. STEP 4: Identify Wait Events

SELECT event, total_waits_delta, time_waited_delta/1000 time_waited_ms
FROM dba_hist_system_event
WHERE snap_id BETWEEN :snap1 AND :snap2
AND event LIKE 'db file%'
ORDER BY time_waited_ms DESC;

✅ Interpretation:

Wait Event	Meaning
db file scattered read	Full table scan 🚨
db file sequential read	Index access
direct path read	Big scans (DW)
db file parallel read	Parallel scans

📈 ✅ 7. STEP 5: Map SQL → Execution Plan

FROM dba_hist_sql_plan
WHERE sql_id = '&sql_id'
ORDER BY id;

✅ Look for:

TABLE ACCESS FULL 🚨
INDEX RANGE SCAN
FULL OUTER JOIN
Parallel operations

🔗 ✅ 8. STEP 6: Correlate with Disk Pattern

🔴 Case 1 (Your Earlier Example)

iostat:
avgrq-sz ~ 1024 KB
high rMB/s
high await

👉 AWR shows:

db file scattered read
direct path read

✅ Conclusion: Large full table scans causing disk saturation

🟢 Case 2

small avgrq-sz
high IOPS
low await

👉 AWR:

db file sequential read

✅ Conclusion: Healthy OLTP workload

🧪 ✅ 9. Advanced Correlation Query (BEST ONE)

t.sql_text,
s.executions_delta,
s.disk_reads_delta,
ROUND(s.disk_reads_delta/DECODE(s.executions_delta,0,1,s.executions_delta)) reads_per_exec,
s.elapsed_time_delta/1000000 elapsed_sec
FROM dba_hist_sqlstat s,
dba_hist_sqltext t
WHERE s.sql_id = t.sql_id
AND s.snap_id BETWEEN :snap1 AND :snap2
ORDER BY s.disk_reads_delta DESC
FETCH FIRST 10 ROWS ONLY;

🎯 ✅ 10. Root Cause Identification Matrix

iostat Pattern	AWR Finding	Root Cause
Large IO + high latency	scattered read	Full table scan
High read MB/s	direct path read	Large query
High write latency	log file sync	Commit bottleneck
High queue	many active sessions	Concurrency overload

🚨 ✅ 11. Real Example (Like Your Case)

iostat

%util = 100
await = 97 ms
avgqu-sz = 24
rMB/s = high

AWR

Event: db file scattered read
SQL_ID: abc123xyz
Disk Reads: very high
Plan: TABLE ACCESS FULL

✅ FINAL ROOT CAUSE

👉 A large SQL doing full table scan
👉 Saturating disk -> causing high latency

✅ ✅ 12. Fix Strategy

✅ SQL Level

Add indexes
Rewrite query
Avoid full scans

✅ DB Level

Increase buffer cache
Enable parallel limits

✅ Storage Level

Spread datafiles
Use faster disk tier

✅ ✅ 13. Ultimate One-Liner Workflow (Production)

iostat spike → note time
→ find AWR snapshot
→ find top IO SQL
→ check wait events
→ review execution plan
→ fix SQL

ORACLE DATABASE PROBLEM AND SOLUTIONS