🔍 Correlation of ps, top, iostat, and vmstat
Mental Model (Very Important)
| Tool | Answers the Question |
|---|---|
| ps | Which exact process is responsible? |
| top | Is the problem CPU or memory pressure right now? |
| iostat | Is storage slow or saturated? |
| vmstat | Is the kernel under memory / run‑queue / I/O stress? |
👉 Never rely on a single tool.
The real root cause emerges from correlating their outputs.
1️⃣ top — Real‑Time CPU & Memory Pressure
Best usage
(or top -o %CPU on newer systems)
What to focus on (top header)
%Cpu(s): 85.2 us, 10.1 sy, 0.0 ni, 2.0 id, 2.5 wa
| Field | Meaning |
|---|---|
| us | User CPU (app / DB code) |
| sy | Kernel CPU |
| id | Idle CPU |
| wa | I/O wait (very important) |
Interpretation
- ✅ High us → application or SQL CPU
- ✅ High sy → kernel, system calls, networking
- 🚨 High wa → storage problem, not CPU
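The interpretation rules above can be sketched in Python, assuming the CPU summary line looks like the example header shown earlier. The thresholds (HIGH_US, HIGH_SY, HIGH_WA) are hypothetical values chosen for illustration; the article gives no exact cut-offs.

```python
import re

# Matches "<number> <field>" pairs such as "85.2 us" (or older "85.2%us")
# in top's CPU summary line.
CPU_FIELD = re.compile(r"(\d+(?:\.\d+)?)\s*%?\s*(us|sy|ni|id|wa|hi|si|st)\b")

# Hypothetical thresholds in percent; tune them for your own systems.
HIGH_US = 70.0
HIGH_SY = 20.0
HIGH_WA = 20.0


def parse_cpu_line(line):
    """Return a dict such as {'us': 85.2, 'sy': 10.1, ...}."""
    return {field: float(value) for value, field in CPU_FIELD.findall(line)}


def interpret_cpu(cpu):
    """Map high us / sy / wa to the interpretations listed above."""
    findings = []
    if cpu.get("us", 0.0) >= HIGH_US:
        findings.append("high us: application or SQL CPU")
    if cpu.get("sy", 0.0) >= HIGH_SY:
        findings.append("high sy: kernel, system calls, networking")
    if cpu.get("wa", 0.0) >= HIGH_WA:
        findings.append("high wa: storage problem, not CPU")
    return findings


header = "%Cpu(s): 85.2 us, 10.1 sy, 0.0 ni, 2.0 id, 2.5 wa"
cpu = parse_cpu_line(header)
print(interpret_cpu(cpu))  # ['high us: application or SQL CPU']
```

With the example header, only the us rule fires; a line with, say, 40 wa would instead point toward storage.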
Process section (lower half)
PID USER %CPU %MEM COMMAND
2456 oracle 180.2 12.3 ora_dbw0
Now you jump to ps.
2️⃣ ps — Identify the Exact Culprit
Correlate with:
Key correlation
| top shows | ps confirms |
|---|---|
| High CPU PID | %cpu, etime |
| Hung process | STAT = D (uninterruptible sleep) |
| Storage wait | wchan = io_schedule |
🚨 Example:
2456 D io_schedule ora_dbw0
👉 This tells you:
DBWR (ora_dbw0, the database writer) is blocked on disk I/O.
Now you must check storage.
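A minimal sketch of the D-state check, assuming the ps output was captured with ps -eo pid,stat,wchan:32,comm (column order chosen to match the example line). Only the ora_dbw0 row comes from the example; the other rows are hypothetical, and the exact wchan names you see depend on the kernel version.

```python
# Assumed capture command: ps -eo pid,stat,wchan:32,comm
# Only the ora_dbw0 row comes from the example; the others are hypothetical.
SAMPLE_PS = """\
  PID STAT WCHAN                            COMMAND
 2456 D    io_schedule                      ora_dbw0
 2460 S    do_select                        ora_pmon
 3101 R    -                                java
"""


def parse_ps(text):
    """Split each data row into pid, stat, wchan and comm."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)
        if len(parts) != 4:
            continue
        pid, stat, wchan, comm = parts
        rows.append({"pid": int(pid), "stat": stat, "wchan": wchan, "comm": comm.strip()})
    return rows


def blocked_on_io(rows):
    """Rows in uninterruptible sleep: STAT starts with 'D' (e.g. 'D', 'Ds', 'D<')."""
    return [r for r in rows if r["stat"].startswith("D")]


for r in blocked_on_io(parse_ps(SAMPLE_PS)):
    print(r["pid"], r["stat"], r["wchan"], r["comm"])  # 2456 D io_schedule ora_dbw0
```

Matching on the first STAT character keeps modifier flags such as "s" or "<" from hiding a blocked process.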
3️⃣ iostat — Storage Bottleneck Detection
Best command
Critical columns
| Column | Meaning |
|---|---|
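| %util | Disk busy time |
| await | Avg I/O latency (ms); newer sysstat versions report r_await / w_await |
| svctm | Disk service time (deprecated; dropped from newer sysstat output) |
| r/s w/s | Reads / writes per second |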
Interpretation Rules (Golden)
| Symptom | Meaning |
|---|---|
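| %util > 80% | Disk saturated (less reliable on SSD/SAN devices that serve requests in parallel) |
| await > 20 ms | Storage slow |
| await >> svctm | Queueing problem |
| High writes + DBWR stuck | Data-file disk issue (redo disk if LGWR is the one stuck) |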
🚨 Example:
sda %util=99.8 await=120ms
✅ Confirms ps + top → storage root cause
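The golden rules can be expressed as a small check. This is a sketch: the DiskStats type and the factor used to decide what counts as "await >> svctm" are hypothetical, and svctm is optional because newer sysstat releases no longer print it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DiskStats:
    device: str
    util: float                       # %util
    await_ms: float                   # await in ms ("await" is a Python keyword)
    svctm_ms: Optional[float] = None  # svctm; missing from newer sysstat output


# Hypothetical: how many times larger than svctm await must be to count as ">>".
QUEUE_FACTOR = 3.0


def interpret_disk(d: DiskStats) -> List[str]:
    """Apply the golden rules from the table above to one device."""
    findings = []
    if d.util > 80.0:
        findings.append("disk saturated")
    if d.await_ms > 20.0:
        findings.append("storage slow")
    if d.svctm_ms and d.await_ms > QUEUE_FACTOR * d.svctm_ms:
        findings.append("queueing problem")
    return findings


print(interpret_disk(DiskStats("sda", util=99.8, await_ms=120.0)))
# ['disk saturated', 'storage slow']
```

Leaving svctm_ms as None simply skips the queueing rule, which is all newer iostat output allows.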
4️⃣ vmstat — Kernel Stress & Memory I/O
Best command
Key columns
r b swpd free buff cache si so bi bo in cs us sy id wa
Important fields explained
| Column | Meaning |
|---|---|
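| r | Run queue: runnable processes (CPU demand) |
| b | Blocked processes (uninterruptible sleep, usually I/O) |
| si/so | Swap in/out |
| bi/bo | Block I/O (blocks read from / written to devices) |
| wa | I/O wait (kernel view) |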
Correlation logic
| vmstat shows | Combined meaning |
|---|---|
| b > 0 (sustained) | Processes stuck in I/O |
| wa high | CPU waiting for disk |
| r > CPU cores | CPU contention |
| si/so > 0 (sustained) | Memory pressure |
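A sketch that parses one vmstat data line against the column header listed earlier and applies the correlation table. The data line is hypothetical except for r=1, b=6 and wa=40, which come from the article's example, and WA_HIGH is an assumed threshold. vmstat's first report is an average since boot, so in practice you would parse later samples (for example from vmstat 1 5) and look for values that persist across several of them.

```python
# Column header as listed in the article. Newer vmstat versions append more
# columns (such as st); zip() stops at the shorter sequence, so trailing
# extra values are ignored.
VMSTAT_HEADER = "r b swpd free buff cache si so bi bo in cs us sy id wa"

# Hypothetical data line; only r=1, b=6 and wa=40 come from the article's example.
SAMPLE_LINE = "1 6 0 204800 10240 512000 0 0 8000 52000 900 1500 5 3 52 40"

WA_HIGH = 20  # hypothetical threshold, in percent


def parse_vmstat(header, line):
    """Pair each column name with its integer value."""
    return dict(zip(header.split(), map(int, line.split())))


def interpret_vmstat(v, cpu_cores):
    """Apply the correlation table to one vmstat sample."""
    findings = []
    if v["b"] > 0:
        findings.append("processes stuck in I/O")
    if v["wa"] >= WA_HIGH:
        findings.append("CPU waiting for disk")
    if v["r"] > cpu_cores:
        findings.append("CPU contention")
    if v["si"] > 0 or v["so"] > 0:
        findings.append("memory pressure")
    return findings


sample = parse_vmstat(VMSTAT_HEADER, SAMPLE_LINE)
print(interpret_vmstat(sample, cpu_cores=4))
# ['processes stuck in I/O', 'CPU waiting for disk']
```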
🚨 Example:
r=1 b=6 wa=40
👉 Matches:
- ps → many D-state processes
- top → high I/O wait
- iostat → high disk latency
🎯 Root cause confirmed: storage
5️⃣ End‑to‑End Correlation Scenarios
✅ Scenario A: High Load Average
Observations
- uptime → load = 20
- top → CPU idle
- vmstat → b=10, wa=35
- ps → many D-state processes
- iostat → high await
✅ Conclusion
Load is from I/O wait, not CPU: the Linux load average counts processes in D state as well as runnable ones
👉 Storage team issue
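On Linux the load average counts tasks that are runnable (R) and tasks in uninterruptible sleep (D), which is how load can reach 20 while the CPU sits idle. A sketch that splits a snapshot of STAT values into those two groups, assuming they were collected with something like ps -eLo stat= (one line per thread). The snapshot below is hypothetical, and a single snapshot only approximates the load average, which is a decaying average over time.

```python
from collections import Counter

# Hypothetical snapshot of STAT values, e.g. as printed by: ps -eLo stat=
SNAPSHOT = ["D"] * 10 + ["R"] * 2 + ["S"] * 50 + ["Ss", "D<"]


def load_breakdown(states):
    """Return (runnable, uninterruptible) counts from ps STAT values."""
    first = Counter(s[0] for s in states if s)
    return first["R"], first["D"]


def load_source(states):
    """Say which kind of task dominates the tasks that count toward load."""
    runnable, blocked = load_breakdown(states)
    if blocked > runnable:
        return "I/O wait (D state)"
    if runnable > blocked:
        return "CPU (R state)"
    return "mixed or idle"


print(load_breakdown(SNAPSHOT))  # (2, 11)
print(load_source(SNAPSHOT))     # I/O wait (D state)
```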
✅ Scenario B: CPU Spike
Observations
- top → %us=90
- vmstat → r > CPU cores
- ps → process in R state
- iostat → normal
✅ Conclusion
Pure CPU problem
👉 Tune SQL / app / threads
✅ Scenario C: Hung Oracle Instance
Observations
- ps → ora_dbw0, ora_lgwr in D state
- vmstat → b > 5
- iostat → redo disk latency
- top → high wa
✅ Conclusion
Redo or data disk I/O stall
👉 SAN / ASM / NFS issue
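The three scenarios share one idea: a root cause is named only when several tools agree. A sketch of that decision, assuming the observations have already been reduced to numbers. The Observations fields, the thresholds, and the "at least 3 of 4 I/O signals" rule are hypothetical choices for illustration, not a standard.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    us: float           # top: %us
    wa: float           # top / vmstat: wa
    run_queue: int      # vmstat: r
    blocked: int        # vmstat: b
    cpu_cores: int
    d_state_procs: int  # ps: processes in D state
    high_await: bool    # iostat: await above your latency threshold


# Hypothetical thresholds; the article does not fix exact values.
WA_HIGH = 20.0
US_HIGH = 70.0


def diagnose(o: Observations) -> str:
    """Require agreement across tools before naming a root cause."""
    io_signals = sum([o.blocked > 0, o.wa >= WA_HIGH, o.d_state_procs > 0, o.high_await])
    cpu_signals = sum([o.us >= US_HIGH, o.run_queue > o.cpu_cores])
    if io_signals >= 3:
        return "storage: I/O wait, not CPU"
    if cpu_signals == 2 and not o.high_await:
        return "CPU: tune SQL / app / threads"
    return "inconclusive: collect more samples"


# Scenario A, with hypothetical values where the article gives none.
print(diagnose(Observations(us=2.0, wa=35.0, run_queue=1, blocked=10,
                            cpu_cores=8, d_state_procs=10, high_await=True)))
# storage: I/O wait, not CPU
```

Returning "inconclusive" when the signals disagree mirrors the rule of never trusting a single tool.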
6️⃣ Golden Troubleshooting Workflow (Memorize This)
One‑liner sequence
✅ Final Cheat Sheet
| Tool | Best for |
|---|---|
| top | Live CPU/memory |
| ps | Exact process & state |
| vmstat | Kernel & wait queues |
| iostat | Disk latency & saturation |
🎯 Never trust a single tool
Real diagnosis = correlation