Monday, April 27, 2026

Step-by-Step Troubleshooting of Linux and Oracle Performance Issues: CPU, Memory, I/O

 

🔍 Correlation of ps, top, iostat, and vmstat

Mental Model (Very Important)

Tool     Answers the question
ps       Which exact process is responsible?
top      Is the problem CPU or memory pressure right now?
iostat   Is storage slow or saturated?
vmstat   Is the kernel under memory, run-queue, or I/O stress?

👉 Never rely on a single tool.
The real root cause comes from correlating their outputs.


1️⃣ top — Real‑Time CPU & Memory Pressure

Best usage

top

(or top -o %CPU on newer systems)

What to focus on (top header)

%Cpu(s): 85.2 us, 10.1 sy,  0.0 ni,  2.0 id,  2.5 wa

Field  Meaning
us     User CPU (application / DB code)
sy     Kernel (system) CPU
id     Idle CPU
wa     I/O wait (very important)

Interpretation

  • ✅ High us → application or SQL CPU
  • ✅ High sy → kernel, system calls, networking
  • 🚨 High wa → storage problem, not CPU
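You can script the interpretation rules above for quick triage. This is a minimal sketch that assumes the procps-ng top summary line format shown earlier. The function name classify_top_cpu and its cut-offs (wa >= 20%, us or sy >= 50%) are illustrative assumptions, not documented limits.

```bash
#!/usr/bin/env bash
# Sketch: classify top's "%Cpu(s)" summary line using the rules above.
# Thresholds (wa >= 20%, us/sy >= 50%) are illustrative assumptions.
classify_top_cpu() {
  awk -v WA_LIMIT=20 -v BUSY_LIMIT=50 '
    {
      # Tokens come in "value label," pairs: 85.2 us, 10.1 sy, ...
      for (i = 1; i < NF; i++) {
        label = $(i + 1)
        sub(/,$/, "", label)                 # drop the trailing comma
        if ($i ~ /^[0-9.]+$/) v[label] = $i + 0
      }
    }
    END {
      # I/O wait is checked first: CPUs waiting on disk is not a CPU problem.
      if (v["wa"] >= WA_LIMIT)        print "io-wait: likely storage problem, not CPU"
      else if (v["us"] >= BUSY_LIMIT) print "user: application or SQL CPU"
      else if (v["sy"] >= BUSY_LIMIT) print "system: kernel, system calls, networking"
      else                            print "no dominant CPU pressure"
    }' <<< "$1"
}
```

On a live system, pass in the summary line from batch mode: classify_top_cpu "$(top -b -n 1 | grep '^%Cpu')".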

Process section (lower half of the screen)

PID   USER  %CPU  %MEM  COMMAND
2456  oracle 180.2 12.3 ora_dbw0

Next, jump to ps to confirm what that PID is doing.


2️⃣ ps — Identify the Exact Culprit

Correlate with:

ps -eo pid,ppid,stat,%cpu,%mem,etime,wchan,comm --sort=-%cpu | head

Key correlation

top shows         ps confirms
High-CPU PID      %cpu, etime
Hung process      STAT = D (uninterruptible sleep)
Storage wait      wchan = io_schedule (exact function name varies by kernel)

🚨 Example:

2456  D  io_schedule  ora_dbw0

👉 This tells you:
DBWR (ora_dbw0) is in uninterruptible sleep, blocked on disk I/O

Now you must check storage.
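When many processes are stuck, a small awk filter can list every D-state process with its wait channel in one pass. This sketch assumes ps is called with exactly pid,stat,wchan,comm, in that order. The name d_state_procs is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: print PID, wait channel and command for every process whose STAT
# starts with D (uninterruptible sleep). Expects the output of:
#   ps -eo pid,stat,wchan,comm
d_state_procs() {
  awk 'NR > 1 && $2 ~ /^D/ {
         cmd = $4
         for (i = 5; i <= NF; i++) cmd = cmd " " $i   # comm may contain spaces
         print $1, $3, cmd
       }'
}

# Live usage:
#   ps -eo pid,stat,wchan,comm | d_state_procs
```

If most of the listed processes share one wait channel, that channel usually shows where they are stuck.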


3️⃣ iostat — Storage Bottleneck Detection

Best command

iostat -xz 1 5

Critical columns

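Column     Meaning
%util      Percentage of time the device was busy
await      Average I/O latency in ms (queueing + service); newer sysstat reports r_await and w_await instead
svctm      Average service time in ms (deprecated; removed in newer sysstat releases)
r/s, w/s   Reads / writes completed per second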

Interpretation Rules (Golden)

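Symptom                    Meaning
%util > 80%                Disk saturated (less meaningful for SSD/NVMe and RAID/SAN LUNs, which serve requests in parallel)
await > 20 ms              Storage slow
await >> svctm             Queueing problem (older sysstat; on newer versions watch aqu-sz, the average queue length)
High writes + DBWR stuck   Data-file disk issue (LGWR stuck points at the redo disks)

These thresholds are rules of thumb. Acceptable latency depends on the media (NVMe, SSD, spinning disk, SAN).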

🚨 Example:

sda  %util=99.8  await=120ms

✅ Confirms ps + top → storage is the root cause
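You can also apply the %util and await rules to iostat -x output mechanically. This minimal sketch looks up columns by header name, because sysstat versions differ: older releases print await and svctm, while newer ones print r_await and w_await. The helper name iostat_flags and the UTIL_LIMIT / AWAIT_LIMIT variables are assumptions. Their defaults are the 80% and 20 ms rules of thumb from this section.

```bash
#!/usr/bin/env bash
# Sketch: apply the %util and await rules of thumb to `iostat -x` output.
# Columns are found by header name: older sysstat prints "await" (and
# svctm), newer sysstat prints r_await / w_await instead.
# UTIL_LIMIT / AWAIT_LIMIT are hypothetical knobs (defaults 80 and 20 ms).
iostat_flags() {
  awk -v UTIL_LIMIT="${UTIL_LIMIT:-80}" -v AWAIT_LIMIT="${AWAIT_LIMIT:-20}" '
    # Device header row: remember which field holds each column.
    $1 ~ /^Device:?$/ {
      split("", col)                    # portable way to clear the array
      for (i = 1; i <= NF; i++) col[$i] = i
      in_dev = 1
      next
    }
    NF == 0 { in_dev = 0; next }        # a blank line ends the device block
    in_dev {
      util = ("%util" in col) ? $(col["%util"]) + 0 : 0
      if ("await" in col) {
        lat = $(col["await"]) + 0       # older sysstat: combined latency
      } else {                          # newer sysstat: take the worse of r/w
        r = ("r_await" in col) ? $(col["r_await"]) + 0 : 0
        w = ("w_await" in col) ? $(col["w_await"]) + 0 : 0
        lat = (r > w) ? r : w
      }
      msg = ""
      if (util > UTIL_LIMIT) msg = msg sprintf(" saturated(%%util=%.1f)", util)
      if (lat > AWAIT_LIMIT) msg = msg sprintf(" slow(await=%.1fms)", lat)
      if (msg != "") print $1 ":" msg
    }'
}
```

Run iostat under the C locale so decimals are printed with a dot: LC_ALL=C iostat -xz 1 5 | iostat_flags. The first report covers the time since boot, so give more weight to the later ones.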


4️⃣ vmstat — Kernel Stress & Memory I/O

Best command

vmstat 1 5

(The first line reports averages since boot; judge the system by the lines that follow.)

Key columns

r  b   swpd   free   buff  cache  si so   bi bo   in cs us sy id wa

Important fields explained

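Column   Meaning
r        Run queue: processes running or waiting for a CPU
b        Processes blocked in uninterruptible sleep (usually I/O)
si/so    Memory swapped in from / out to disk per second
bi/bo    Blocks received from / sent to block devices per second
wa       I/O wait (kernel view)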

Correlation logic

vmstat shows     Combined meaning
b > 0            Processes stuck in I/O
wa high          CPU waiting for disk
r > CPU cores    CPU contention
si/so > 0        Memory pressure (swapping)

🚨 Example:

r=1 b=6 wa=40

👉 Matches:

  • ps → many processes in D state
  • top → high I/O wait
  • iostat → high disk latency

🎯 Root cause confirmed: storage
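The correlation-logic table maps directly onto vmstat columns. This sketch assumes procps vmstat output with its usual two header lines. The names vmstat_flags, CORES and WA_LIMIT are hypothetical, and the 20% wa cut-off is an assumed rule of thumb.

```bash
#!/usr/bin/env bash
# Sketch: flag vmstat samples using the correlation logic above.
# Expects `vmstat <interval> <count>` output. The first data row is skipped
# because it is an average since boot. CORES defaults to `nproc`;
# WA_LIMIT=20 is an assumed rule of thumb, not a kernel threshold.
vmstat_flags() {
  local cores=${CORES:-$(nproc)}
  awk -v CORES="$cores" -v WA_LIMIT="${WA_LIMIT:-20}" '
    # Column header row: map each name (r, b, si, so, wa, ...) to its field.
    $1 == "r" && $2 == "b" { for (i = 1; i <= NF; i++) col[$i] = i; next }
    # Skip the "procs ---memory---" banner and anything before the header.
    !("r" in col) || $1 !~ /^[0-9]+$/ { next }
    ++row == 1 { next }
    {
      msg = ""
      if ($(col["r"]) > CORES)             msg = msg " cpu-contention(r=" $(col["r"]) ")"
      if ($(col["b"]) > 0)                 msg = msg " io-blocked(b=" $(col["b"]) ")"
      if ($(col["wa"]) > WA_LIMIT)         msg = msg " io-wait(wa=" $(col["wa"]) ")"
      if ($(col["si"]) + $(col["so"]) > 0) msg = msg " swapping(si=" $(col["si"]) ",so=" $(col["so"]) ")"
      print "sample " (row - 1) ":" (msg == "" ? " ok" : msg)
    }'
}
```

Live usage: vmstat 1 5 | vmstat_flags. A sample flagged both io-blocked and io-wait matches the storage pattern in the example above.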


5️⃣ End‑to‑End Correlation Scenarios


✅ Scenario A: High Load Average

Observations

  • uptime → load = 20
  • top → CPU mostly idle (low us/sy)
  • vmstat → b=10, wa=35
  • ps → many D state
  • iostat → high await

Conclusion
Load is from I/O wait, not CPU (Linux counts processes in uninterruptible D sleep in the load average, not just runnable ones)
👉 Storage team issue
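The Linux load average counts both runnable (R) and uninterruptible (D) tasks, so counting each state shows which one is driving a high load. This is a minimal sketch. load_breakdown is a hypothetical helper, and the input is assumed to be ps -eLo stat (one row per thread, because the kernel counts threads).

```bash
#!/usr/bin/env bash
# Sketch: count runnable (R) and uninterruptible (D) tasks, the two states
# the Linux load average includes. Expects `ps -eLo stat` output
# (header first, one row per thread).
load_breakdown() {
  awk 'NR > 1 {
         s = substr($1, 1, 1)          # first STAT letter is the state
         if (s == "R") r++
         else if (s == "D") d++
       }
       END {
         printf "R=%d D=%d\n", r, d
         if (r + d == 0)  print "no runnable or blocked tasks"
         else if (d > r)  print "load driven by I/O wait (D state), not CPU"
         else             print "load driven by runnable (CPU) tasks"
       }'
}

# Live usage (compare the counts with the 1-minute figure in /proc/loadavg):
#   ps -eLo stat | load_breakdown; cat /proc/loadavg
```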


✅ Scenario B: CPU Spike

Observations

  • top → %us=90
  • vmstat → r > CPU cores
  • ps → process in R state
  • iostat → normal

Conclusion
Pure CPU problem
👉 Tune SQL / app / threads


✅ Scenario C: Hung Oracle Instance

Observations

  • ps → ora_dbw0, ora_lgwr in D state
  • vmstat → b > 5
  • iostat → redo disk latency
  • top → high wa

Conclusion
Redo or data disk I/O stall
👉 SAN / ASM / NFS issue
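The three scenarios differ only in which signals line up, so you can write the correlation step as a decision function. This is only a sketch. classify_scenario is hypothetical, and you pass in numbers you read off top, vmstat, ps and iostat yourself. The thresholds (us >= 80, wa >= 20, await > 20 ms, b > 5) are assumptions chosen to mirror the example values above.

```bash
#!/usr/bin/env bash
# Sketch: map correlated readings onto Scenario A, B or C.
# Arguments: $1 us% (top)  $2 wa% (top/vmstat)  $3 r (vmstat)  $4 b (vmstat)
#            $5 Oracle background processes in D state (ps)
#            $6 worst await in ms (iostat)  $7 CPU cores
# Thresholds are assumptions that mirror this article's examples.
classify_scenario() {
  awk -v us="$1" -v wa="$2" -v r="$3" -v b="$4" -v ora_d="$5" \
      -v await="$6" -v cores="$7" 'BEGIN {
    io_slow = (await > 20)
    # C is checked first: a hung instance also meets the A conditions.
    if (ora_d > 0 && b > 5 && io_slow)
      print "C: hung Oracle instance - redo/data disk I/O stall (SAN/ASM/NFS)"
    else if (b > 0 && wa >= 20 && io_slow)
      print "A: load from I/O wait, not CPU - storage issue"
    else if (us >= 80 && r > cores && !io_slow)
      print "B: pure CPU problem - tune SQL/app/threads"
    else
      print "no single scenario matches - keep correlating"
  }'
}
```

For example, classify_scenario 5 35 1 10 0 120 8 reports Scenario A. The final branch is a reminder that mixed signals need more correlation, not a forced label.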


6️⃣ Golden Troubleshooting Workflow (Memorize This)


symptom →
top →
ps →
vmstat →
iostat →
root cause

One‑liner sequence

top -b -n 1 | head -20
ps -eo pid,stat,%cpu,wchan,comm | awk 'NR == 1 || $2 ~ /^D/'
vmstat 1 5
iostat -xz 1 5
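During an incident, it helps to capture all four views at the same moment so you can correlate them afterwards. This is a minimal sketch of the sequence above in non-interactive form. capture_snapshot, OUT_DIR and SAMPLES are hypothetical names. Any tool that is not installed (for example iostat without the sysstat package) is simply skipped.

```bash
#!/usr/bin/env bash
# Sketch: capture top, ps, vmstat and iostat in one timestamped file.
# OUT_DIR and SAMPLES are hypothetical knobs; missing tools are skipped.
capture_snapshot() {
  local out_dir=${OUT_DIR:-/tmp} samples=${SAMPLES:-5}
  local out="$out_dir/perf_${HOSTNAME:-$(uname -n)}_$(date +%Y%m%d_%H%M%S).txt"
  {
    echo "=== uptime"
    command -v uptime >/dev/null && uptime
    echo "=== top (one batch iteration)"
    command -v top >/dev/null && top -b -n 1 | head -20
    echo "=== ps (header + D-state processes)"
    command -v ps >/dev/null && ps -eo pid,stat,%cpu,wchan,comm | awk 'NR == 1 || $2 ~ /^D/'
    echo "=== vmstat"
    command -v vmstat >/dev/null && vmstat 1 "$samples"
    echo "=== iostat"
    command -v iostat >/dev/null && iostat -xz 1 "$samples"
  } > "$out" 2>&1
  echo "$out"   # print the file name so the caller can inspect it
}
```

Example: OUT_DIR=/var/tmp capture_snapshot prints the path of the saved file.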


✅ Final Cheat Sheet

Tool     Best for
top      Live CPU / memory
ps       Exact process and state
vmstat   Kernel and wait queues
iostat   Disk latency and saturation

🎯 Never trust a single tool
Real diagnosis = correlation
