Monday, April 27, 2026

Troubleshooting CPU and I/O: Best Linux ps Command Arguments

 

1️⃣ One‑Command “Master View” (Highly Recommended)

ps -eo user,pid,ppid,stat,%cpu,%mem,etime,lstart,wchan,comm --sort=-%cpu

🔍 What it shows (and why it matters)

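Field          Why it’s important
-------------  ------------------------------------------------------------
user           Who owns the process
pid            Process ID
ppid           Parent process ID (a PPID of 1 often means an orphan re-parented to init)
stat           Process state (R running, S sleeping, D uninterruptible sleep, Z zombie, T stopped)
%cpu           CPU time used divided by elapsed run time (a lifetime average, not the current rate)
%mem           Resident memory as a percentage of physical RAM
etime          Elapsed time since the process started
lstart         Exact start date and time
wchan          Kernel function the process is sleeping in (I/O diagnosis)
comm           Executable name only, without arguments
--sort=-%cpu   Highest %cpu first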

This is the best single snapshot for general troubleshooting.


2️⃣ CPU Troubleshooting (High CPU / Run Queues)

ps -eo pid,ppid,stat,psr,pri,ni,%cpu,time,comm --sort=-%cpu | head -20

Key columns

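Column  Meaning
------  ------------------------------------------------
psr     Processor (CPU core) the process last ran on
pri     Scheduling priority as reported by the kernel
ni      Nice value (-20 is most favorable, 19 is least)
time    Cumulative CPU time consumed ([DD-]HH:MM:SS)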

✅ Use when:

  • Load average is high
  • CPU is saturated
  • Performance complaints
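
Because %cpu in ps is averaged over the whole life of the process, a long-running process that only recently started spinning can still show a low value. One way around this is to compare the cumulative time column across two samples. The sketch below is only an illustration: the function names cputime_to_seconds and recent_cpu are made up for this example, and the 5-second default interval is an arbitrary assumption.

```shell
# Convert a ps TIME value ([DD-]HH:MM:SS) into whole seconds.
# (Hypothetical helper name, not part of ps.)
cputime_to_seconds() {
    echo "$1" | awk '{
        days = 0; t = $1
        # Split off an optional leading "DD-" day count.
        if (index(t, "-") > 0) { split(t, d, "-"); days = d[1]; t = d[2] }
        n = split(t, p, ":")
        secs = 0
        for (i = 1; i <= n; i++) secs = secs * 60 + p[i]
        print days * 86400 + secs
    }'
}

# recent_cpu PID [INTERVAL]: CPU seconds consumed by PID during INTERVAL
# seconds (default 5, an arbitrary choice for this sketch).
recent_cpu() {
    pid=$1
    interval=${2:-5}
    before=$(cputime_to_seconds "$(ps -o time= -p "$pid")")
    sleep "$interval"
    after=$(cputime_to_seconds "$(ps -o time= -p "$pid")")
    echo $((after - before))
}
```

For example, `recent_cpu 1234 10` returning a value close to 10 would suggest that PID 1234 kept roughly one core busy for the whole interval.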

3️⃣ I/O Troubleshooting (MOST CRITICAL)

🔥 Identify blocked processes (D state)

ps -eo pid,stat,wchan:32,%cpu,etime,comm | awk 'NR == 1 || $2 ~ /D/'

Why this is powerful

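Field  Purpose
-----  ----------------------------------------------------------------
D      Uninterruptible sleep (usually waiting on I/O)
wchan  Kernel function the process is sleeping in (use wchan:32 to avoid truncated names)
etime  Elapsed time since the process started (not how long it has been blocked; take repeated samples to judge that)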
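
A process that passes briefly through D state is normal; one that is still in D state several seconds later is the one worth investigating. As a rough sketch of that check, the helper below compares two saved snapshots and prints only the processes that were in D state in both. The name persistent_d and the snapshot file names are assumptions made for this example.

```shell
# persistent_d FILE1 FILE2
# Each file holds the output of: ps -eo pid,stat,wchan:32,comm
# Prints the rows of FILE2 whose PID was in D state in both snapshots.
persistent_d() {
    awk 'FNR == 1 { next }                              # skip the header of each file
         NR == FNR { if ($2 ~ /^D/) seen[$1] = 1; next } # first file: remember D-state PIDs
         $2 ~ /^D/ && ($1 in seen)                       # second file: still in D state
        ' "$1" "$2"
}
```

Typical use would be to run `ps -eo pid,stat,wchan:32,comm > snap1`, wait a few seconds, write `snap2` the same way, and then call `persistent_d snap1 snap2`.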

Common wchan values and meaning

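wchan                   Meaning
----------------------  ------------------------------------------------------
io_schedule             Waiting on block (disk) I/O
wait_on_page_bit        Waiting for a page to be read in or written back
nfs_wait_bit_killable   Waiting on an NFS server (possible NFS hang)
blk_mq_get_tag          Waiting for a free request slot (storage queue congestion)

Exact function names vary between kernel versions.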

🚨 If Oracle or other DB processes stay in this list across several samples → a storage problem is the most likely cause
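
When many processes are blocked at once, it helps to see which wait channel dominates. The following is a minimal sketch that tallies D-state processes by wchan; blocked_by_wchan is a hypothetical name chosen for this example, and it assumes the input has the state in column 1 and the wait channel in column 2.

```shell
# Reads the output of: ps -eo stat,wchan:32,comm
# Prints "<count> <wchan>" for D-state processes, most common first.
blocked_by_wchan() {
    awk 'NR > 1 && $1 ~ /^D/ { count[$2]++ }          # skip header, keep D state only
         END { for (w in count) print count[w], w }' | sort -rn
}
```

Usage: `ps -eo stat,wchan:32,comm | blocked_by_wchan`. A single wchan accounting for most of the blocked processes points at one subsystem (for example one disk or one NFS mount) rather than general overload.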


4️⃣ Memory & Leak Detection

ps -eo pid,ppid,stat,rss,vsz,%mem,comm --sort=-rss | head -20

Key fields

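Field  Meaning
-----  -----------------------------------------------------------
rss    Resident set size: physical memory in use (not swapped), in KiB
vsz    Virtual memory size, in KiB
%mem   RSS as a percentage of physical RAM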

✅ Use when:

  • System is swapping
  • OOM killer events
  • Slow performance despite low CPU
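
Multi-process services (database workers, web server children) can use a lot of memory in total even when no single process stands out. As an illustrative sketch, the helper below sums RSS per command name; rss_by_command is a made-up name, and the totals overstate real usage because shared pages are counted once per process.

```shell
# Reads the output of: ps -eo rss,comm
# Prints total RSS per command name in MiB, largest first.
# Note: only the first word of a command name containing spaces is used.
rss_by_command() {
    awk 'NR > 1 { kib[$2] += $1 }                      # skip header, sum KiB per name
         END { for (c in kib) printf "%.1f MiB %s\n", kib[c] / 1024, c }' | sort -rn
}
```

Usage: `ps -eo rss,comm | rss_by_command | head -10`. Running it a few times over several hours and comparing the totals is a simple way to spot a command whose memory keeps growing.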

5️⃣ Full Command, Arguments & Environment

ps -eo pid,stat,%cpu,%mem,cmd --sort=-%cpu

Why this matters:

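  • cmd shows the complete command line with all arguments (on a terminal, add -ww so long lines are not cut at the screen width)
  • cmd does not include environment variables; read those separately, for example from /proc/<pid>/environ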
  • Crucial for:
    • Java tuning
    • Oracle startup flags
    • Application misconfiguration
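
The environment a process was started with is available under /proc as a NUL-separated block, which is unreadable when printed directly. A small sketch for making it readable follows; show_env is a hypothetical helper name, and reading another user's /proc/<pid>/environ normally requires root.

```shell
# Turn a NUL-separated environment block on stdin into one VAR=value per line.
show_env() {
    tr '\000' '\n'
}
```

Usage: `show_env < /proc/1234/environ | grep -i java`, where 1234 stands for the PID of interest. The file reflects the environment at the time the process was started, so later changes made inside the process may not appear.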

6️⃣ Zombie Process Detection

ps -eo pid,ppid,stat,etime,comm | awk '$3 ~ /Z/'

Why care?

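  • Zombies mean the parent has not collected (reaped) the exit status of its children, usually a parent process bug
  • Large numbers of them can exhaust PID space
  • A zombie is already dead and cannot be killed; fix or restart the parent so its children are reaped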
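
Since the fix lies with the parent, the useful question is which parent owns the zombies. A minimal sketch that groups zombies by parent PID is shown below; zombies_by_parent is a name invented for this example.

```shell
# Reads the output of: ps -eo ppid,stat,comm
# Prints "<zombie count> <parent pid>", worst parent first.
zombies_by_parent() {
    awk 'NR > 1 && $2 ~ /^Z/ { n[$1]++ }              # skip header, keep zombies only
         END { for (p in n) print n[p], p }' | sort -rn
}
```

Usage: `ps -eo ppid,stat,comm | zombies_by_parent`, followed by `ps -o pid,etime,cmd -p <ppid>` on the top entry to see which program is failing to reap its children.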

7️⃣ Oracle / Database‑Focused View (DBA Favorite)

ps -eo pid,stat,%cpu,%mem,etime,wchan,comm | grep ora_

✅ Detects:

  • DBWR / LGWR I/O stalls
  • Parallel worker hangs
  • Backup‑related blockages

8️⃣ Thread‑Level Analysis (Advanced CPU Debugging)

ps -eLo pid,lwp,stat,%cpu,psr,comm --sort=-%cpu

Use when:

  • Java or Oracle shows high CPU
  • Need hot thread detection
  • Correlating with perf / jstack
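
When correlating with jstack, the lwp value from ps is decimal, while thread dumps from many JDK versions print the native thread ID as a hexadecimal nid=0x... value (some newer releases may print it in decimal, so check the dump first). Assuming the hexadecimal form, a conversion sketch looks like this; lwp_to_nid is a hypothetical helper name.

```shell
# Convert a decimal LWP (thread ID) from ps into the 0x... form used for nid.
lwp_to_nid() {
    printf '0x%x\n' "$1"
}
```

Usage: take the lwp of the hottest thread from the ps output above, run `lwp_to_nid <lwp>`, and search the thread dump for the resulting value.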

9️⃣ Parent‑Child Relationship Analysis

ps -eo pid,ppid,stat,etime,comm --forest

✅ Great for:

  • Detecting fork storms
  • Tracing hung parent processes
  • Understanding service trees

10️⃣ Minimal “Health Check” Command (Quick & Safe)

ps -eo pid,stat,%cpu,%mem,etime,comm --sort=-%cpu | head -15

✅ Safe for production
✅ Quick triage
✅ Covers the most common issues


🔑 What to Focus On (Cheat Sheet)

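Symptom            Look at
-----------------  ----------------------------------------------------------
High load          %cpu, R state (D state also counts toward Linux load average)
Stuck system       D state, wchan
Slowness           %cpu, %mem, etime
Hung DB            ora_* processes in D state
Memory issues      rss, %mem
Defunct processes  Z state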
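
Before drilling into any one symptom, a count of processes per state gives a quick sense of which row of the table applies. The sketch below is one possible way to do that; state_summary is a made-up name, and it looks only at the first character of the stat column.

```shell
# Reads the output of: ps -eo stat
# Prints "<state letter> <count>" for each state present.
state_summary() {
    awk 'NR > 1 { n[substr($1, 1, 1)]++ }             # skip header, count by first letter
         END { for (s in n) print s, n[s] }' | sort
}
```

Usage: `ps -eo stat | state_summary`. More than a handful of D or Z entries is a reason to move on to the I/O or zombie sections above.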

✅ Final Recommendation (What to Remember)

If you remember only ONE command, make it this:

ps -eo user,pid,ppid,stat,%cpu,%mem,etime,wchan,comm --sort=-%cpu

This single command gives:

✅ CPU
✅ I/O
✅ Memory
✅ State
✅ Ownership
✅ Runtime
✅ Kernel wait reason
