Monday, April 27, 2026

Step-by-step performance troubleshooting on Linux and Oracle: CPU, Memory, I/O

 

🔍 Correlation of ps, top, iostat, and vmstat

Mental Model (Very Important)

Tool | Answers the question
ps | Which exact process is responsible?
top | Is the problem CPU or memory pressure right now?
iostat | Is storage slow or saturated?
vmstat | Is the kernel under memory / run‑queue / I/O stress?

👉 Never use only one tool.
Real root cause comes from correlating outputs.


1️⃣ top — Real‑Time CPU & Memory Pressure

Best usage

top

(or top -o %CPU on newer systems)

What to focus on (top header)

%Cpu(s): 85.2 us, 10.1 sy,  0.0 ni,  2.0 id,  2.5 wa
Field | Meaning
us | User CPU (app / DB code)
sy | Kernel CPU
id | Idle CPU
wa | I/O wait (very important)

Interpretation

  • ✅ High us → application or SQL CPU
  • ✅ High sy → kernel, system calls, networking
  • 🚨 High wa → storage problem, not CPU
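
The interpretation above can be sketched as a small filter over the top header line. This is only an illustration: the function name classify_cpu_header and the cut-offs (wa ≥ 20, us ≥ 70, sy ≥ 30) are assumptions chosen for the example, not values from top itself, and the parser assumes the "value label," layout shown in the sample header.

```shell
# Hypothetical helper: reads a top "%Cpu(s):" header line on stdin and
# prints which bucket dominates. Thresholds are illustrative assumptions.
classify_cpu_header() {
  awk '
    /Cpu\(s\)/ {
      sub(/^[^:]*:/, "")              # drop the "%Cpu(s):" prefix
      n = split($0, parts, ",")
      for (i = 1; i <= n; i++) {
        split(parts[i], kv, " ")      # kv[1] = value, kv[2] = label (us, sy, wa, ...)
        v[kv[2]] = kv[1] + 0
      }
      if (v["wa"] >= 20)      print "io-wait"
      else if (v["us"] >= 70) print "user-cpu"
      else if (v["sy"] >= 30) print "kernel-cpu"
      else                    print "no-dominant-signal"
    }'
}
```

Possible usage: top -b -n 1 | classify_cpu_header. A result of io-wait is the cue to move on to ps and iostat rather than to CPU tuning.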

Process section (below the header)

PID   USER  %CPU  %MEM  COMMAND
2456  oracle 180.2 12.3 ora_dbw0

Now you jump to ps.


2️⃣ ps — Identify the Exact Culprit

Correlate with:

ps -eo pid,ppid,stat,%cpu,%mem,etime,wchan,comm --sort=-%cpu | head

Key correlation

top shows | ps confirms
High CPU PID | %cpu, etime
Hung process | STAT = D
Storage wait | wchan = io_schedule

🚨 Example:

2456  D  io_schedule  ora_dbw0

👉 This tells you:
DBWR is blocked on disk I/O

Now you must check storage.


3️⃣ iostat — Storage Bottleneck Detection

Best command

iostat -xz 1 5

Critical columns

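Column | Meaning
%util | Disk busy time
await | Avg I/O latency (ms); also reported as r_await / w_await in newer sysstat versions
svctm | Disk service time (deprecated, and removed from recent sysstat releases)
r/s w/s | Read/write rate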

Interpretation Rules (Golden)

Symptom | Meaning
%util > 80% | Disk saturated
await > 20 ms | Storage slow
await >> svctm | Queueing problem
High writes + DBWR stuck | Redo / data disk issue
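
As a rough sketch of the first two rules, the filter below scans iostat -x output and prints only the devices that cross the thresholds. The function name flag_slow_disks is hypothetical. Columns are looked up by header name because their positions differ between sysstat versions, and the worst of any column ending in "await" is used as the latency figure.

```shell
# Hypothetical helper: reads `iostat -x` output on stdin and prints devices
# whose %util or await-type latency exceeds the given thresholds.
flag_slow_disks() {
  # $1 = %util threshold (default 80), $2 = await threshold in ms (default 20)
  awk -v util_max="${1:-80}" -v await_max="${2:-20}" '
    NF == 0 { in_dev = 0; next }              # blank line ends a device table
    $1 ~ /^Device/ {                          # header row: map names to positions
      split("", col)
      for (i = 1; i <= NF; i++) col[$i] = i
      in_dev = ("%util" in col)
      next
    }
    in_dev {
      util = $(col["%util"]) + 0
      worst = 0
      for (name in col)
        if (name ~ /await$/ && $(col[name]) + 0 > worst)
          worst = $(col[name]) + 0
      if (util > util_max || worst > await_max)
        printf "%s util=%.1f await=%.1f\n", $1, util, worst
    }'
}
```

Possible usage: LC_ALL=C iostat -xz 1 5 | flag_slow_disks 80 20. Keep in mind that the first iostat report shows averages since boot, so a device flagged only in that first report is not necessarily slow right now.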

🚨 Example:

sda  %util=99.8  await=120ms

✅ Confirms ps + top → storage root cause


4️⃣ vmstat — Kernel Stress & Memory I/O

Best command

vmstat 1 5

Key columns

r  b   swpd   free   buff  cache  si so   bi bo   in cs us sy id wa

Important fields explained

Column | Meaning
r | Run queue (CPU demand)
b | Blocked processes (I/O)
si/so | Swap in/out
bi/bo | Block I/O
wa | I/O wait (kernel view)

Correlation logic

vmstat shows | Combined meaning
b > 0 | Processes stuck in I/O
wa high | CPU waiting for disk
r > CPU cores | CPU contention
si/so > 0 | Memory pressure

🚨 Example:

r=1 b=6 wa=40

👉 Matches:

  • ps → many D
  • top → high I/O wait
  • iostat → high disk latency

🎯 Root cause confirmed: storage
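
The correlation logic for vmstat can be sketched as a filter that averages the samples and prints a verdict. The function name vmstat_verdict and the cut-offs (average b ≥ 1 together with average wa ≥ 20) are assumptions for illustration. The first data row is skipped because vmstat reports since-boot averages there.

```shell
# Hypothetical helper: reads `vmstat <interval> <count>` output on stdin and
# prints one or more verdicts. Cut-off values are illustrative assumptions.
vmstat_verdict() {
  # $1 = number of CPU cores (default 1)
  awk -v ncpu="${1:-1}" '
    $1 == "r" && $2 == "b" {                  # column header row
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    !("r" in col) || $1 !~ /^[0-9]+$/ { next }
    ++seen == 1 { next }                      # first row is the since-boot average
    {
      r += $(col["r"]); b += $(col["b"]); wa += $(col["wa"])
      swap += $(col["si"]) + $(col["so"])
      n++
    }
    END {
      if (n == 0) { print "no-samples"; exit }
      if (swap > 0)                   { print "memory-pressure"; hit = 1 }
      if (b / n >= 1 && wa / n >= 20) { print "io-bound"; hit = 1 }
      if (r / n > ncpu)               { print "cpu-contention"; hit = 1 }
      if (!hit) print "no-clear-signal"
    }'
}
```

Possible usage: vmstat 1 5 | vmstat_verdict "$(nproc)". An io-bound verdict should still be confirmed with ps (D states) and iostat (latency) before calling it a storage problem.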


5️⃣ End‑to‑End Correlation Scenarios


✅ Scenario A: High Load Average

Observations

  • uptime → load = 20
  • top → CPU idle
  • vmstat → b=10, wa=35
  • ps → many D state
  • iostat → high await

Conclusion
Load is from I/O wait, not CPU
👉 Storage team issue


✅ Scenario B: CPU Spike

Observations

  • top → %us=90
  • vmstat → r > CPU cores
  • ps → process in R state
  • iostat → normal

Conclusion
Pure CPU problem
👉 Tune SQL / app / threads


✅ Scenario C: Hung Oracle Instance

Observations

  • ps → ora_dbw0, ora_lgwr in D
  • vmstat → b > 5
  • iostat → redo disk latency
  • top → high wa

Conclusion
Redo or data disk I/O stall
👉 SAN / ASM / NFS issue


6️⃣ Golden Troubleshooting Workflow (Memorize This)


symptom →
top →
ps →
vmstat →
iostat →
root cause

Command sequence

top
ps -eo pid,stat,%cpu,wchan,comm | awk 'NR == 1 || $2 ~ /^D/'
vmstat 1 5
iostat -xz 1 5


✅ Final Cheat Sheet

Tool | Best for
top | Live CPU/memory
ps | Exact process & state
vmstat | Kernel & wait queues
iostat | Disk latency & saturation

🎯 Never trust a single tool
Real diagnosis = correlation

Troubleshooting CPU - I/O : Best linux ps Command Arguments for Troubleshooting - One‑Command “Master View” (Highly Recommended)

 

One‑Command “Master View” (Highly Recommended)

ps -eo user,pid,ppid,stat,%cpu,%mem,etime,lstart,wchan,comm --sort=-%cpu

🔍 What it shows (and why it matters)

Field | Why it’s important
user | Who owns the process
pid | Process ID
ppid | Parent process (helps detect orphans)
stat | Process state (R/S/D/Z/T)
%cpu | CPU consumption
%mem | Memory usage
etime | How long the process has been running
lstart | Exact start time
wchan | Kernel wait channel (I/O diagnosis)
comm | Executable name
--sort=-%cpu | Top CPU consumers first

This is your best single snapshot for general troubleshooting


2️⃣ CPU Troubleshooting (High CPU / Run Queues)

ps -eo pid,ppid,stat,psr,pri,ni,%cpu,time,comm --sort=-%cpu | head -20

Key columns

Column | Meaning
psr | Which CPU core it’s running on
pri | Kernel priority
ni | Nice value
time | Total CPU time consumed

✅ Use when:

  • Load average is high
  • CPU is saturated
  • Performance complaints

3️⃣ I/O Troubleshooting (MOST CRITICAL)

🔥 Identify blocked processes (D state)

ps -eo pid,stat,wchan,%cpu,etime,comm | awk '$2 ~ /D/'

Why this is powerful

Field | Purpose
D | Uninterruptible sleep (I/O wait)
wchan | What kernel function it’s stuck on
etime | How long the process has been running

Common wchan values and meaning

wchan | Meaning
io_schedule | Disk I/O wait
wait_on_page_bit | Memory/disk interaction
nfs_wait | NFS hang
blk_mq_get_tag | Storage queue congestion

🚨 If Oracle or DB processes stay here across repeated snapshots → storage issue very likely (a brief D state on a busy system is normal)
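
To see at a glance where blocked processes are waiting, the D-state rows can be grouped by wait channel. This is a minimal sketch: the function name dstate_by_wchan is hypothetical, and it assumes the input comes from ps -eo stat,wchan:32,comm, so that STAT is the first column and WCHAN the second.

```shell
# Hypothetical helper: reads `ps -eo stat,wchan:32,comm` output on stdin and
# prints "<count> <wchan>" for processes in D state, most frequent first.
dstate_by_wchan() {
  awk 'NR > 1 && $1 ~ /^D/ { n[$2]++ }
       END { for (w in n) print n[w], w }' | sort -k1,1nr -k2,2
}
```

Possible usage: ps -eo stat,wchan:32,comm | dstate_by_wchan. Many processes piling up on the same wait channel across several runs points to a shared bottleneck rather than to one misbehaving process.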


4️⃣ Memory & Leak Detection

ps -eo pid,ppid,stat,rss,vsz,%mem,comm --sort=-rss | head -20

Key fields

Field | Meaning
rss | Resident (physical) memory, in KiB
vsz | Virtual memory size, in KiB
%mem | Share of physical RAM used by the process

✅ Use when:

  • System is swapping
  • OOM killer events
  • Slow performance despite low CPU
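
When many processes share one name (Oracle or Java workers, for example), summing RSS per command gives a quicker picture than reading individual rows. The sketch below is illustrative: rss_by_command is a hypothetical name, it assumes input from ps -eo rss,comm, and it uses only the first word of the command name. Because RSS counts shared memory once per attached process, the totals can overstate real usage, notably for processes sharing a large SGA.

```shell
# Hypothetical helper: reads `ps -eo rss,comm` output on stdin and prints
# "<command> <total RSS in MiB>", largest first. Shared pages are counted
# once per process, so treat the totals as an upper bound.
rss_by_command() {
  # $1 = number of rows to show (default 10)
  awk 'NR > 1 { kib[$2] += $1 }
       END { for (c in kib) printf "%s %.1f\n", c, kib[c] / 1024 }' |
    sort -k2,2nr | head -n "${1:-10}"
}
```

Possible usage: ps -eo rss,comm | rss_by_command 5.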

5️⃣ Full Command, Arguments & Environment

ps -eo pid,stat,%cpu,%mem,cmd --sort=-%cpu

Why this matters:

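  • cmd shows the complete command line with arguments (environment variables are not part of it; they can be read from /proc/<pid>/environ)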
  • Crucial for:
    • Java tuning
    • Oracle startup flags
    • Application misconfiguration

6️⃣ Zombie Process Detection

ps -eo pid,ppid,stat,etime,comm | awk '$3 ~ /Z/'

Why care?

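  • Zombies indicate a parent process bug: the parent has not reaped (waited for) its exited children
  • Can exhaust PID space
  • A zombie is already dead, so kill has no effect on it; fix or restart the parent instead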
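
Since the fix lies with the parent, it helps to group zombies by parent PID. A minimal sketch, assuming input from ps -eo pid,ppid,stat,comm (PPID in column 2, STAT in column 3); the function name zombies_by_parent is hypothetical.

```shell
# Hypothetical helper: reads `ps -eo pid,ppid,stat,comm` output on stdin and
# prints "<parent pid> <zombie count>", worst offender first.
zombies_by_parent() {
  awk 'NR > 1 && $3 ~ /^Z/ { n[$2]++ }
       END { for (p in n) print p, n[p] }' | sort -k2,2nr -k1,1n
}
```

Possible usage: ps -eo pid,ppid,stat,comm | zombies_by_parent, then inspect the top parent with ps -fp <ppid>.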

7️⃣ Oracle / Database‑Focused View (DBA Favorite)

ps -eo pid,stat,%cpu,%mem,etime,wchan,comm | grep ora_

✅ Detects:

  • DBWR / LGWR I/O stalls
  • Parallel worker hangs
  • Backup‑related blockages

8️⃣ Thread‑Level Analysis (Advanced CPU Debugging)

ps -eLo pid,lwp,stat,%cpu,psr,comm --sort=-%cpu

Use when:

  • Java or Oracle shows high CPU
  • Need hot thread detection
  • Correlating with perf / jstack
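
When correlating with jstack, note that ps prints the thread ID (lwp) in decimal, while Java thread dumps have traditionally shown the native thread ID in hexadecimal as nid=0x... (check the format your JDK prints). A small conversion sketch, with lwp_to_nid as a hypothetical helper name:

```shell
# Hypothetical helper: converts a decimal LWP from `ps -eLo ...` into the
# hexadecimal form used in "nid=0x..." fields of a Java thread dump.
lwp_to_nid() {
  printf '0x%x\n' "$1"
}
```

Possible usage: take the lwp of the hottest thread from the ps output, run lwp_to_nid on it, and search the thread dump for that value.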

9️⃣ Parent‑Child Relationship Analysis

ps -eo pid,ppid,stat,etime,comm --forest

✅ Great for:

  • Detecting fork storms
  • Tracing hung parent processes
  • Understanding service trees

10️⃣ Minimal “Health Check” Command (Quick & Safe)

ps -eo pid,stat,%cpu,%mem,etime,comm --sort=-%cpu | head -15

✅ Safe for production
✅ Quick triage
✅ Covers 80% of issues


🔑 What to Focus On (Cheat Sheet)

Symptom | Look at
High load | %cpu, R state
Stuck system | D state, wchan
Slowness | %cpu, %mem, etime
Hung DB | ora_* + D
Memory issues | rss, %mem
Defunct processes | Z

✅ Final Recommendation (What to Remember)

If you remember only ONE command, make it this:

ps -eo user,pid,ppid,stat,%cpu,%mem,etime,wchan,comm --sort=-%cpu

This single command gives:
✅ CPU
✅ I/O
✅ Memory
✅ State
✅ Ownership
✅ Runtime
✅ Kernel wait reason

Troubleshoot storage I/O performance issue -- Linux , Oracle


End‑to‑end explanation of the command:

    ps -eo pid,stat,comm | grep D


This is a process inspection command used heavily by Linux, Unix, and database administrators for system and performance troubleshooting.


1️⃣ What is ps?

ps stands for Process Status.
It reports information about currently running processes on a Linux system.

Think of it as a snapshot of processes at the moment you run the command.

📌 Unlike top or htop, ps:

  • Is not interactive
  • Shows a point‑in‑time view
  • Is ideal for scripting and diagnostics

2️⃣ Command Breakdown

ps -eo pid,stat,comm

Let’s split it into parts:


🔹 ps

Invokes the process status utility.


🔹 -e option (select processes)

-e

Means:
Show all processes running on the system

Without -e, ps shows only the processes that belong to your user and are attached to the current terminal (TTY).

Equivalent options:

ps -e
ps -A

Both mean “every process”.


🔹 -o option (custom output format)

-o pid,stat,comm

Means:
Choose which columns to display

Instead of default columns, you explicitly request:

Field | Meaning
pid | Process ID
stat | Process state
comm | Command name (executable)

This is extremely useful for focused troubleshooting.


3️⃣ Output Columns (Explained in Depth)

🔸 PID — Process ID

Example:

24567
  • Unique identifier for a process
  • Assigned by the Linux kernel
  • Required to manage or inspect processes

Used in commands like:

kill 24567
strace -p 24567
cat /proc/24567/status

📌 Notes:

  • PID 1 is the init process (systemd on current RHEL releases)
  • PIDs are reused after processes exit

🔸 STAT — Process State (most important field)

The STAT column shows:

  1. Main execution state
  2. Additional flags

Primary states

Code | Meaning
R | Running or runnable (on CPU or ready)
S | Sleeping (waiting for event)
D | Uninterruptible sleep (I/O wait)
T | Stopped (signal or debugger)
Z | Zombie (dead, not cleaned up)
I | Idle kernel thread (newer kernels)

👉 The first letter is the core state.


Modifier flags (can appear after the main letter)

Flag | Meaning
s | Session leader
l | Multithreaded (uses threads)
+ | Foreground process
< | High priority
N | Low priority

STAT examples explained

Ss
  • S → sleeping
  • s → session leader
    ✅ Normal background service
Ssl+
  • Sleeping
  • Session leader
  • Multithreaded
  • Foreground task
    ✅ Common for DB or Java processes
D

🚨 Critical

  • Process waiting on kernel I/O (disk, NFS, SAN / ASM, or a kernel storage issue)
  • Cannot be killed while blocked: even kill -9 takes effect only once the I/O returns
  • A process that stays in D across repeated snapshots indicates a storage or kernel-level issue



🔸 COMMAND — Executable Name

Example:

oracle
sshd
ora_w001
  • Shows only the binary name
  • Does NOT include command‑line arguments

For full command line:

ps -eo pid,stat,cmd

📌 Oracle example:

ora_w001

Means:

  • ora_ → Oracle process
  • w001 → worker process (Oracle names space management workers Wnnn and parallel query workers Pnnn)

4️⃣ Sample Output and Interpretation

PID STAT COMMAND
1 Ss systemd
1023 Ssl oracle
2045 D ora_dbw0

How to read this:

  • systemd → sleeping session leader (normal)
  • oracle → sleeping, multithreaded (normal)
  • ora_dbw0 → D state (problem)
    → Indicates disk or ASM issue

5️⃣ Why this command is widely used

✅ Lightweight and fast

  • No interactive overhead
  • Safe on production systems

✅ Perfect for troubleshooting

  • Detects:
    • Hung processes
    • Storage stalls
    • Zombie accumulation
    • Oracle background issues

✅ Script‑friendly

Used inside:

  • Shell scripts
  • Health checks
  • Cron jobs

6️⃣ Common Enhancements

Show only blocked (D) processes

ps -eo pid,stat,comm | awk '$2 ~ /D/'


Sort by process state

ps -eo pid,stat,comm --sort=stat

Add user and CPU usage

ps -eo user,pid,stat,%cpu,%mem,comm


7️⃣ Practical Use Case (Oracle / DB servers)

DBAs frequently use:

ps -eo pid,stat,comm | grep ora_

To detect:

  • Stuck background workers
  • DBWR/LGWR waiting on disk
  • Parallel query stalls

If many ora_* processes show D: 🚨 Storage team must be involved immediately


✅ Final Summary

Component | Purpose
ps | Show process snapshot
-e | Include all processes
-o | Customize output
pid | Process identifier
stat | Execution + wait state
comm | Executable name

🎯 Key troubleshooting signal

  • R, S → Normal
  • D → I/O or kernel problem; investigate immediately
  • Z → Parent process issue


Oracle Database Disk Storage Slowness Troubleshooting (RHEL) - I/O issue

 





Command :
ps -eo pid,stat,comm | grep D

Meaning

  • ps -e → show all processes
  • -o pid,stat,comm → display:
    • pid → process ID
    • stat → process state
    • comm → command name
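  • grep D → keeps every line that contains an uppercase D anywhere, so it can also match the header line or a command name; awk '$2 ~ /^D/' restricts the match to the STAT column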

What D means

D = Uninterruptible sleep
This usually means the process is:

  • Waiting on I/O
  • Typically stuck on disk, NFS, SAN, or kernel I/O
  • Cannot be killed (kill -9 won’t work) until the I/O returns

This is often serious on production systems.


iostat -xz 1 5


Ss

This shows the Oracle process state at the time of capture:

  • S = interruptible sleep (waiting for an event)
  • s = session leader

So the process was waiting (idle or blocked), not crashing at that exact moment.



1. Typical Symptoms (What triggers investigation)

  • High load average on DB server
  • User complaints: slow queries, commits, batch delays
  • AWR shows:
    • db file sequential read
    • db file scattered read
    • log file sync
    • log file parallel write
  • OS metrics show high IO wait (%wa)
  • RMAN / backups running slow

2. Step‑1: Validate System Load & CPU Wait

✅ Identify load average vs cores

uptime
nproc

Interpretation

  • Load ≈ number of CPU cores → OK
  • Load >> cores + high IO wait → likely disk bottleneck
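
The comparison of load against cores is simple arithmetic: divide the 1‑minute load average by the core count. A minimal sketch, with load_per_core as a hypothetical helper name:

```shell
# Hypothetical helper: prints load average divided by core count.
# Values near or below 1.00 mean the run queue fits the available cores.
load_per_core() {
  # $1 = load average, $2 = number of CPU cores
  awk -v load="$1" -v cores="$2" 'BEGIN {
    if (cores <= 0) { print "invalid core count"; exit 1 }
    printf "%.2f\n", load / cores
  }'
}
```

Possible usage: load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)". On Linux the load average also counts processes in uninterruptible (D) sleep, which is why a high ratio with idle CPUs points toward I/O rather than CPU.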

✅ Check CPU & IO wait

top

or (better)

vmstat 1 10

Look for:

  • %wa (IO wait) consistently > 15–20%
  • Low us/sy together with high wa: the CPUs are not busy computing, they are waiting on I/O

Example

r  b   swpd   free  buff cache   si so bi bo   in   cs us sy id wa st
8  12     0   812M  122M  18G     0  0  45 620  900 1200 10  6 40 44 0

➡️ High b and wa = blocked on disk


3. Step‑2: Disk Latency at OS Level (Most Important)

✅ iostat – PRIMARY disk latency tool

iostat -xm 1 10

Key columns:

Metric | Meaning | Problem threshold
r_await | Read latency | > 20 ms (OLTP), > 50 ms (DW)
w_await | Write latency | > 10–15 ms
await | Avg IO latency | > 20 ms
%util | Disk busy | > 80–90% sustained
aqu-sz | Avg queue size | Growing steadily = queueing

Example (Bad)

Device:  r/s   w/s  r_await  w_await  await  aqu-sz %util
sdb      420   350   48.12    32.22    40.01   18.3   97.4

➡️ Storage saturation confirmed


4. Step‑3: Identify Which Filesystems / Disks

✅ Map disks → mount points

df -hT
lsblk -f

✅ Per‑device IO usage (map each device back to its filesystem with the commands above)

iostat -xm 1 10 | grep -E "sd|nvme"

Check:

  • Datafiles disk
  • Redo log disk
  • FRA disk
  • Temp disk

5. Step‑4: Per‑Process Confirmation (Oracle vs others)

✅ pidstat – correlate Oracle background processes

pidstat -d 1 10 | grep ora_

Key offenders:

  • ora_dbw* → datafile writes
  • ora_lgwr → redo log writes
  • ora_ckpt
  • RMAN channels

High KB/s + delays = database IO bottleneck


6. Step‑5: Advanced Disk & Queue Observation

✅ sar (live sampling as shown; add -f /var/log/sa/saDD to read a saved day if sysstat collection is enabled)

sar -d 1 5

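✅ IO pressure (RHEL 8+; PSI may need to be enabled with the psi=1 kernel boot parameter)

cat /proc/pressure/io

If avg10 and avg60 (percentage of time tasks were stalled on I/O) stay above 10–20 → sustained storage pressure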
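
For scripted checks, a single value can be pulled out of the pressure file. The sketch below assumes the usual two-line layout ("some ..." and "full ...", each with avg10=, avg60=, avg300= and total= fields); psi_value is a hypothetical helper name.

```shell
# Hypothetical helper: reads /proc/pressure/io style text on stdin and prints
# one field, for example the avg10 value of the "some" line.
psi_value() {
  # $1 = line kind (some|full), $2 = key (avg10|avg60|avg300|total)
  awk -v kind="$1" -v key="$2" '
    $1 == kind {
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == key) print kv[2]
      }
    }'
}
```

Possible usage: psi_value some avg10 < /proc/pressure/io. The "some" line covers time in which at least one task was stalled on I/O; "full" covers time in which all non-idle tasks were stalled.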


7. Step‑6: Oracle Database Wait Event Validation

✅ Top waits (Instance level)

SELECT event, total_waits, time_waited/100 AS time_waited_sec
FROM v$system_event
WHERE event LIKE 'db file%'
OR event LIKE 'log file%'
ORDER BY time_waited DESC;


✅ Real‑time waits (active sessions)

SELECT sid, event, wait_time, seconds_in_wait
FROM v$session
WHERE wait_class = 'User I/O'
AND state = 'WAITING'
ORDER BY seconds_in_wait DESC;


8. Step‑7: File Type & Latency inside Oracle

✅ File-level IO latency

SELECT df.name,
fs.phyrds,
fs.phywrts,
fs.readtim/100 AS read_sec,
fs.writetim/100 AS write_sec
FROM v$datafile df, v$filestat fs
WHERE df.file# = fs.file#
ORDER BY fs.readtim DESC;

✅ Tablespace hotspot

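SELECT tablespace_name,
SUM(CASE WHEN statistic_name = 'physical reads' THEN value ELSE 0 END) reads,
SUM(CASE WHEN statistic_name = 'physical writes' THEN value ELSE 0 END) writes
FROM v$segment_statistics
WHERE statistic_name IN ('physical reads', 'physical writes')
GROUP BY tablespace_name
ORDER BY reads DESC;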

9. Step‑8: Redo Log Latency (Very Common OLTP Issue)

✅ LGWR wait

SELECT event, total_waits, time_waited/100 AS time_waited_sec
FROM v$system_event
WHERE event IN ('log file sync','log file parallel write');

Interpretation

  • log file sync waits high → commit delayed
  • log file parallel write high → redo disk slow

✅ Validate redo disks with:

iostat -xm 1 10 <redo_disk>


10. Step‑9: ASM (If Applicable)

✅ ASM disk stats

SELECT name, total_mb, free_mb, read_errs, write_errs
FROM v$asm_disk;

✅ ASM IO latency

SELECT dg.name, fs.*
FROM v$asm_diskgroup dg, v$asm_disk_iostat fs
WHERE dg.group_number = fs.group_number;


11. Correlation Checklist (OS ⇄ Oracle)

Disk problem confirmed if ALL match

  • High %wa in vmstat
  • High await in iostat
  • High db file* or log file* waits in Oracle
  • %util near 100% on affected disks
  • Load average high but CPU idle present

12. Common Root Causes

Cause | How it appears
Storage array saturation | High await + util
Poor redo disk | log file sync / log file parallel write waits
Temp spills | direct path read temp / direct path write temp
RMAN / backup | Heavy read I/O on datafile disks competing with database I/O
Thin provisioning | Latency spikes under load
Too many LUNs on same backend | Random latency

13. Immediate Mitigations

✅ Short‑term:

  • Pause backups / RMAN
  • Kill runaway sessions
  • Reduce parallelism
  • Move redo logs to faster disks

✅ Medium‑term:

  • Separate redo, data, temp
  • Increase redo log size
  • Add disks / IOPS
  • ASM rebalance / re‑stripe

✅ Long‑term:

  • Storage tiering (NVMe for redo)
  • Oracle I/O calibration (ORION)
  • Capacity & growth planning

14. One‑Command Quick Triage Bundle

uptime
vmstat 1 5
iostat -xm 1 5
pidstat -d 1 5 | grep ora_

Then Oracle

SELECT event, time_waited/100 AS wait_sec
FROM v$system_event
WHERE wait_class <> 'Idle'
ORDER BY time_waited DESC;



15. Key Rule of Thumb (Production Oracle)

IO type | Acceptable latency
Redo writes | < 5 ms
OLTP reads | < 10–15 ms
Mixed workload | < 20 ms
Anything > 30 ms | Problem
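
The rule of thumb can be written as a small classifier. This is a sketch under stated assumptions: classify_latency and its three labels are hypothetical, the OLTP read limit is taken as 15 ms (the upper end of the 10–15 ms range), and anything between the acceptable limit and 30 ms is reported as above-target.

```shell
# Hypothetical helper: compares an observed latency with the rule-of-thumb
# limits above. The labels and the 15 ms OLTP read limit are assumptions.
classify_latency() {
  # $1 = I/O type (redo|oltp-read|mixed), $2 = observed latency in ms
  awk -v kind="$1" -v ms="$2" 'BEGIN {
    limit["redo"] = 5; limit["oltp-read"] = 15; limit["mixed"] = 20
    if (!(kind in limit)) { print "unknown-type"; exit 1 }
    if (ms > 30)                print "problem"
    else if (ms >= limit[kind]) print "above-target"
    else                        print "ok"
  }'
}
```

Possible usage: classify_latency redo 7.2, with the latency taken from w_await on the redo disk in iostat.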

Sunday, April 26, 2026

list of popular extensions for python developer - vscode extension

For Python developers using Visual Studio Code, the following extensions are essential for improving productivity, code quality, and the overall development experience:


Core Development & Language Support

Python by Microsoft: The foundational extension for Python development, providing core features like linting, debugging, code navigation, and environment switching.


Pylance: A high-performance language server that offers advanced IntelliSense, rapid type checking, and auto-imports.

Python Debugger: A dedicated extension for a seamless debugging experience, allowing you to set breakpoints and inspect variables directly in the editor. 



Code Formatting & Quality

Black Formatter: Automatically formats your code to adhere to the PEP 8 style guide on every save.


Flake8: A widely used linter that identifies syntax errors and styling issues as you type.


Ruff: A modern, extremely fast Python linter and formatter that can replace both Flake8 and Black.


autoDocstring: Quickly generates standardized docstrings (Google, NumPy, Sphinx) by typing triple quotes after a function definition. 



Productivity & UI Enhancements

Jupyter: Enables running Jupyter Notebooks within VS Code, ideal for data science and rapid prototyping.


Python Test Explorer: Provides a visual interface to manage and run tests using pytest or unittest frameworks.


Python Indent: Automatically determines the correct indentation level for each line, reducing common indentation errors.


Sort Lines: Helps organize long lists of imports or variables into alphabetical order with a simple command. 



AI & Collaboration

GitHub Copilot: An AI-powered tool that provides real-time code suggestions and entire function completions.


Visual Studio IntelliCode: Enhances autocomplete by using AI to predict and recommend the most likely code snippets based on your context.


Live Share: Allows multiple developers to collaborate on the same codebase in real-time, which is useful for pair programming. 



General Utility

Docker: Simplifies building, managing, and deploying containerized Python applications directly from the editor.


GitLens: Supercharges Git capabilities, providing detailed commit history and line-by-line blame annotations.


Code Spell Checker: Catches typos in variable names and comments, which is vital for maintaining a professional codebase. 
