Tuesday, May 26, 2026

Is CPU issue ? troubleshooting workflow for oracle database performance issue with automation

✅ ✅ 1. CPU Troubleshooting Framework (Like iostat for CPU)

Use:

vmstat 2 5

or

top


📊 ✅ 2. CPU Metrics Explained (vmstat / top)

us sy id wa st
MetricMeaning
usUser CPU (app/DB queries)
sySystem CPU (kernel/syscalls)
idIdle CPU
waIO wait
stSteal (VM contention)

✅ ✅ 3. CPU Baseline Table (Production Standard)

📊 CPU Utilization

MetricGood ✅Warning ⚠️Critical 🚨
idle (id)> 40%20–40%< 20%
user (us)< 50%50–70%> 70%
system (sy)< 20%20–30%> 30%
iowait (wa)< 5%5–15%> 20%
steal (st)01–5> 5

📊 Run Queue (Very Important)

r (run queue)
MetricGood ✅Warning ⚠️Critical 🚨
r vs CPU cores≤ cores1.5x cores2x+ cores

✅ ✅ 4. CPU Health Decision Matrix

ConditionMeaning
High usCPU-heavy queries
High sykernel/system overhead
High wanot CPU → disk issue
High rCPU contention
High stVM issue

✅ ✅ 5. CPU Scoring Model (Like Disk)

🎯 Formula

Score = 100 
- user penalty 
- system penalty 
- run queue penalty 
- iowait penalty

📊 Penalties

✅ User CPU

  • <50 → 0
  • 50–70 → -15
  • 70 → -30


✅ System CPU

  • <20 → 0
  • 20–30 → -15
  • 30 → -30


✅ Run Queue

  • ≤ cores → 0
  • 1–2x cores → -20
  • 2x cores → -40


✅ IO Wait

  • <5 → 0
  • 5–15 → -20
  • 15 → -40


✅ Final Status

ScoreStatus
80–100✅ Healthy
60–80⚠️ Warning
40–60🔶 Degraded
<40🚨 Critical

✅ ✅ 6. CPU Monitoring Script (PRODUCTION READY)

📜 cpu_health.sh


#!/bin/bash

echo "===== CPU Health Check ====="
date

vmstat 2 3 | tail -1 | awk '
{
r=$1
us=$13
sy=$14
id=$15
wa=$16
st=$17

score=100

# user penalty
if (us > 70) score -= 30
else if (us > 50) score -= 15

# system penalty
if (sy > 30) score -= 30
else if (sy > 20) score -= 15

# iowait penalty
if (wa > 15) score -= 40
else if (wa > 5) score -= 20

# status
status="HEALTHY"
if (score < 40) status="CRITICAL"
else if (score < 60) status="DEGRADED"
else if (score < 80) status="WARNING"

printf "RunQ=%d User=%d%% Sys=%d%% Idle=%d%% IOwait=%d%% Status=%s Score=%d\n",r,us,sy,id,wa,status,score
}
'

Identify CPU Bottleneck (Root Cause)


✅ Step 1: Check top processes

top -o %CPU

✅ Step 2: Per process CPU

ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%cpu | head

✅ Step 3: Thread-level (VERY IMPORTANT for DB)

top -H

✅ Step 4: Per process breakdown

pidstat -u 2

✅ ✅ 8. CPU Issue Patterns (DBA Mapping)


🔴 Pattern 1: High USER CPU

us > 70%

👉 Cause:

  • Complex SQL
  • Full scans
  • bad execution plans

✅ DB Mapping:

SELECT sql_id, cpu_time
FROM v$sql
ORDER BY cpu_time DESC FETCH FIRST 10 ROWS;


🔴 Pattern 2: High SYSTEM CPU

sy > 30%

👉 Cause:

  • excessive context switching
  • kernel overhead
  • I/O interrupts

✅ Check:

vmstat 1

Look at:

cs (context switches)
in (interrupts)

🔴 Pattern 3: High RUN QUEUE

r >> CPU cores

👉 Cause:

  • CPU contention
  • too many concurrent queries

🔴 Pattern 4: High IO WAIT

wa > 20%

👉 Cause:

  • NOT CPU problem
    👉 disk bottleneck (go back to iostat)

🔴 Pattern 5: High STEAL (VM)

st > 5%

👉 Cause:

  • host contention (cloud/VM issue)

✅ ✅ 9. CPU Alert Script (Automation)

📜 cpu_alert.sh

#!/bin/bash

HOST=$(hostname)

ALERT=$(vmstat 2 2 | tail -1 | awk '
{
us=$13; sy=$14; wa=$16; st=$17

if (us>70 || sy>30 || wa>15 || st>5)
print "CPU Issue: user="us"% sys="sy"% wa="wa"% st="st"%"
}
')

if [ ! -z "$ALERT" ]; then
echo "$ALERT" | mail -s "CPU Alert on $HOST" your_email@company.com
fi
``

✅ ✅ 10. DBA Troubleshooting Flow


🔥 When CPU alert triggers:

1. Check:

vmstat 1 5

2. Identify pattern:

  • High us → SQL issue
  • High sy → OS/kernel
  • High wa → disk issue

3. Identify process:

top

4. Map to DB:

SELECT sql_id, cpu_time FROM v$sql ORDER BY cpu_time DESC;

5. Fix:

  • Tune query ✅
  • Index ✅
  • Reduce parallelism ✅
  • Limit sessions ✅

✅ ✅ 11. Golden Rules (CPU)

✅ Healthy

idle > 40%
wa < 5%
r <= cores

🚨 Critical CPU

idle < 10%
us > 70%
r > 2x cores

⚠️ Trick Case (VERY IMPORTANT)

CPU looks high BUT wa is high
👉 NOT CPU issue → DISK issue

🎯 Final Outcome

Now you have:

✅ CPU baseline table
✅ Health scoring model
✅ Monitoring script
✅ Alerting system
✅ DB correlation
✅ Troubleshooting flow

No comments:

Post a Comment

Is CPU issue ? troubleshooting workflow for oracle database performance issue with automation

✅ ✅ 1. CPU Troubleshooting Framework (Like iostat for CPU) Use: vmstat 2 5 or top 📊 ✅ 2. CPU Metrics Explained (vmstat / top) us sy id wa s...