Tuesday, May 26, 2026

Is storage issue ? troubleshooting workflow for oracle database performance issue with automation

 

✅ 1. AUTO EMAIL ALERT SCRIPT (Real Production Use)

This script:

  • Detects disk bottlenecks
  • Sends email alert automatically

📜 Script: disk_alert_mail.sh

#!/bin/bash

HOST=$(hostname)
DATE=$(date)

ALERT=$(iostat -xm 2 2 | awk '
$1 ~ /^(sd|dm)/ {
util=$NF
await=$(NF-3)
queue=$(NF-4)

if (util>95 || await>50 || queue>10) {
printf "%-8s util=%s%% await=%sms queue=%s\n", $1, util, await, queue
}
}
')

if [ ! -z "$ALERT" ]; then
echo "Disk Alert on $HOST at $DATE

Critical disks detected:

$ALERT
" | mail -s "🚨 Disk I/O Alert on $HOST" your_email@company.com
fi


✅ Setup Cron (every 5 min)

crontab -e

Add:

*/5 * * * * /path/disk_alert_mail.sh


✅ ✅ 2. AUTO MAP dm-* → MOUNT → DB FILE (Game-Changer)

This solves your biggest pain: 👉 “Which DB object is causing this?”


📜 Script: disk_to_db_map.sh

#!/bin/bash

echo "==== Disk → Mount → Files ===="

for dev in $(lsblk -dn -o NAME | grep dm-); do
echo ""
echo "Device: $dev"

mount=$(lsblk -n -o MOUNTPOINT /dev/$dev 2>/dev/null)

if [ ! -z "$mount" ]; then
echo "Mount: $mount"
echo "Files:"
lsof +D $mount 2>/dev/null | head -10
else
echo "No mount (maybe ASM/LVM raw)"
fi
done


✅ Output Example

Device: dm-xx
Mount: /u02
Files:
oracle  db  datafile1.dbf
oracle  db  users01.dbf

✅ ✅ 3. CORRELATE iostat + ORACLE (Find exact SQL)


✅ Step 1: Identify heavy disk (from iostat)

Example:

dm-69 → /u02 → USERS tablespace

✅ Step 2: Find SQL causing reads

SELECT sql_id,
executions,
disk_reads,
buffer_gets,
ROUND(disk_reads/executions) avg_reads
FROM v$sql
ORDER BY disk_reads DESC
FETCH FIRST 10 ROWS ONLY;


✅ Step 3: Active session (real-time)

SELECT sid, serial#, sql_id, event
FROM v$session
WHERE wait_class='User I/O';


✅ Step 4: Wait event mapping

Wait EventMeaning
db file scattered readFull table scan 🚨
db file sequential readIndex lookup
direct path readLarge scan
log file syncCommit slow

✅ ✅ 4. ADVANCED ALL-IN-ONE SCRIPT (MOST POWERFUL)

👉 Combines:

  • Disk health
  • Alert detection
  • Mapping

📜 ultimate_disk_analyzer.sh

#!/bin/bash

echo "===== Ultimate Disk Analyzer ====="
date
echo ""

iostat -xm 2 2 | awk '
/Device/ {
printf "%-8s %-6s %-6s %-6s %-10s\n","Dev","Util","Await","Queue","Status"
next
}

$1 ~ /^(sd|dm)/ {
util=$NF
await=$(NF-3)
queue=$(NF-4)

status="OK"

if (util>95 || await>50 || queue>10)
status="CRITICAL"
else if (util>80 || await>20 || queue>5)
status="WARNING"

printf "%-8s %-6.1f %-6.1f %-6.1f %-10s\n",$1,util,await,queue,status
}'


✅ ✅ 5. Real DBA Troubleshooting Flow (What YOU should do)


🔥 Step-by-step when alert triggers:

1. Run:

iostat -xm 2 5

2. Identify:

  • High %util
  • High await
  • High queue

3. Map:

lsblk
df -h

4. Check DB:

SELECT * FROM v$session WHERE event LIKE '%read%';


5. Fix:

  • Query tuning ✅
  • Indexing ✅
  • Move data ✅
  • Balance disks ✅

✅ ✅ 6. PRO TIPS (REAL WORLD)

🔴 When to PANIC

  • await > 100 ms
  • queue > 20
  • util = 100% sustained

🟢 When it's OK

  • util = 100% BUT await < 5
    👉 (fast SSD under load)

🧠 Golden formula:

Problem = High Util + High Await + High Queue

🎯 Final Outcome

You now have:

✅ Auto alert system
✅ Disk → DB mapping
✅ SQL root cause detection
✅ Health scoring
✅ Full troubleshooting workflow

No comments:

Post a Comment

Is CPU issue ? troubleshooting workflow for oracle database performance issue with automation

✅ ✅ 1. CPU Troubleshooting Framework (Like iostat for CPU) Use: vmstat 2 5 or top 📊 ✅ 2. CPU Metrics Explained (vmstat / top) us sy id wa s...