✅ 1. AUTO EMAIL ALERT SCRIPT (Real Production Use)
This script:
- Detects disk bottlenecks
- Sends email alert automatically
📜 Script: disk_alert_mail.sh
#!/bin/bash
HOST=$(hostname)
DATE=$(date)
ALERT=$(iostat -xm 2 2 | awk '
$1 ~ /^(sd|dm)/ {
util=$NF
await=$(NF-3)
queue=$(NF-4)
if (util>95 || await>50 || queue>10) {
printf "%-8s util=%s%% await=%sms queue=%s\n", $1, util, await, queue
}
}
')
if [ ! -z "$ALERT" ]; then
echo "Disk Alert on $HOST at $DATE
Critical disks detected:
$ALERT
" | mail -s "🚨 Disk I/O Alert on $HOST" your_email@company.com
fi
✅ Setup Cron (every 5 min)
crontab -e
Add:
*/5 * * * * /path/disk_alert_mail.sh
✅ ✅ 2. AUTO MAP dm-* → MOUNT → DB FILE (Game-Changer)
This solves your biggest pain: 👉 “Which DB object is causing this?”
📜 Script: disk_to_db_map.sh
#!/bin/bash
echo "==== Disk → Mount → Files ===="
for dev in $(lsblk -dn -o NAME | grep dm-); do
echo ""
echo "Device: $dev"
mount=$(lsblk -n -o MOUNTPOINT /dev/$dev 2>/dev/null)
if [ ! -z "$mount" ]; then
echo "Mount: $mount"
echo "Files:"
lsof +D $mount 2>/dev/null | head -10
else
echo "No mount (maybe ASM/LVM raw)"
fi
done
✅ Output Example
Device: dm-xx
Mount: /u02
Files:
oracle db datafile1.dbf
oracle db users01.dbf
✅ ✅ 3. CORRELATE iostat + ORACLE (Find exact SQL)
✅ Step 1: Identify heavy disk (from iostat)
Example:
dm-69 → /u02 → USERS tablespace
✅ Step 2: Find SQL causing reads
SELECT sql_id,
executions,
disk_reads,
buffer_gets,
ROUND(disk_reads/executions) avg_reads
FROM v$sql
ORDER BY disk_reads DESC
FETCH FIRST 10 ROWS ONLY;
✅ Step 3: Active session (real-time)
SELECT sid, serial#, sql_id, event
FROM v$session
WHERE wait_class='User I/O';
✅ Step 4: Wait event mapping
| Wait Event | Meaning |
|---|---|
db file scattered read | Full table scan 🚨 |
db file sequential read | Index lookup |
direct path read | Large scan |
log file sync | Commit slow |
✅ ✅ 4. ADVANCED ALL-IN-ONE SCRIPT (MOST POWERFUL)
👉 Combines:
- Disk health
- Alert detection
- Mapping
📜 ultimate_disk_analyzer.sh
#!/bin/bash
echo "===== Ultimate Disk Analyzer ====="
date
echo ""
iostat -xm 2 2 | awk '
/Device/ {
printf "%-8s %-6s %-6s %-6s %-10s\n","Dev","Util","Await","Queue","Status"
next
}
$1 ~ /^(sd|dm)/ {
util=$NF
await=$(NF-3)
queue=$(NF-4)
status="OK"
if (util>95 || await>50 || queue>10)
status="CRITICAL"
else if (util>80 || await>20 || queue>5)
status="WARNING"
printf "%-8s %-6.1f %-6.1f %-6.1f %-10s\n",$1,util,await,queue,status
}'
✅ ✅ 5. Real DBA Troubleshooting Flow (What YOU should do)
🔥 Step-by-step when alert triggers:
1. Run:
iostat -xm 2 5
2. Identify:
- High %util
- High await
- High queue
3. Map:
lsblk
df -h
4. Check DB:
SELECT * FROM v$session WHERE event LIKE '%read%';
5. Fix:
- Query tuning ✅
- Indexing ✅
- Move data ✅
- Balance disks ✅
✅ ✅ 6. PRO TIPS (REAL WORLD)
🔴 When to PANIC
- await > 100 ms
- queue > 20
- util = 100% sustained
🟢 When it's OK
- util = 100% BUT await < 5
👉 (fast SSD under load)
🧠Golden formula:
Problem = High Util + High Await + High Queue
🎯 Final Outcome
You now have:
✅ Auto alert system
✅ Disk → DB mapping
✅ SQL root cause detection
✅ Health scoring
✅ Full troubleshooting workflow
No comments:
Post a Comment