🔗 1. Map dm-* → Actual Mount Points (VERY IMPORTANT)
✅ Command:
lsblk -o NAME,KNAME,MOUNTPOINT,SIZE,FSTYPE | grep dm-
✅ If using LVM:
dmsetup ls --tree
✅ Detailed mapping:
lsblk -o NAME,KNAME,PKNAME,MOUNTPOINT | column -t
✅ Correlate with filesystem:
df -h | grep /dev/mapper
👉 Why this matters
dm-62→ logical volume → mount point → DB datafile location- Helps answer:
“Which tablespace is causing this spike?”
🔗 2. Map Disk → Oracle / DB Files
✅ For Oracle:
SELECT file_name, tablespace_name
FROM dba_data_files
WHERE file_name LIKE '%<mount_point>%';
✅ Check temp / redo:
SELECT name FROM v$tempfile;
SELECT member FROM v$logfile;
👉 Now you can map:
dm-69 → /u02 → USERS tablespace → full scan
📊 3. iostat → DB Wait Event Mapping
| iostat Pattern | DB Wait Event | Meaning |
|---|---|---|
| High rMB/s + large avgrq-sz | db file scattered read | Full table scan |
| High r/s small IO | db file sequential read | Index lookup |
| High w_await | log file sync | Commit latency |
| High avgqu-sz | free buffer waits | DB buffer pressure |
| High await | Any IO wait | Storage slow |
🚨 4. Alert Thresholds (Production Standard)
✅ Disk Health Thresholds
| Metric | Warning | Critical |
|---|---|---|
| %util | >80% | >90% |
| await | >20 ms | >50 ms |
| avgqu-sz | >5 | >10 |
| svctm vs await gap | noticeable | large gap |
✅ Quick Alert Command
iostat -xm 2 5 | awk '
/Device/ {print; next}
$1 ~ /^(sd|dm)/ && ($NF > 90 || $10 > 50 || $9 > 10) {print}
'
👉 Triggers when:
- %util > 90
- await > 50 ms
- queue > 10
📈 5. Real-Time Monitoring Script (Reusable)
✅ Save as disk_monitor.sh
#!/bin/bash
echo "==== Disk Bottleneck Check ===="
date
iostat -xm 2 3 | awk '
/Device/ {print; next}
$1 ~ /^(sd|dm)/ && ($NF > 90 || $10 > 50 || $9 > 10) {
printf "ALERT: %-8s util=%s%% await=%sms queue=%s\n", $1, $NF, $10, $9
}'
✅ Run:
chmod +x disk_monitor.sh
./disk_monitor.sh
🔍 6. Identify Top IO Consumers
✅ Process level:
iotop -oP
✅ File level:
lsof | grep <device>
✅ Per process IO:
pidstat -d 2
🧠 7. Advanced DBA Analysis Flow
When you see:
%util = 100
await = high
avgqu-sz = high
✅ Follow this sequence:
- Identify disk
- Map to mount point
- Map to DB file
- Identify SQL causing IO
- Check execution plan
⚡ 8. Quick Root Cause Patterns
🔴 Pattern 1 (Your case earlier)
avgrq-sz ~ 1024 KB
await ~ 80–100 ms
👉 Cause:
- Full table scans
- Data warehouse queries
🟢 Pattern 2
r/s high + avgrq-sz small + low await
👉 Cause:
- OLTP workload (healthy)
🔴 Pattern 3
w_await high
👉 Cause:
- Commit issues
- Log sync bottleneck
🎯 9. What You Should Do Immediately (From Your Data)
Based on your earlier output:
🚨 Critical disks:
- dm-69
- dm-275
✅ Action Plan:
- Map these disks → mount point
- Identify DB objects
- Run:
SELECT sql_id, executions, disk_reads
FROM v$sql
ORDER BY disk_reads DESC FETCH FIRST 10 ROWS ONLY;
✅ Final Takeaway
You now have:
✅ Disk → Mount → DB mapping
✅ Alert thresholds
✅ Real-time monitoring script
✅ DB wait correlation
✅ Troubleshooting workflow
No comments:
Post a Comment