Tuesday, May 26, 2026

Map Actual storage disk mount point to troubleshoot the storage related performance issue

 

🔗 1. Map dm-* → Actual Mount Points (VERY IMPORTANT)

✅ Command:

lsblk -o NAME,KNAME,MOUNTPOINT,SIZE,FSTYPE | grep dm-

✅ If using LVM:

dmsetup ls --tree

✅ Detailed mapping:

lsblk -o NAME,KNAME,PKNAME,MOUNTPOINT | column -t

✅ Correlate with filesystem:

df -h | grep /dev/mapper

👉 Why this matters

  • dm-62 → logical volume → mount point → DB datafile location
  • Helps answer:

    “Which tablespace is causing this spike?”


🔗 2. Map Disk → Oracle / DB Files

✅ For Oracle:

SELECT file_name, tablespace_name
FROM dba_data_files
WHERE file_name LIKE '%<mount_point>%';

✅ Check temp / redo:

SELECT name FROM v$tempfile;
SELECT member FROM v$logfile;

👉 Now you can map:

dm-69 → /u02 → USERS tablespace → full scan

📊 3. iostat → DB Wait Event Mapping

iostat PatternDB Wait EventMeaning
High rMB/s + large avgrq-szdb file scattered readFull table scan
High r/s small IOdb file sequential readIndex lookup
High w_awaitlog file syncCommit latency
High avgqu-szfree buffer waitsDB buffer pressure
High awaitAny IO waitStorage slow

🚨 4. Alert Thresholds (Production Standard)

✅ Disk Health Thresholds

MetricWarningCritical
%util>80%>90%
await>20 ms>50 ms
avgqu-sz>5>10
svctm vs await gapnoticeablelarge gap

✅ Quick Alert Command

iostat -xm 2 5 | awk '
/Device/ {print; next}
$1 ~ /^(sd|dm)/ && ($NF > 90 || $10 > 50 || $9 > 10) {print}
'

👉 Triggers when:

  • %util > 90
  • await > 50 ms
  • queue > 10

📈 5. Real-Time Monitoring Script (Reusable)

✅ Save as disk_monitor.sh

#!/bin/bash

echo "==== Disk Bottleneck Check ===="
date

iostat -xm 2 3 | awk '
/Device/ {print; next}
$1 ~ /^(sd|dm)/ && ($NF > 90 || $10 > 50 || $9 > 10) {
printf "ALERT: %-8s util=%s%% await=%sms queue=%s\n", $1, $NF, $10, $9
}'


✅ Run:


chmod +x disk_monitor.sh
./disk_monitor.sh


🔍 6. Identify Top IO Consumers

✅ Process level:

iotop -oP

✅ File level:

lsof | grep <device>

✅ Per process IO:

pidstat -d 2

🧠 7. Advanced DBA Analysis Flow

When you see:

%util = 100
await = high
avgqu-sz = high

✅ Follow this sequence:

  1. Identify disk
  2. Map to mount point
  3. Map to DB file
  4. Identify SQL causing IO
  5. Check execution plan

⚡ 8. Quick Root Cause Patterns

🔴 Pattern 1 (Your case earlier)

avgrq-sz ~ 1024 KB
await ~ 80–100 ms

👉 Cause:

  • Full table scans
  • Data warehouse queries

🟢 Pattern 2

r/s high + avgrq-sz small + low await

👉 Cause:

  • OLTP workload (healthy)

🔴 Pattern 3

w_await high

👉 Cause:

  • Commit issues
  • Log sync bottleneck

🎯 9. What You Should Do Immediately (From Your Data)

Based on your earlier output:

🚨 Critical disks:

  • dm-69
  • dm-275

✅ Action Plan:

  1. Map these disks → mount point
  2. Identify DB objects
  3. Run:
SELECT sql_id, executions, disk_reads
FROM v$sql
ORDER BY disk_reads DESC FETCH FIRST 10 ROWS ONLY;


✅ Final Takeaway

You now have:

✅ Disk → Mount → DB mapping
✅ Alert thresholds
✅ Real-time monitoring script
✅ DB wait correlation
✅ Troubleshooting workflow

No comments:

Post a Comment

Is CPU issue ? troubleshooting workflow for oracle database performance issue with automation

✅ ✅ 1. CPU Troubleshooting Framework (Like iostat for CPU) Use: vmstat 2 5 or top 📊 ✅ 2. CPU Metrics Explained (vmstat / top) us sy id wa s...