Tuesday, May 26, 2026

Map Actual storage disk mount point to troubleshoot the storage related performance issue

🔗 1. Map `dm-*` → Actual Mount Points (VERY IMPORTANT)

✅ Command:

lsblk -o NAME,KNAME,MOUNTPOINT,SIZE,FSTYPE | grep dm-

✅ If using LVM:

dmsetup ls --tree

✅ Detailed mapping:

lsblk -o NAME,KNAME,PKNAME,MOUNTPOINT | column -t

✅ Correlate with filesystem:

df -h | grep /dev/mapper

👉 Why this matters

dm-62 → logical volume → mount point → DB datafile location
Helps answer:
“Which tablespace is causing this spike?”

🔗 2. Map Disk → Oracle / DB Files

✅ For Oracle:

SELECT file_name, tablespace_name
FROM dba_data_files
WHERE file_name LIKE '%<mount_point>%';

✅ Check temp / redo:

SELECT name FROM v$tempfile;

SELECT member FROM v$logfile;

👉 Now you can map:

dm-69 → /u02 → USERS tablespace → full scan

📊 3. iostat → DB Wait Event Mapping

iostat Pattern	DB Wait Event	Meaning
High rMB/s + large avgrq-sz	`db file scattered read`	Full table scan
High r/s small IO	`db file sequential read`	Index lookup
High w_await	`log file sync`	Commit latency
High avgqu-sz	`free buffer waits`	DB buffer pressure
High await	Any IO wait	Storage slow

🚨 4. Alert Thresholds (Production Standard)

✅ Disk Health Thresholds

Metric	Warning	Critical
%util	>80%	>90%
await	>20 ms	>50 ms
avgqu-sz	>5	>10
svctm vs await gap	noticeable	large gap

✅ Quick Alert Command

iostat -xm 2 5 | awk '
/Device/ {print; next}
$1 ~ /^(sd|dm)/ && ($NF > 90 || $10 > 50 || $9 > 10) {print}

👉 Triggers when:

%util > 90
await > 50 ms
queue > 10

📈 5. Real-Time Monitoring Script (Reusable)

✅ Save as `disk_monitor.sh`

#!/bin/bash
echo "==== Disk Bottleneck Check ===="
iostat -xm 2 3 | awk '
/Device/ {print; next}
$1 ~ /^(sd|dm)/ && ($NF > 90 || $10 > 50 || $9 > 10) {
printf "ALERT: %-8s util=%s%% await=%sms queue=%s\n", $1, $NF, $10, $9

✅ Run:

chmod +x disk_monitor.sh

./disk_monitor.sh

🔍 6. Identify Top IO Consumers

✅ Process level:

iotop -oP

✅ File level:

lsof | grep <device>

✅ Per process IO:

pidstat -d 2

🧠 7. Advanced DBA Analysis Flow

When you see:

%util = 100
await = high
avgqu-sz = high

✅ Follow this sequence:

Identify disk
Map to mount point
Map to DB file
Identify SQL causing IO
Check execution plan

⚡ 8. Quick Root Cause Patterns

🔴 Pattern 1 (Your case earlier)

avgrq-sz ~ 1024 KB
await ~ 80–100 ms

👉 Cause:

Full table scans
Data warehouse queries

🟢 Pattern 2

r/s high + avgrq-sz small + low await

👉 Cause:

OLTP workload (healthy)

🔴 Pattern 3

w_await high

👉 Cause:

Commit issues
Log sync bottleneck

🎯 9. What You Should Do Immediately (From Your Data)

Based on your earlier output:

🚨 Critical disks:

dm-69
dm-275

✅ Action Plan:

Map these disks → mount point
Identify DB objects
Run:

SELECT sql_id, executions, disk_reads
ORDER BY disk_reads DESC FETCH FIRST 10 ROWS ONLY;

✅ Final Takeaway

You now have:

✅ Disk → Mount → DB mapping
✅ Alert thresholds
✅ Real-time monitoring script
✅ DB wait correlation
✅ Troubleshooting workflow

ORACLE DATABASE PROBLEM AND SOLUTIONS

Tuesday, May 26, 2026

Map Actual storage disk mount point to troubleshoot the storage related performance issue

🔗 1. Map `dm-*` → Actual Mount Points (VERY IMPORTANT)

✅ Command:

✅ If using LVM:

✅ Detailed mapping:

✅ Correlate with filesystem:

🔗 2. Map Disk → Oracle / DB Files

✅ For Oracle:

✅ Check temp / redo:

📊 3. iostat → DB Wait Event Mapping

🚨 4. Alert Thresholds (Production Standard)

✅ Disk Health Thresholds

✅ Quick Alert Command

📈 5. Real-Time Monitoring Script (Reusable)

✅ Save as `disk_monitor.sh`

✅ Run:

🔍 6. Identify Top IO Consumers

✅ Process level:

✅ File level:

✅ Per process IO:

🧠 7. Advanced DBA Analysis Flow

✅ Follow this sequence:

⚡ 8. Quick Root Cause Patterns

🔴 Pattern 1 (Your case earlier)

🟢 Pattern 2

🔴 Pattern 3

🎯 9. What You Should Do Immediately (From Your Data)

🚨 Critical disks:

✅ Action Plan:

✅ Final Takeaway

No comments:

Post a Comment

How to verify whether TIMED_STATISTICS changed between the oracle AWR database snapshots ?

Tuesday, May 26, 2026

Map Actual storage disk mount point to troubleshoot the storage related performance issue

🔗 1. Map dm-* → Actual Mount Points (VERY IMPORTANT)

✅ Command:

✅ If using LVM:

✅ Detailed mapping:

✅ Correlate with filesystem:

🔗 2. Map Disk → Oracle / DB Files

✅ For Oracle:

✅ Check temp / redo:

📊 3. iostat → DB Wait Event Mapping

🚨 4. Alert Thresholds (Production Standard)

✅ Disk Health Thresholds

✅ Quick Alert Command

📈 5. Real-Time Monitoring Script (Reusable)

✅ Save as disk_monitor.sh

✅ Run:

🔍 6. Identify Top IO Consumers

✅ Process level:

✅ File level:

✅ Per process IO:

🧠 7. Advanced DBA Analysis Flow

✅ Follow this sequence:

⚡ 8. Quick Root Cause Patterns

🔴 Pattern 1 (Your case earlier)

🟢 Pattern 2

🔴 Pattern 3

🎯 9. What You Should Do Immediately (From Your Data)

🚨 Critical disks:

✅ Action Plan:

✅ Final Takeaway

No comments:

Post a Comment

How to verify whether TIMED_STATISTICS changed between the oracle AWR database snapshots ?

🔗 1. Map `dm-*` → Actual Mount Points (VERY IMPORTANT)

✅ Save as `disk_monitor.sh`