Tuesday, May 26, 2026

Storage Disk Performance Baseline Table for Troubleshooting Performance Issues

✅ Disk Performance Baseline Table (iostat -xm)

📊 1. Latency (Most Important)

Metric	Good ✅	Warning ⚠️	Critical 🚨	Notes
`await` (ms)	< 5	5 – 20	> 50	Total latency (queue + service)
`r_await`	< 5	5 – 20	> 50	Read latency
`w_await`	< 5	5 – 20	> 50	Write latency
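
Newer sysstat releases print only `r_await` and `w_await`, without a combined `await` column. Since `await` is the average over all requests, it can be approximated by weighting each latency by its request rate. The sketch below assumes that relationship; the function name `combined_await` and the sample numbers are illustrative, not taken from any real device.

```shell
# combined_await R_PER_S R_AWAIT_MS W_PER_S W_AWAIT_MS
# Prints the request-weighted average latency in ms.
combined_await() {
    awk -v r="$1" -v ra="$2" -v w="$3" -v wa="$4" 'BEGIN {
        total = r + w
        # An idle device has no requests to average over.
        if (total == 0) { print "0.0"; exit }
        printf "%.1f\n", (r * ra + w * wa) / total
    }'
}

# 300 reads/s at 2 ms and 100 writes/s at 10 ms (made-up values)
combined_await 300 2 100 10   # prints 4.0
```

A few slow writes can hide behind many fast reads in the combined figure, which is why the table lists `r_await` and `w_await` separately.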

📊 2. Disk Utilization

Metric	Good ✅	Warning ⚠️	Critical 🚨	Notes
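`%util`	< 70%	70–90%	> 90%	High utilization alone is fine if latency is low; SSD, NVMe, and RAID devices serve requests in parallel, so 100% does not necessarily mean saturation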

📊 3. Queue Depth (Pressure Indicator)

Metric	Good ✅	Warning ⚠️	Critical 🚨	Notes
`avgqu-sz`	< 1	1 – 5	> 10	Average queue length of requests issued to the device (`aqu-sz` in newer sysstat releases)

📊 4. Service Time vs Wait Time

Pattern	Interpretation
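`await ≈ svctm`	✅ Healthy (no queueing)
`await >> svctm`	🚨 Queue bottleneck (requests spend most of their time waiting, not being serviced)

The iostat man page warns that `svctm` is unreliable, and recent sysstat releases no longer report it.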
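
Because `await` is queue time plus service time, the difference `await - svctm` gives a rough estimate of how long requests sit in the queue. The sketch below assumes an iostat version that still prints `svctm`. The function names, the 2x ratio standing in for ">>", and the sample `svctm` values are all assumptions for illustration.

```shell
# queue_wait_ms AWAIT_MS SVCTM_MS
# Estimated time spent queued, assuming await = queue wait + service time.
queue_wait_ms() {
    awk -v a="$1" -v s="$2" 'BEGIN { printf "%.1f\n", a - s }'
}

# await_pattern AWAIT_MS SVCTM_MS
# Labels the pattern; "await >> svctm" is taken to mean at least 2x (an assumption).
await_pattern() {
    awk -v a="$1" -v s="$2" 'BEGIN {
        if (s > 0 && a / s >= 2) print "QUEUE_BOTTLENECK"
        else print "HEALTHY"
    }'
}

queue_wait_ms 72.4 3.1    # prints 69.3
await_pattern 72.4 3.1    # prints QUEUE_BOTTLENECK
await_pattern 1.5 1.4     # prints HEALTHY
```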

📊 5. Throughput (rMB/s, wMB/s)

For modern systems (SSD / SAN / NVMe)

Metric	Good ✅	Warning ⚠️	Critical 🚨
Read throughput	< 70% of max capacity	70–90%	> 90% sustained
Write throughput	< 70% of max capacity	70–90%	> 90% sustained

👉 Absolute value depends on storage type:

HDD: ~100–200 MB/s
SSD: ~500 MB/s – 2 GB/s
NVMe: 2–5+ GB/s
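
The percentage bands only make sense against a known ceiling for the device. As a minimal sketch, the function below takes the observed `rMB/s` or `wMB/s` and a maximum you supply yourself (from a vendor spec or a benchmark), then reports the band. The name `throughput_band` and the 500 MB/s ceiling are assumptions for illustration.

```shell
# throughput_band CURRENT_MBPS MAX_MBPS
# Prints the percentage of capacity in use and the matching band.
throughput_band() {
    awk -v cur="$1" -v max="$2" 'BEGIN {
        if (max <= 0) { print "UNKNOWN"; exit }
        pct = 100 * cur / max
        if (pct > 90)       band = "CRITICAL"
        else if (pct >= 70) band = "WARNING"
        else                band = "GOOD"
        printf "%.0f%% %s\n", pct, band
    }'
}

# Against an assumed 500 MB/s ceiling:
throughput_band 150 500   # prints 30% GOOD
throughput_band 400 500   # prints 80% WARNING
throughput_band 480 500   # prints 96% CRITICAL
```

This checks a single sample. The Critical band in the table refers to sustained load, so a real check would look at several consecutive intervals.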

📊 6. IOPS (r/s, w/s)

Workload	Typical Healthy Range
OLTP (random IO)	1K – 50K IOPS
DW / Analytics	Lower IOPS, higher throughput

👉 Key rule:

High IOPS + low latency = ✅ good
High IOPS + high latency = 🚨 bottleneck

📊 7. IO Size (avgrq-sz)

Value	Meaning	Health
< 32 KB	Random IO (OLTP)	✅
64–256 KB	Mixed	✅
~512 KB – 1 MB	Sequential scan	⚠️ if causing latency
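
One detail to watch when reading this table: iostat reports `avgrq-sz` in 512-byte sectors, not KB, so the raw value has to be halved before comparing it with the ranges above. The average read size can also be derived from `rMB/s` and `r/s`. The helper names below are hypothetical and the sample values are made up.

```shell
# sectors_to_kb SECTORS
# Converts an avgrq-sz value (512-byte sectors) to KB.
sectors_to_kb() {
    awk -v s="$1" 'BEGIN { printf "%.1f\n", s * 512 / 1024 }'
}

# avg_io_kb MB_PER_S IO_PER_S
# Average request size in KB from throughput and IOPS.
avg_io_kb() {
    awk -v mb="$1" -v iops="$2" 'BEGIN {
        if (iops <= 0) { print "0.0"; exit }
        printf "%.1f\n", mb * 1024 / iops
    }'
}

sectors_to_kb 16      # prints 8.0   (small, random IO)
sectors_to_kb 1024    # prints 512.0 (large, sequential IO)
avg_io_kb 200 400     # prints 512.0
```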

🎯 ✅ Quick Decision Matrix

Condition	Verdict
High %util + low await (<5ms)	✅ Healthy
High %util + high await (>50ms)	🚨 Bottleneck
High queue (>10)	🚨 Overloaded
Low util + high await	⚠️ Storage issue
Large IO + high latency	⚠️ Scan / DW workload
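
The matrix can be turned into a small lookup. This is only a sketch: the matrix does not define "high" or "low" utilization, so the function below assumes above 90% for high and below 70% for low, checks the rows in table order, and leaves out the IO-size row. The name `verdict` and the sample inputs are illustrative.

```shell
# verdict UTIL_PCT AWAIT_MS QUEUE
# Returns the first matching row of the decision matrix.
verdict() {
    awk -v u="$1" -v a="$2" -v q="$3" 'BEGIN {
        if (u > 90 && a < 5)        print "HEALTHY"
        else if (u > 90 && a > 50)  print "BOTTLENECK"
        else if (q > 10)            print "OVERLOADED"
        else if (u < 70 && a > 50)  print "STORAGE_ISSUE"
        else                        print "UNCLASSIFIED"
    }'
}

verdict 99 1.5 0.8    # prints HEALTHY
verdict 100 97 24     # prints BOTTLENECK
verdict 85 30 40      # prints OVERLOADED
verdict 40 72 2       # prints STORAGE_ISSUE
verdict 50 3 0.5      # prints UNCLASSIFIED
```

Checking the rows in table order means a busy device with low latency is reported as healthy even when its queue is deep, which matches the first row of the matrix.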

📌 ✅ DBA-Focused Interpretation

Pattern	Root Cause
High rMB/s + large avgrq-sz	Full table scans
High r/s + small IO	Index access
High w_await	Log/write issue
High avgqu-sz	Storage saturation
High await everywhere	Storage slow

🔥 ✅ Golden Rules (Use in Production)

✅ Healthy Disk

%util < 80
await < 10 ms
avgqu-sz < 3

⚠️ Warning Zone

%util 80–90
await 10–30 ms
avgqu-sz 3–10

🚨 Critical Disk Bottleneck

%util > 90
await > 50 ms
avgqu-sz > 10
await >> svctm

✅ Example: Applying the Baseline to Sample Devices

Disk	Verdict
dm-xx (await ~97 ms, util 100%)	🚨 Critical
dm-xxx (queue 40, await 72 ms)	🚨 Severe
dm-xxx (await 1.5 ms, util 99%)	✅ Healthy

Save as `disk_health_score.sh`

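#!/bin/bash
# Scores each sd*/dm-* device from extended iostat output.

echo "===== Disk Health Score ====="
date
echo ""

# Three reports, 2 seconds apart; the first report shows averages since boot.
iostat -xm 2 3 | awk '
function score(util, await, queue,    s) {
    s = 100
    # Util penalty
    if (util > 90) s -= 25
    else if (util > 70) s -= 10
    # Await penalty
    if (await > 50) s -= 50
    else if (await > 20) s -= 30
    else if (await > 5) s -= 15
    # Queue penalty
    if (queue > 10) s -= 40
    else if (queue > 5) s -= 20
    else if (queue > 1) s -= 10
    if (s < 0) s = 0
    return s
}

function status(s) {
    if (s >= 80) return "HEALTHY"
    else if (s >= 60) return "WARNING"
    else if (s >= 40) return "DEGRADED"
    else return "CRITICAL"
}

# Header line: find columns by name, because positions differ between sysstat versions.
$1 ~ /^Device/ {
    c_util = c_await = c_ra = c_wa = c_queue = 0
    for (i = 1; i <= NF; i++) {
        if ($i == "%util") c_util = i
        else if ($i == "await") c_await = i
        else if ($i == "r_await") c_ra = i
        else if ($i == "w_await") c_wa = i
        else if ($i == "avgqu-sz" || $i == "aqu-sz") c_queue = i
    }
    printf "%-10s %-6s %-8s %-8s %-6s\n","Device","Util%","Await","Queue","Status"
    next
}

$1 ~ /^(sd|dm)/ && c_util && c_queue {
    util = $c_util
    queue = $c_queue
    # Without a combined await column, use the worse of r_await and w_await.
    if (c_await) await = $c_await
    else if (c_ra && c_wa) await = ($c_ra + 0 > $c_wa + 0) ? $c_ra : $c_wa
    else next
    s = score(util, await, queue)
    st = status(s)
    printf "%-10s %-6.1f %-8.1f %-8.1f %-6s\n",$1,util,await,queue,st
}
'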

Make the script executable and run it:

chmod +x disk_health_score.sh
./disk_health_score.sh

Sample Output

Device     Util%  Await    Queue    Status
dm-xx      100.0  97.2     24.3     CRITICAL
dm-xxx     99.9   72.4     40.5     CRITICAL
dm-xx      99.9   80.0     7.2      CRITICAL
dm-xxx     99.4   1.5      11.6     CRITICAL
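
To see how the penalties combine without a live iostat feed, the same scoring rules can be applied to hand-picked numbers. The sketch below copies the thresholds from the script's `score` and `status` functions into a standalone helper. The name `disk_score` and the input values are hypothetical.

```shell
# disk_score UTIL_PCT AWAIT_MS QUEUE
# Prints the 0-100 score and its status label.
disk_score() {
    awk -v util="$1" -v await="$2" -v queue="$3" 'BEGIN {
        s = 100
        # Utilization penalty
        if (util > 90) s -= 25
        else if (util > 70) s -= 10
        # Latency penalty
        if (await > 50) s -= 50
        else if (await > 20) s -= 30
        else if (await > 5) s -= 15
        # Queue penalty
        if (queue > 10) s -= 40
        else if (queue > 5) s -= 20
        else if (queue > 1) s -= 10
        if (s < 0) s = 0

        if (s >= 80) st = "HEALTHY"
        else if (s >= 60) st = "WARNING"
        else if (s >= 40) st = "DEGRADED"
        else st = "CRITICAL"
        printf "%d %s\n", s, st
    }'
}

disk_score 50 2 0.3        # prints 100 HEALTHY
disk_score 85 12 0.5       # prints 75 WARNING
disk_score 95 8 2          # prints 50 DEGRADED
disk_score 100 97.2 24.3   # prints 0 CRITICAL
```

Note that the utilization penalty applies regardless of latency, so a device at 95% utilization can never score above 75 under these rules.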

ORACLE DATABASE PROBLEM AND SOLUTIONS
