Tuesday, May 26, 2026

iostat Linux command: a deep dive into troubleshooting performance issues

  iostat -xm 2 5 | awk '$1 ~ /^(sd|dm)/ && $NF > 40 {printf "%-10s %s\n",$1,$NF"%"}'

 iostat -dxm 2 5 | awk '$NF ~ /^[0-9.]+$/ && $NF+0 > 40'

 iostat -xm 2 5 | awk '/Device/ {print; next}$1 ~ /^(sd|dm)/ && $NF > 90 {print}'


📌 Header Breakdown (Deep Explanation)

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
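
The one-liners above assume %util is the last column. Newer sysstat releases rename and reorder several of these columns (for example, avgqu-sz appears as aqu-sz), so a filter that looks the column up by name in the header is a little more robust. A minimal sketch, where the function name iostat_over is hypothetical and not part of sysstat:

```shell
# Hypothetical helper (not part of sysstat): print the device name and the
# value of a named column whenever that value exceeds a threshold.
# Reads `iostat -x` output on stdin.
iostat_over() {
    # $1 = column name exactly as shown in the header (e.g. %util)
    # $2 = numeric threshold
    awk -v col="$1" -v limit="$2" '
        # Header line: remember which field holds the requested column.
        $1 ~ /^Device/ {
            idx = 0
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            next
        }
        # Data line: sd* and dm-* devices only; +0 forces a numeric compare.
        idx && $1 ~ /^(sd|dm)/ && $idx + 0 > limit + 0 {
            printf "%-10s %s\n", $1, $idx
        }
    '
}

# Usage (assumes sysstat is installed):
#   iostat -xm 2 5 | iostat_over %util 40
#   iostat -xm 2 5 | iostat_over await 20
```

The same function works for any column in the header, so the threshold checks later in this post can reuse it.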

✅ 1. Device

  • Logical or physical disk name
    • sdX → physical disks
    • dm-X → device mapper (LVM, ASM, multipath)

👉 In your case:

  • dm-* = logical volumes / DB storage layers

✅ 2. rrqm/s (Read Requests Merged per second)

  • Number of read requests merged by OS scheduler

Why merging matters:

  • OS combines adjacent reads to reduce I/O calls

👉 Example:

  • 10 small reads → merged → 1 large read

Interpretation:

  • High value → efficient sequential I/O
  • Zero → either random I/O (nothing adjacent to merge) or requests that already arrive large enough that no merging is needed

✅ 3. wrqm/s (Write Requests Merged per second)

  • Same as above but for writes

✅ High value:

  • Good for sequential writes (e.g., redo logs, batch loads)

✅ 4. r/s (Reads per second)

  • Number of read I/O operations per second

Interpretation:

  • High r/s = high IOPS (random access likely)

✅ 5. w/s (Writes per second)

  • Number of write operations per second

👉 Together with r/s:

  • Indicates workload type:
    • OLTP → high r/s + w/s, small IO
    • Analytics → lower r/s but large I/O size

✅ 6. rMB/s (Read throughput in MB/sec)

  • Total data read per second

✅ 7. wMB/s (Write throughput in MB/sec)

  • Total data written per second

🔎 Important:

Pattern               | Meaning
High r/s + low rMB/s  | small random IO
Low r/s + high rMB/s  | large sequential IO
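
The pattern above comes down to one division: throughput divided by operations gives the average size of each read. A small sketch of that arithmetic, with a hypothetical helper name and made-up sample numbers:

```shell
# Hypothetical helper: average read size in KB from r/s and rMB/s.
avg_read_kb() {
    # $1 = r/s, $2 = rMB/s
    awk -v rs="$1" -v rmb="$2" 'BEGIN {
        if (rs + 0 == 0) { print "n/a"; exit }   # no reads in the interval
        printf "%.1f\n", (rmb * 1024) / rs
    }'
}

avg_read_kb 2000 15.6   # about 8 KB per read: small random IO
avg_read_kb 100 100     # 1024 KB per read: large sequential IO
```

The same calculation with w/s and wMB/s gives the average write size.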

✅ 8. avgrq-sz (Average Request Size)

  • Average size of each I/O request, reported in 512-byte sectors (divide by 2 to get KB)

Formula:

avgrq-sz = (total sectors read+written) / total I/O ops

Interpretation:

Request size    | Meaning
< 32 KB         | random IO (OLTP)
64–256 KB       | mixed
~1024 KB (1 MB) | sequential scan
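
Because the formula counts sectors, the raw avgrq-sz number has to be converted before it is compared with the KB bands above. A rough sketch, assuming 512-byte sectors; the helper name is hypothetical, and treating everything at or above 1024 KB as a sequential scan is my reading of the "~1024 KB" row:

```shell
# Hypothetical helper: convert avgrq-sz (512-byte sectors) to KB and label
# it with the bands from the table. Sizes that fall between the bands are
# reported as "unclassified" rather than guessed.
rq_size_class() {
    # $1 = avgrq-sz in sectors
    awk -v sectors="$1" 'BEGIN {
        kb = sectors * 512 / 1024
        if (kb < 32)                    label = "random IO (OLTP)"
        else if (kb >= 1024)            label = "sequential scan"
        else if (kb >= 64 && kb <= 256) label = "mixed"
        else                            label = "unclassified"
        printf "%.1f KB %s\n", kb, label
    }'
}

rq_size_class 16     # 8.0 KB random IO (OLTP)
rq_size_class 2048   # 1024.0 KB sequential scan
```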

✅ 9. avgqu-sz (Average Queue Length)

  • Average number of I/O requests outstanding on the device (waiting in the queue or being serviced)

🚨 Critical metric:

Value | Impact
< 1   | healthy
1–5   | moderate
10+   | pressure
20+   | severe bottleneck

👉 High value means:

  • Disk is overloaded
  • Requests are waiting → latency increase
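
Queue length, IOPS and latency are tied together by Little's law: requests in flight ≈ arrival rate × time each request spends in the system. That gives a quick sanity check on a reading. The sketch below is an approximation, not how iostat computes the column, and the helper name is hypothetical:

```shell
# Rough cross-check (Little's law):
#   avgqu-sz ~= (r/s + w/s) * await_ms / 1000
expected_queue() {
    # $1 = r/s, $2 = w/s, $3 = await in ms
    awk -v r="$1" -v w="$2" -v await="$3" 'BEGIN {
        printf "%.1f\n", (r + w) * await / 1000
    }'
}

expected_queue 200 50 80   # 250 IOPS at 80 ms -> about 20 requests in flight
```

If the reported avgqu-sz is far from this estimate, the sample interval was probably too short or the workload changed mid-interval.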

✅ 10. await (Average Wait Time in ms)

  • Total time for an I/O request to complete:
    await = queue time + service time
    

🚨 Thresholds:

Value    | Meaning
< 5 ms   | excellent
5–20 ms  | acceptable
20–50 ms | warning
> 50 ms  | serious issue

👉 This is the most important latency metric


✅ 11. r_await (Read latency)

  • Avg time for read requests

✅ 12. w_await (Write latency)

  • Avg time for write requests

Why split matters:

  • Helps identify:
    • read-heavy issues (full scan)
    • write bottlenecks (redo/log/file sync)
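
To see the split per device at a glance, the two columns can be pulled out by name and compared. A minimal sketch; the function name is hypothetical, and it only works on sysstat versions that print r_await and w_await:

```shell
# Hypothetical helper: show r_await and w_await side by side for each
# sd*/dm-* device and say which side is slower. Reads `iostat -x` on stdin.
latency_split() {
    awk '
        # Header line: locate the two latency columns by name.
        $1 ~ /^Device/ {
            r = 0; w = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "r_await") r = i
                if ($i == "w_await") w = i
            }
            next
        }
        # Data line: compare the two values numerically.
        r && w && $1 ~ /^(sd|dm)/ {
            if ($r + 0 > $w + 0)      side = "reads slower"
            else if ($r + 0 < $w + 0) side = "writes slower"
            else                      side = "even"
            printf "%-10s r_await=%s w_await=%s %s\n", $1, $r, $w, side
        }
    '
}

# Usage (assumes sysstat is installed):
#   iostat -xm 2 5 | latency_split
```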

✅ 13. svctm (Service Time)

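  • Time taken by the disk to service a request
  • Does NOT include queue time
  • The iostat man page warns that this field should not be trusted, and recent sysstat versions no longer print it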

Important:

await ≈ svctm + queue delay

Interpretation:

Case           | Meaning
await ≈ svctm  | no queue bottleneck
await >> svctm | queue contention

👉 This is key for bottleneck detection


✅ 14. %util (Utilization)

  • Percentage of time disk was busy

🚨 Interpretation:

Value  | Meaning
< 60%  | safe
60–80% | moderate
80–90% | high
> 90%  | saturated

👉 BUT:

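  • Must combine with await + queue: on devices that serve requests in parallel (RAID arrays, SSDs), a %util near 100% does not by itself mean the device has reached its limit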

🔥 Important Combined Interpretation

✅ Case 1 (Healthy high usage)

%util = 95%
await = 1 ms
avgqu-sz = 1

✔ Efficient disk


🚨 Case 2 (Bottleneck)

%util = 99%
await = 80 ms
avgqu-sz = 20

❌ Disk saturation + queue buildup


🧠 How You Should Read Header (DBA Cheat Sheet)

Step-by-step analysis:

  1. Check %util

    • > 90% → possible saturation

  2. Check avgqu-sz

    • High → queue backlog
  3. Check await

    • Confirms latency impact
  4. Compare await vs svctm

    • Big gap → queue delay
  5. Check avgrq-sz

    • Understand workload type
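
The first three steps can be strung together into a quick triage check. This is only a sketch: the helper name is hypothetical, the thresholds are the ones quoted in this post rather than universal limits, and the svctm and avgrq-sz steps are left out.

```shell
# Hypothetical triage helper following steps 1-3 above.
# Arguments: %util, avgqu-sz, await (ms).
io_triage() {
    awk -v util="$1" -v queue="$2" -v await="$3" 'BEGIN {
        if (util + 0 <= 90)
            verdict = "not saturated"
        else if (queue + 0 >= 10 && await + 0 > 50)
            verdict = "bottleneck: saturated with queue buildup"
        else if (queue + 0 < 10 && await + 0 < 5)
            verdict = "busy but healthy"
        else
            verdict = "busy: keep watching await and queue"
        print verdict
    }'
}

io_triage 95 1 1     # Case 1 -> busy but healthy
io_triage 99 20 80   # Case 2 -> bottleneck: saturated with queue buildup
```

Checking %util first keeps the output quiet for idle disks; the queue and latency checks only decide how to label a disk that is already busy.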

🎯 Why This Matters for You (Database Architect)

This header directly helps identify:

✅ DB Issues Mapping

Metric                      | DB Problem
High rMB/s + large avgrq-sz | full table scan
High r/s, low size          | index lookup
High w_await                | commit / redo issues
High avgqu-sz               | storage contention
High await                  | slow queries

✅ Final Summary

  • r/s, w/s → IOPS
  • rMB/s, wMB/s → throughput
  • avgrq-sz → IO size (random vs sequential)
  • avgqu-sz → pressure indicator 🚨
  • await → real latency 🚨
  • %util → saturation signal
