Wednesday, May 27, 2026

Relationship between huge page and OS Page

1. Normal OS Page vs HugePage (Fundamentals)

Default Linux memory pages

Standard page size: 4 KB
Every memory access uses page tables to translate virtual → physical
Large SGA → millions of page table entries (PTEs)

HugePages (HugeTLB)

Larger page size: typically 2 MB (or 1 GB on some systems)
Reduces:
- Page table entries
- TLB (Translation Lookaside Buffer) misses
- CPU overhead

2. How Oracle Uses HugePages Internally

SGA allocation path

When HugePages are enabled (USE_LARGE_PAGES=TRUE|ONLY):

Oracle requests SGA memory during startup
Kernel checks:
- Are HugePages available (/proc/meminfo)?
If yes:
- Allocates SGA entirely from HugePages pool
- Pins memory → not swappable
If not:
- Falls back to normal 4KB pages (unless ONLY)

Key behavior

Only SGA uses HugePages
PGA, stack, processes → still use normal pages

3. Interaction with OS Page Cache & IO

This is where most confusion happens.

A. Buffered I/O (filesystem)

Uses OS page cache
Pages are normal 4KB pages

Reads:

Disk → OS Page Cache → Oracle buffer cache

✅ HugePages are NOT used in OS page cache

B. Direct I/O (O_DIRECT / ASM / Filesystem with DirectIO)

Bypasses OS page cache
Data flows:

Disk → Oracle SGA buffer cache (HugePages)

Important insight:

With HugePages:
- Buffer cache resides in HugePages
- Data blocks (8 KB, 16 KB) are packed inside HugePages

👉 HugePages improve:

Memory lookup efficiency
Buffer cache access speed

👉 But they do NOT change I/O block size

4. Relationship: HugePages vs DB Block vs OS Page

Layer	Typical Size
Oracle DB block	8 KB (default)
OS normal page	4 KB
HugePage	2 MB

Internal packing

A single HugePage (2 MB) holds:

2 MB / 8 KB = 256 Oracle blocks

So:

Oracle still uses 8 KB blocks
HugePages just back the memory region

5. Does HugePages Increase IO Performance?

Direct Impact:

❌ Does NOT increase disk IO throughput directly
✅ Reduces CPU overhead during memory access

Indirect Impact:

Faster buffer cache lookups
Reduced TLB misses
Better scalability for large SGA
Less kernel overhead → more CPU for DB work

👉 So overall DB performance improves, but IO bandwidth stays same

6. How HugePages Reduce Kernel Overhead

Without HugePages (example 500 GB SGA):

500 GB / 4 KB ≈ 131 million pages

With HugePages (2 MB):

500 GB / 2 MB ≈ 256,000 pages

✅ Reduction:

Page table size drastically reduced
Fewer TLB entries needed
Less CPU spent on memory translation

7. Key Kernel Components Involved

1. TLB (Translation Lookaside Buffer)

Cache of virtual→physical mappings
HugePages → fewer entries needed → less misses

2. Page Tables

Shrink significantly with HugePages

3. Buddy Allocator / HugeTLB pool

Pre-reserved memory (vm.nr_hugepages)
Prevents fragmentation issues

8. Why HugePages Do NOT Affect OS Page Cache

Because:

Page cache is managed by kernel VM subsystem
HugePages are:
- Preallocated
- Not swappable
- Not part of regular memory pool

👉 They are isolated memory region

9. How HugePages Size Is Increased

Step 1: Calculate requirement

HugePages = (SGA size) / HugePage size

Example:

SGA = 100 GB
HugePage = 2 MB

100 GB / 2 MB = 51200 pages

Step 2: Configure kernel

vm.nr_hugepages = 51200

Step 3: Reload

sysctl -p

Step 4: Verify

cat /proc/meminfo | grep Huge

Important fields:

HugePages_Total
HugePages_Free
HugePages_Rsvd
HugePages_Surp

10. Advanced Behavior (Very Important)

Partial allocation

If HugePages are insufficient:

Oracle may split:
- Part SGA → HugePages
- Rest → normal pages ❌ (bad)

👉 Causes:

Performance inconsistency
Memory fragmentation

✅ Best practice:

USE_LARGE_PAGES=ONLY

11. Summary (Core Concepts)

What HugePages DO

✅ Optimize SGA memory management
✅ Reduce CPU overhead
✅ Improve scalability
✅ Prevent swapping

What HugePages DO NOT DO

❌ Do not change DB block size
❌ Do not increase disk IO speed
❌ Do not affect OS page cache

12. Simple Mental Model

Think of it like this:

Regular pages → many small boxes (slow to manage)
HugePages → fewer large containers (efficient)

Oracle:

Stores data blocks inside containers
Still processes blocks same way

13. DBA Insight (Important for you as Architect)

Given your role:

HugePages matter most when:

SGA > 16–32 GB
High OLTP concurrency
CPU contention exists

They matter less when:

IO bound system (slow storage)
Small SGA

How to Identify and fix Index is Causing Performance Issue in oracle database ?

🔎 PART 1 — How to Identify Index is Causing Performance Issue

✅ Step 1: Find Slow SQL (Entry Point)

Start from Top SQL, not from the index.

SELECT sql_id, executions, elapsed_time, buffer_gets

FROM v$sql

ORDER BY elapsed_time DESC FETCH FIRST 10 ROWS ONLY;

👉 Pick the problematic SQL

✅ Step 2: Check Execution Plan

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('sql_id', NULL, 'ALLSTATS LAST'));

🔍 Look for:

Symptom	Meaning
FULL TABLE SCAN	Index missing or ignored ❌
INDEX RANGE SCAN with high cost	Index inefficient ❌
INDEX FULL SCAN	Poor selectivity ❌

✅ Step 3: Check Index Access Cost

In plan, observe:

Cost
Rows
Bytes
Buffer Gets

👉 If index scan:

Reads too many blocks → Index is not selective

✅ Step 4: Check Clustering Factor (Critical)

SELECT index_name, clustering_factor, num_rows
FROM dba_indexes
WHERE index_name = 'INDEX_NAME';

Interpretation:

CF ≈ number of rows → ❌ Random reads (BAD index)
CF ≈ number of blocks → ✅ Efficient index

👉 High CF → more I/O → slower query

✅ Step 5: Check Selectivity (Very Important)

SELECT num_distinct, num_rows
FROM dba_tab_col_statistics
WHERE table_name = 'TABLE_NAME'
AND column_name = 'COLUMN_NAME';

👉 Formula:

Selectivity = num_distinct / num_rows

Result:

Value	Meaning
High (close to 1)	✅ Good index
Low (<0.1)	❌ Bad index

👉 Example:

status = 'ACTIVE' → bad index
customer_id → good index

✅ Step 6: Check If Index is Used Properly

ALTER INDEX schema.index_name MONITORING USAGE;

Then:

SELECT * FROM v$object_usage WHERE index_name = 'INDEX_NAME';

👉 Result:

USED = NO → Index useless
USED = YES but slow → Index is problem

✅ Step 7: Check Logical Reads (Performance Impact)

FROM v$segment_statistics
WHERE object_name = 'INDEX_NAME'
AND statistic_name = 'logical reads';

👉 High logical reads = heavy I/O → possible issue

✅ Step 8: Check Index BLEVEL (Depth)

SELECT index_name, blevel
FROM dba_indexes
WHERE index_name = 'INDEX_NAME';

BLEVEL	Meaning
0–2	✅ Good
3–4	⚠️ Ok
>4	❌ Too deep

✅ Step 9: Validate Stats

EXEC DBMS_STATS.GATHER_INDEX_STATS('SCHEMA','INDEX_NAME');

👉 Bad stats = wrong execution plan

✅ Step 10: Compare With Full Table Scan

Force test:

SELECT /*+ FULL(table_name) */ ..

👉 If FULL scan is faster → Index is hurting performance

🚨 Conclusion (Index Causing Issue If):

High clustering factor
Low selectivity
High logical reads
Not used or incorrectly used
Execution plan shows inefficiency

✅ PART 2 — How to Check Index is GOOD or BAD

🎯 Step-by-Step Evaluation Framework

✅ Step 1: Usage Check

SELECT * FROM v$object_usage

WHERE index_name = 'INDEX_NAME';

Result	Decision
USED frequently	✅ Good
NOT USED	❌ Drop candidate

✅ Step 2: Query Alignment

👉 Check if index matches WHERE clause:

WHERE col1 = :1 AND col2 = :2

👉 Good index:

(col1, col2)

👉 Bad index:

(col3, col4)

✅ Step 3: Selectivity Check

High distinct values → ✅ Good
Low distinct values → ❌ Bad

✅ Step 4: Clustering Factor

Low CF → ✅ Good
High CF → ❌ Bad

✅ Step 5: Size vs Benefit

SELECT bytes/1024/1024 MB
FROM dba_segments
WHERE segment_name='INDEX_NAME';

👉 Large + rarely used → ❌ Bad

✅ Step 6: DML Overhead Check

👉 Too many indexes on high DML table:

Slows INSERT/UPDATE/DELETE

SELECT index_name
FROM dba_indexes
WHERE table_name = 'TABLE_NAME';

👉 More indexes ≠ better

✅ Step 7: Redundancy Check

👉 Duplicate indexes = waste

✅ Step 8: BLEVEL + Leaf Blocks

👉 Small, compact index = good
👉 Huge fragmented index = bad

✅ FINAL DECISION MATRIX

Condition	Good Index ✅	Bad Index ❌
Usage	Frequent	Not used
Execution Plan	Index scan	Full scan
Selectivity	High	Low
Clustering Factor	Low	High
BLEVEL	≤2	>4
Size	Optimized	Huge unused
DML Impact	Low	High

🔥 REAL-WORLD QUICK CHECK SCRIPT

i.index_name,
i.clustering_factor,
i.num_rows,
i.distinct_keys,
s.bytes/1024/1024 MB
FROM dba_indexes i
JOIN dba_segments s
ON i.index_name = s.segment_name
WHERE i.owner = 'SCHEMA_NAME'
ORDER BY s.bytes DESC;

💡 Architect-Level Insight (Important)

👉 Don’t judge index in isolation

Always evaluate:

SQL → Execution Plan → Index → Data Pattern

✅ Recommended Approach (Best Practice)

Identify top SQL (AWR/V$SQL)
Analyze execution plan
Validate index usage
Check clustering factor + selectivity
Decide: KEEP / REBUILD / DROP

INDEX HEALTH CLASSIFICATION SCRIPT

set lines 300 pages 500

col OWNER for a10

col INDEX_NAME for a50

col TABLE_NAME for a50

WITH idx_stats AS (

SELECT

i.owner,

i.index_name,

i.table_name,

i.blevel,

i.clustering_factor,

i.num_rows,

i.distinct_keys,

s.bytes/1024/1024 AS size_mb,

CASE

WHEN i.num_rows > 0

THEN ROUND(i.distinct_keys / i.num_rows, 4)

ELSE 0

END AS selectivity

FROM dba_indexes i

JOIN dba_segments s

ON i.owner = s.owner

AND i.index_name = s.segment_name

WHERE i.owner IN ('RITRS')

AND CLUSTERING_FACTOR >700

usage_stats AS (

SELECT

index_name,

CASE

WHEN used = 'YES' THEN 'USED'

ELSE 'NOT_USED'

END AS usage_status

FROM v$object_usage

)

SELECT

i.owner,

i.index_name,

i.table_name,

i.size_mb,

i.blevel,

i.clustering_factor,

i.num_rows,

i.distinct_keys,

i.selectivity,

NVL(u.usage_status, 'UNKNOWN') AS usage_status,

CASE

WHEN NVL(u.usage_status, 'NOT_USED') = 'NOT_USED'

THEN 'DROP_CANDIDATE'

WHEN i.selectivity < 0.05

THEN 'BAD_INDEX_LOW_SELECTIVITY'

WHEN i.clustering_factor > i.num_rows * 0.9

THEN 'BAD_INDEX_HIGH_CF'

WHEN i.blevel > 4

THEN 'REBUILD_REQUIRED'

WHEN i.size_mb > 500

AND NVL(u.usage_status, 'NOT_USED') <> 'USED'

THEN 'REVIEW_BIG_UNUSED_INDEX'

ELSE 'GOOD_INDEX'

END AS index_health_status

FROM idx_stats i

LEFT JOIN usage_stats u

ON i.index_name = u.index_name

ORDER BY i.size_mb DESC

Alert Fix : 1 index(es) with too many extents in oracle database - oracle performance issue

🔍 What it Means

Oracle stores segments (tables, indexes) in extents (contiguous blocks).
Over time, due to growth, rebuilds, and DML, an index may end up with:
- Hundreds or thousands of small extents
This is usually flagged when:

extents > threshold (e.g., 1000 or configurable limit)

⚠️ Why It Matters

Generally not critical in modern Oracle (ASSM + LMT) environments.
But can still cause:
- Slight performance overhead in segment management
- Longer backup/restore times
- Inefficient disk usage in older systems
- Red flag in audits / compliance reports

👉 In most modern systems, it's more of a hygiene issue than a real performance problem.

✅ Identify the Problem Index

Run:

SELECT owner, segment_name, segment_type, extents

FROM dba_segments

WHERE segment_type = 'INDEX'

AND extents > 500 -- adjust threshold

ORDER BY extents DESC;

Or specifically:

SELECT owner, index_name

FROM dba_indexes

WHERE index_name IN (

SELECT segment_name

FROM dba_segments

WHERE extents > 500

);

🛠️ Fix Options

✅ Option 1: Rebuild the Index (Recommended)

ALTER INDEX schema_name.index_name REBUILD;

Benefits:

Reduces number of extents
Compacts index
Eliminates fragmentation

👉 If large index:

ALTER INDEX schema_name.index_name REBUILD ONLINE;

✅ Option 2: Rebuild with Storage Clause

To control extent allocation:

ALTER INDEX schema_name.index_name REBUILD

STORAGE (INITIAL 100M NEXT 100M);

👉 Helps avoid many small extents in future.

✅ Option 3: Move to Different Tablespace

If fragmentation is due to tablespace issues:

ALTER INDEX schema_name.index_name

REBUILD TABLESPACE new_tablespace;

✅ Option 4: Coalesce (Less Effective)

ALTER INDEX schema_name.index_name COALESCE;

Merges adjacent leaf blocks
Does NOT reduce extents
Safer but less impactful

🧠 Root Cause Analysis

Common reasons:

Autoallocate tablespace creating many uneven extents
Frequent index growth/shrink cycles
Old dictionary-managed tablespaces (DMT)
Poor storage parameters
Partitioned index mismanagement

🎯 Best Practices (Going Forward)

✅ Use:

Locally Managed Tablespaces (LMT)
AUTOALLOCATE or UNIFORM extent size

✅ For large indexes:

Predefine proper extent sizes:

CREATE INDEX idx_name
ON table_name(column)
TABLESPACE ts_name
STORAGE (INITIAL 128M NEXT 128M);

✅ Monitor periodically:

SELECT segment_name, extents

FROM dba_segments

WHERE extents > 500;

🚨 When You Can Ignore It

You can safely ignore if:

Database uses ASSM + LMT
No performance issues observed
Extents are auto-managed efficiently

✅ Recommendation for You (Database Architect Perspective)

Given your role, I’d suggest:

✅ Validate index size vs extents ratio
✅ Rebuild only if:
- Extents are very high (>1000)
- Index is frequently accessed
✅ Automate check in health scripts
✅ Standardize storage clauses for large indexes

💬 Summary

Issue	Action
Too many extents	Rebuild index
Fragmentation	Rebuild or move
Minor issue	Ignore if no perf impact

Tuesday, May 26, 2026

Is CPU issue ? troubleshooting workflow for oracle database performance issue with automation

✅ ✅ 1. CPU Troubleshooting Framework (Like iostat for CPU)

Use:

vmstat 2 5

top

📊 ✅ 2. CPU Metrics Explained (vmstat / top)

us sy id wa st

Metric	Meaning
us	User CPU (app/DB queries)
sy	System CPU (kernel/syscalls)
id	Idle CPU
wa	IO wait
st	Steal (VM contention)

✅ ✅ 3. CPU Baseline Table (Production Standard)

📊 CPU Utilization

Metric	Good ✅	Warning ⚠️	Critical 🚨
idle (id)	> 40%	20–40%	< 20%
user (us)	< 50%	50–70%	> 70%
system (sy)	< 20%	20–30%	> 30%
iowait (wa)	< 5%	5–15%	> 20%
steal (st)	0	1–5	> 5

📊 Run Queue (Very Important)

r (run queue)

Metric	Good ✅	Warning ⚠️	Critical 🚨
r vs CPU cores	≤ cores	1.5x cores	2x+ cores

✅ ✅ 4. CPU Health Decision Matrix

Condition	Meaning
High us	CPU-heavy queries
High sy	kernel/system overhead
High wa	not CPU → disk issue
High r	CPU contention
High st	VM issue

✅ ✅ 5. CPU Scoring Model (Like Disk)

🎯 Formula

Score = 100 
- user penalty 
- system penalty 
- run queue penalty 
- iowait penalty

📊 Penalties

✅ User CPU

<50 → 0
50–70 → -15
70 → -30

✅ System CPU

<20 → 0
20–30 → -15
30 → -30

✅ Run Queue

≤ cores → 0
1–2x cores → -20
2x cores → -40

✅ IO Wait

<5 → 0
5–15 → -20
15 → -40

✅ Final Status

Score	Status
80–100	✅ Healthy
60–80	⚠️ Warning
40–60	🔶 Degraded
<40	🚨 Critical

✅ ✅ 6. CPU Monitoring Script (PRODUCTION READY)

📜 `cpu_health.sh`

#!/bin/bash

echo "===== CPU Health Check ====="
date

vmstat 2 3 | tail -1 | awk '
{
r=$1
us=$13
sy=$14
id=$15
wa=$16
st=$17

score=100

# user penalty
if (us > 70) score -= 30
else if (us > 50) score -= 15

# system penalty
if (sy > 30) score -= 30
else if (sy > 20) score -= 15

# iowait penalty
if (wa > 15) score -= 40
else if (wa > 5) score -= 20

# status
status="HEALTHY"
if (score < 40) status="CRITICAL"
else if (score < 60) status="DEGRADED"
else if (score < 80) status="WARNING"

printf "RunQ=%d User=%d%% Sys=%d%% Idle=%d%% IOwait=%d%% Status=%s Score=%d\n",r,us,sy,id,wa,status,score
}
'

Identify CPU Bottleneck (Root Cause)

✅ Step 1: Check top processes

top -o %CPU

✅ Step 2: Per process CPU

ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%cpu | head

✅ Step 3: Thread-level (VERY IMPORTANT for DB)

top -H

✅ Step 4: Per process breakdown

pidstat -u 2

✅ ✅ 8. CPU Issue Patterns (DBA Mapping)

🔴 Pattern 1: High USER CPU

us > 70%

👉 Cause:

Complex SQL
Full scans
bad execution plans

✅ DB Mapping:

SELECT sql_id, cpu_time

FROM v$sql

ORDER BY cpu_time DESC FETCH FIRST 10 ROWS;

🔴 Pattern 2: High SYSTEM CPU

sy > 30%

👉 Cause:

excessive context switching
kernel overhead
I/O interrupts

✅ Check:

vmstat 1

Look at:

cs (context switches)
in (interrupts)

🔴 Pattern 3: High RUN QUEUE

r >> CPU cores

👉 Cause:

CPU contention
too many concurrent queries

🔴 Pattern 4: High IO WAIT

wa > 20%

👉 Cause:

NOT CPU problem
👉 disk bottleneck (go back to iostat)

🔴 Pattern 5: High STEAL (VM)

st > 5%

👉 Cause:

host contention (cloud/VM issue)

✅ ✅ 9. CPU Alert Script (Automation)

📜 `cpu_alert.sh`

#!/bin/bash

HOST=$(hostname)

ALERT=$(vmstat 2 2 | tail -1 | awk '
{
us=$13; sy=$14; wa=$16; st=$17

if (us>70 || sy>30 || wa>15 || st>5)
print "CPU Issue: user="us"% sys="sy"% wa="wa"% st="st"%"
}
')

if [ ! -z "$ALERT" ]; then
echo "$ALERT" | mail -s "CPU Alert on $HOST" your_email@company.com
fi
``

✅ ✅ 10. DBA Troubleshooting Flow

🔥 When CPU alert triggers:

1. Check:

vmstat 1 5

2. Identify pattern:

High us → SQL issue
High sy → OS/kernel
High wa → disk issue

3. Identify process:

top

4. Map to DB:

SELECT sql_id, cpu_time FROM v$sql ORDER BY cpu_time DESC;

5. Fix:

Tune query ✅
Index ✅
Reduce parallelism ✅
Limit sessions ✅

✅ ✅ 11. Golden Rules (CPU)

✅ Healthy

idle > 40%
wa < 5%
r <= cores

🚨 Critical CPU

idle < 10%
us > 70%
r > 2x cores

⚠️ Trick Case (VERY IMPORTANT)

CPU looks high BUT wa is high
👉 NOT CPU issue → DISK issue

🎯 Final Outcome

Now you have:

✅ CPU baseline table
✅ Health scoring model
✅ Monitoring script
✅ Alerting system
✅ DB correlation
✅ Troubleshooting flow

Is storage issue ? troubleshooting workflow for oracle database performance issue with automation

✅ 1. AUTO EMAIL ALERT SCRIPT (Real Production Use)

This script:

Detects disk bottlenecks
Sends email alert automatically

📜 Script: `disk_alert_mail.sh`

#!/bin/bash

HOST=$(hostname)
DATE=$(date)

ALERT=$(iostat -xm 2 2 | awk '
$1 ~ /^(sd|dm)/ {
util=$NF
await=$(NF-3)
queue=$(NF-4)

if (util>95 || await>50 || queue>10) {
printf "%-8s util=%s%% await=%sms queue=%s\n", $1, util, await, queue
}
}
')

if [ ! -z "$ALERT" ]; then
echo "Disk Alert on $HOST at $DATE

Critical disks detected:

$ALERT
" | mail -s "🚨 Disk I/O Alert on $HOST" your_email@company.com
fi

✅ Setup Cron (every 5 min)

crontab -e

Add:

*/5 * * * * /path/disk_alert_mail.sh

✅ ✅ 2. AUTO MAP dm-* → MOUNT → DB FILE (Game-Changer)

This solves your biggest pain: 👉 “Which DB object is causing this?”

📜 Script: `disk_to_db_map.sh`

#!/bin/bash

echo "==== Disk → Mount → Files ===="

for dev in $(lsblk -dn -o NAME | grep dm-); do
echo ""
echo "Device: $dev"

mount=$(lsblk -n -o MOUNTPOINT /dev/$dev 2>/dev/null)

if [ ! -z "$mount" ]; then
echo "Mount: $mount"
echo "Files:"
lsof +D $mount 2>/dev/null | head -10
else
echo "No mount (maybe ASM/LVM raw)"
fi
done

✅ Output Example

Device: dm-xx
Mount: /u02
Files:
oracle  db  datafile1.dbf
oracle  db  users01.dbf

✅ ✅ 3. CORRELATE iostat + ORACLE (Find exact SQL)

✅ Step 1: Identify heavy disk (from iostat)

Example:

dm-69 → /u02 → USERS tablespace

✅ Step 2: Find SQL causing reads

SELECT sql_id,
executions,
disk_reads,
buffer_gets,
ROUND(disk_reads/executions) avg_reads
ORDER BY disk_reads DESC
FETCH FIRST 10 ROWS ONLY;

✅ Step 3: Active session (real-time)

SELECT sid, serial#, sql_id, event
FROM v$session
WHERE wait_class='User I/O';

✅ Step 4: Wait event mapping

Wait Event	Meaning
`db file scattered read`	Full table scan 🚨
`db file sequential read`	Index lookup
`direct path read`	Large scan
`log file sync`	Commit slow

✅ ✅ 4. ADVANCED ALL-IN-ONE SCRIPT (MOST POWERFUL)

👉 Combines:

Disk health
Alert detection
Mapping

📜 `ultimate_disk_analyzer.sh`

#!/bin/bash

echo "===== Ultimate Disk Analyzer ====="
date
echo ""

iostat -xm 2 2 | awk '
/Device/ {
printf "%-8s %-6s %-6s %-6s %-10s\n","Dev","Util","Await","Queue","Status"
next
}

$1 ~ /^(sd|dm)/ {
util=$NF
await=$(NF-3)
queue=$(NF-4)

status="OK"

if (util>95 || await>50 || queue>10)
status="CRITICAL"
else if (util>80 || await>20 || queue>5)
status="WARNING"

printf "%-8s %-6.1f %-6.1f %-6.1f %-10s\n",$1,util,await,queue,status
}'

✅ ✅ 5. Real DBA Troubleshooting Flow (What YOU should do)

🔥 Step-by-step when alert triggers:

1. Run:

iostat -xm 2 5

2. Identify:

High %util
High await
High queue

3. Map:

lsblk

df -h

4. Check DB:

SELECT * FROM v$session WHERE event LIKE '%read%';

5. Fix:

Query tuning ✅
Indexing ✅
Move data ✅
Balance disks ✅

✅ ✅ 6. PRO TIPS (REAL WORLD)

🔴 When to PANIC

await > 100 ms
queue > 20
util = 100% sustained

🟢 When it's OK

util = 100% BUT await < 5
👉 (fast SSD under load)

🧠 Golden formula:

Problem = High Util + High Await + High Queue

🎯 Final Outcome

You now have:

✅ Auto alert system
✅ Disk → DB mapping
✅ SQL root cause detection
✅ Health scoring
✅ Full troubleshooting workflow