Tuesday, May 26, 2026

Map Actual storage disk mount point to troubleshoot the storage related performance issue

 

🔗 1. Map dm-* → Actual Mount Points (VERY IMPORTANT)

✅ Command:

lsblk -o NAME,KNAME,MOUNTPOINT,SIZE,FSTYPE | grep dm-

✅ If using LVM:

dmsetup ls --tree

✅ Detailed mapping:

lsblk -o NAME,KNAME,PKNAME,MOUNTPOINT | column -t

✅ Correlate with filesystem:

df -h | grep /dev/mapper

👉 Why this matters

  • dm-62 → logical volume → mount point → DB datafile location
  • Helps answer:

    “Which tablespace is causing this spike?”


🔗 2. Map Disk → Oracle / DB Files

✅ For Oracle:

SELECT file_name, tablespace_name
FROM dba_data_files
WHERE file_name LIKE '%<mount_point>%';

✅ Check temp / redo:

SELECT name FROM v$tempfile;
SELECT member FROM v$logfile;

👉 Now you can map:

dm-69 → /u02 → USERS tablespace → full scan

📊 3. iostat → DB Wait Event Mapping

iostat PatternDB Wait EventMeaning
High rMB/s + large avgrq-szdb file scattered readFull table scan
High r/s small IOdb file sequential readIndex lookup
High w_awaitlog file syncCommit latency
High avgqu-szfree buffer waitsDB buffer pressure
High awaitAny IO waitStorage slow

🚨 4. Alert Thresholds (Production Standard)

✅ Disk Health Thresholds

MetricWarningCritical
%util>80%>90%
await>20 ms>50 ms
avgqu-sz>5>10
svctm vs await gapnoticeablelarge gap

✅ Quick Alert Command

iostat -xm 2 5 | awk '
/Device/ {print; next}
$1 ~ /^(sd|dm)/ && ($NF > 90 || $10 > 50 || $9 > 10) {print}
'

👉 Triggers when:

  • %util > 90
  • await > 50 ms
  • queue > 10

📈 5. Real-Time Monitoring Script (Reusable)

✅ Save as disk_monitor.sh

#!/bin/bash

echo "==== Disk Bottleneck Check ===="
date

iostat -xm 2 3 | awk '
/Device/ {print; next}
$1 ~ /^(sd|dm)/ && ($NF > 90 || $10 > 50 || $9 > 10) {
printf "ALERT: %-8s util=%s%% await=%sms queue=%s\n", $1, $NF, $10, $9
}'


✅ Run:


chmod +x disk_monitor.sh
./disk_monitor.sh


🔍 6. Identify Top IO Consumers

✅ Process level:

iotop -oP

✅ File level:

lsof | grep <device>

✅ Per process IO:

pidstat -d 2

🧠 7. Advanced DBA Analysis Flow

When you see:

%util = 100
await = high
avgqu-sz = high

✅ Follow this sequence:

  1. Identify disk
  2. Map to mount point
  3. Map to DB file
  4. Identify SQL causing IO
  5. Check execution plan

⚡ 8. Quick Root Cause Patterns

🔴 Pattern 1 (Your case earlier)

avgrq-sz ~ 1024 KB
await ~ 80–100 ms

👉 Cause:

  • Full table scans
  • Data warehouse queries

🟢 Pattern 2

r/s high + avgrq-sz small + low await

👉 Cause:

  • OLTP workload (healthy)

🔴 Pattern 3

w_await high

👉 Cause:

  • Commit issues
  • Log sync bottleneck

🎯 9. What You Should Do Immediately (From Your Data)

Based on your earlier output:

🚨 Critical disks:

  • dm-69
  • dm-275

✅ Action Plan:

  1. Map these disks → mount point
  2. Identify DB objects
  3. Run:
SELECT sql_id, executions, disk_reads
FROM v$sql
ORDER BY disk_reads DESC FETCH FIRST 10 ROWS ONLY;


✅ Final Takeaway

You now have:

✅ Disk → Mount → DB mapping
✅ Alert thresholds
✅ Real-time monitoring script
✅ DB wait correlation
✅ Troubleshooting workflow

iostat linux command deep drive to troubleshooting the performance issue

  iostat -xm 2 5 | awk '$1 ~ /^(sd|dm)/ && $NF > 40 {printf "%-10s %s\n",$1,$NF"%"}'

 iostat -xm 2 5 | awk '$NF > 40 {print}'

 iostat -xm 2 5 | awk '/Device/ {print; next}$1 ~ /^(sd|dm)/ && $NF > 90 {print}'


📌 Header Breakdown (Deep Explanation)

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util

✅ 1. Device

  • Logical or physical disk name
    • sdX → physical disks
    • dm-X → device mapper (LVM, ASM, multipath)

👉 In your case:

  • dm-* = logical volumes / DB storage layers

✅ 2. rrqm/s (Read Requests Merged per second)

  • Number of read requests merged by OS scheduler

Why merging matters:

  • OS combines adjacent reads to reduce I/O calls

👉 Example:

  • 10 small reads → merged → 1 large read

Interpretation:

  • High value → efficient sequential I/O
  • Zero → either random I/O or already optimized

✅ 3. wrqm/s (Write Requests Merged per second)

  • Same as above but for writes

✅ High value:

  • Good for sequential writes (e.g., redo logs, batch loads)

✅ 4. r/s (Reads per second)

  • Number of read I/O operations per second

Interpretation:

  • High r/s = high IOPS (random access likely)

✅ 5. w/s (Writes per second)

  • Number of write operations per second

👉 Together with r/s:

  • Indicates workload type:
    • OLTP → high r/s + w/s, small IO
    • Analytics → lower r/s but large I/O size

✅ 6. rMB/s (Read throughput in MB/sec)

  • Total data read per second

✅ 7. wMB/s (Write throughput in MB/sec)

  • Total data written per second

🔎 Important:

PatternMeaning
High r/s + low rMB/ssmall random IO
Low r/s + high rMB/slarge sequential IO

✅ 8. avgrq-sz (Average Request Size)

  • Average size of each I/O request (in KB)

Formula:

avgrq-sz = (total sectors read+written) / total I/O ops

Interpretation:

ValueMeaning
< 32 KBrandom IO (OLTP)
64–256 KBmixed
~1024 KB (1MB)sequential scan

✅ 9. avgqu-sz (Average Queue Length)

  • Number of I/O requests waiting in queue

🚨 Critical metric:

ValueImpact
< 1healthy
1–5moderate
10+pressure
20+severe bottleneck

👉 High value means:

  • Disk is overloaded
  • Requests are waiting → latency increase

✅ 10. await (Average Wait Time in ms)

  • Total time for I/O request:
    wait time = queue time + service time
    

🚨 Thresholds:

ValueMeaning
< 5 msexcellent
5–20 msacceptable
20–50 mswarning
> 50 msserious issue

👉 This is the most important latency metric


✅ 11. r_await (Read latency)

  • Avg time for read requests

✅ 12. w_await (Write latency)

  • Avg time for write requests

Why split matters:

  • Helps identify:
    • read-heavy issues (full scan)
    • write bottlenecks (redo/log/file sync)

✅ 13. svctm (Service Time)

  • Time taken by disk to service request
  • Does NOT include queue time

Important:

await ≈ svctm + queue delay

Interpretation:

CaseMeaning
await ≈ svctmno queue bottleneck
await >> svctmqueue contention

👉 This is key for bottleneck detection


✅ 14. %util (Utilization)

  • Percentage of time disk was busy

🚨 Interpretation:

ValueMeaning
< 60%safe
60–80%moderate
80–90%high
> 90%saturated

👉 BUT:

  • Must combine with await + queue

🔥 Important Combined Interpretation

✅ Case 1 (Healthy high usage)

%util = 95%
await = 1 ms
avgqu-sz = 1

✔ Efficient disk


🚨 Case 2 (Bottleneck)

%util = 99%
await = 80 ms
avgqu-sz = 20

❌ Disk saturation + queue buildup


🧠 How You Should Read Header (DBA Cheat Sheet)

Step-by-step analysis:

  1. Check %util

    • 90 → possible saturation

  2. Check avgqu-sz

    • High → queue backlog
  3. Check await

    • Confirms latency impact
  4. Compare await vs svctm

    • Big gap → queue delay
  5. Check avgrq-sz

    • Understand workload type

🎯 Why This Matters for You (Database Architect)

This header directly helps identify:

✅ DB Issues Mapping

MetricDB Problem
High rMB/s + large avgrq-szfull table scan
High r/s, low sizeindex lookup
High w_awaitcommit / redo issues
High avgqu-szstorage contention
High awaitslow queries

✅ Final Summary

  • r/s, w/s → IOPS
  • rMB/s, wMB/s → throughput
  • avgrq-sz → IO size (random vs sequential)
  • avgqu-sz → pressure indicator 🚨
  • await → real latency 🚨
  • %util → saturation signal

Monday, May 25, 2026

Step-by-Step HugePages Configuration (Oracle 19c on Linux)

 

 Step-by-Step HugePages Configuration (Oracle 19c on Linux)


🔹 Step 1: Check Current HugePages Status

grep Huge /proc/meminfo

Key parameters:

  • HugePages_Total
  • HugePages_Free
  • Hugepagesize (usually 2 MB)

🔹 Step 2: Disable Transparent HugePages (THP)

Oracle recommends disabling THP.

Check status:

cat /sys/kernel/mm/transparent_hugepage/enabled

Disable temporarily:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

Disable permanently:

Edit:

vi /etc/default/grub

Add:

transparent_hugepage=never

Then apply:

grub2-mkconfig -o /boot/grub2/grub.cfg
reboot

🔹 Step 3: Calculate Required HugePages

3.1 Check Oracle SGA size

From SQL:

show parameter sga_target;

or:

show parameter memory_target;


3.2 Use Oracle Script (Recommended)

$ORACLE_HOME/bin/hugepages_settings.sh

If not available, download from Oracle MOS.


3.3 Manual Calculation

Formula:

HugePages = Total_SGA_Size / HugePage_Size

Example:

  • SGA = 64 GB
  • Page size = 2 MB
64 * 1024 MB / 2 MB = 32768 HugePages

🔹 Step 4: Configure HugePages in Kernel

Edit:

vi /etc/sysctl.conf

Add/update:

vm.nr_hugepages=32768

Apply:

sysctl -p


🔹 Step 5: Configure memlock for Oracle User

Edit:

vi /etc/security/limits.conf

Add:

oracle soft memlock 67108864
oracle hard memlock 67108864

👉 Value should match SGA size (in KB)

Example:

64 GB = 67108864 KB

🔹 Step 6: Disable AMM (Very Important)

HugePages does NOT work with AMM.

Check:

show parameter memory_target;

If > 0, disable:

ALTER SYSTEM SET memory_target=0 SCOPE=SPFILE;
ALTER SYSTEM SET memory_max_target=0 SCOPE=SPFILE;

Instead, use ASMM:

ALTER SYSTEM SET sga_target=64G SCOPE=SPFILE;
ALTER SYSTEM SET pga_aggregate_target=8G SCOPE=SPFILE;

Restart DB:

shutdown immediate;
startup;

🔹 Step 7: Restart Server

reboot

🔹 Step 8: Validate HugePages Usage

grep Huge /proc/meminfo

Check:

  • HugePages_Total → should match configured value
  • HugePages_Free → should decrease after DB start
  • HugePages_Rsvd → reserved pages

🔹 Step 9: Verify Oracle is Using HugePages

grep -i huge /proc/meminfo

Also check alert log:

grep HugePages $ORACLE_BASE/diag/rdbms/*/*/trace/alert*.log

Expected message:

Huge Pages allocation successful

✅ Best Practices

✔ Always leave some RAM for OS (not 100% HugePages)
✔ Use static SGA (ASMM)
✔ Set HugePages slightly higher than requirement
✔ Monitor with:

vmstat
free -g

⚠️ Common Mistakes

❌ Using AMM (memory_target)
❌ Not setting memlock
❌ THP not disabled
❌ Underestimating number of HugePages


🚀 Quick Example Summary

For a 64 GB SGA:

SettingValue
HugePage size2 MB
Required pages32768
memlock67108864 KB
vm.nr_hugepages32768




Oracle Script for huge_page setting 

vi hugepages_settings.sh 


#!/bin/bash 
# hugepages_settings.sh 
# Linux bash script to compute values for the 
# recommended HugePages/HugeTLB configuration 
# on Oracle Linux 
# Note: This script does calculation for all shared memory 
# segments available when the script is run, no matter it 
# is an Oracle RDBMS shared memory segment or not. 
# This script is provided by KB151310 from My Oracle Support 
# http://support.oracle.com 

# Welcome text 
echo " 
This script is provided by KB151310 from My Oracle Support 
(http://support.oracle.com) where it is intended to compute values for 
the recommended HugePages/HugeTLB configuration for the current shared 
memory segments on Oracle Linux. Before proceeding with the execution please note following: 
 * For ASM instance, it needs to configure ASMM instead of AMM. 
 * The 'pga_aggregate_target' is outside the SGA and 
   you should accommodate this while calculating the overall size. 
 * In case you changes the DB SGA size, 
   as the new SGA will not fit in the previous HugePages configuration, 
   it had better disable the whole HugePages, 
   start the DB with new SGA size and run the script again. 
And make sure that: 
 * Oracle Database instance(s) are up and running 
 * Oracle Database Automatic Memory Management (AMM) is not setup 
   (See KB83222) 
 * The shared memory segments can be listed by command: 
     # ipcs -m 


Press Enter to proceed..." 

read 

# Check for the kernel version 
KERN=`uname -r | awk -F. '{ printf("%d.%d\n",$1,$2); }'` 

# Find out the HugePage size 
HPG_SZ=`grep Hugepagesize /proc/meminfo | awk '{print $2}'` 
if [ -z "$HPG_SZ" ];then 
    echo "The hugepages may not be supported in the system where the script is being executed." 
    exit 1 
fi 

# Initialize the counter 
NUM_PG=0 

# Cumulative number of pages required to handle the running shared memory segments 
for SEG_BYTES in `ipcs -m | cut -c44-300 | awk '{print $1}' | grep "[0-9][0-9]*"` 
do 
    MIN_PG=`echo "$SEG_BYTES/($HPG_SZ*1024)" | bc -q` 
    if [ $MIN_PG -gt 0 ]; then 
        NUM_PG=`echo "$NUM_PG+$MIN_PG+1" | bc -q` 
    fi 
done 

RES_BYTES=`echo "$NUM_PG * $HPG_SZ * 1024" | bc -q` 

# An SGA less than 100MB does not make sense 
# Bail out if that is the case 
if [ $RES_BYTES -lt 100000000 ]; then 
    echo "***********" 
    echo "** ERROR **" 
    echo "***********" 
    echo "Sorry! There are not enough total of shared memory segments allocated for 
HugePages configuration. HugePages can only be used for shared memory segments 
that you can list by command: 

    # ipcs -m 

of a size that can match an Oracle Database SGA. Please make sure that: 
 * Oracle Database instance is up and running 
 * Oracle Database Automatic Memory Management (AMM) is not configured" 
    exit 1 
fi 

# Finish with results 
    echo "Recommended setting: vm.nr_hugepages = $NUM_PG"; 

# End



Monday, May 18, 2026

step‑by‑step guide to check CPU sockets, cores, threads, and multithreading (Hyper‑Threading/SMT)

Step‑by‑step guide to check CPU sockets, cores, threads, and multithreading (Hyper‑Threading/SMT)

✅ 1. BASIC UNDERSTANDING (IMPORTANT)

TermMeaning
SocketPhysical CPU installed on motherboard
CorePhysical processing unit inside CPU
ThreadLogical CPU (via Hyper‑Threading / SMT)
Multithreading Enabled?If Threads > Cores

👉 Example:

  • 1 socket, 4 cores, 8 threads → Hyper‑Threading ENABLED
  • 1 socket, 4 cores, 4 threads → Hyper‑Threading DISABLED

🐧 2. LINUX – STEP BY STEP

🔹 Step 1: Check CPU details

Run:

lscpu

Example output:

CPU(s):              8
Socket(s):           1
Core(s) per socket:  4
Thread(s) per core:  2

Interpret:

  • Sockets = 1
  • Cores = 4
  • Threads = 8
  • Thread per core = 2 → Multithreading ENABLED ✅

🔹 Step 2: Check using /proc

cat /proc/cpuinfo | grep "physical id" | sort -u | wc -l

👉 Gives number of sockets

cat /proc/cpuinfo | grep "cpu cores" | uniq

👉 Shows cores per socket

cat /proc/cpuinfo | grep "processor" | wc -l

👉 Total logical CPUs (threads)


🔹 Step 3: Quick single command summary

nproc

👉 gives total threads


🔹 Step 4: Check Hyper‑Threading explicitly

lscpu | grep "Thread"

If:

Thread(s) per core: 2

👉 ✅ Hyper‑Threading ON

If:

Thread(s) per core: 1

👉 ❌ OFF


🪟 3. WINDOWS – STEP BY STEP

🔹 Step 1: Task Manager (GUI method)

  1. Press Ctrl + Shift + Esc
  2. Go to Performance tab → CPU
  3. Look at:
Sockets: X
Cores: Y
Logical processors: Z

👉 Example:

  • Cores = 4
  • Logical processors = 8
    ✅ Multithreading ENABLED

🔹 Step 2: Command Prompt

Run:

wmic cpu get NumberOfCores,NumberOfLogicalProcessors

Output:

NumberOfCores  NumberOfLogicalProcessors
4              8

👉 Threads (LogicalProcessors) > Cores
✅ Hyper‑Threading ON


🔹 Step 3: PowerShell (recommended)

Get-WmiObject Win32_Processor | Select NumberOfCores, NumberOfLogicalProcessors, SocketDesignation

OR modern cmdlet:

Get-CimInstance Win32_Processor | Select NumberOfCores,NumberOfLogicalProcessors



🔹 Step 4: System Information

Run:

msinfo32

Check:

  • Processor
  • Logical processors

🔍 4. HOW TO TELL IF MULTITHREADING IS ENABLED

✅ Rule:

If Threads > Cores → ENABLED
If Threads = Cores → DISABLED

Example:

CoresThreadsStatus
48✅ Enabled
816✅ Enabled
44❌ Disabled

🔧 5. BIOS LEVEL CHECK (Important)

Even if OS shows cores/threads, final control is in BIOS.

Steps:

  1. Reboot system
  2. Enter BIOS (F2 / DEL / ESC)
  3. Look for:
    • Intel Hyper‑Threading
    • AMD SMT (Simultaneous Multithreading)

Settings:

  • Enabled ✅ → uses threads
  • Disabled ❌ → only physical cores

✅ 6. QUICK CHEAT SHEET

Linux

lscpu

Windows

wmic cpu get NumberOfCores,NumberOfLogicalProcessors


🚀 FINAL SUMMARY

PlatformCommandWhat to Check
LinuxlscpuThreads per core
Linux/proc/cpuinfocores, sockets
WindowsTask ManagerLogical processors
Windowswmiccores vs threads
    



SQL> SELECT stat_name, value
FROM   v$osstat
WHERE  stat_name IN ('NUM_CPUS','NUM_CPU_CORES','NUM_CPU_SOCKETS');
``  2    3
STAT_NAME                           VALUE
------------------------------ ----------
NUM_CPUS                                8
NUM_CPU_CORES                           8
NUM_CPU_SOCKETS                         2

SQL>


Interpretation:

NUM_CPUS = logical CPUs visible to the OS (includes HT threads)
NUM_CPU_CORES = physical cores
If NUM_CPUS > NUM_CPU_CORES → SMT/HT enabled
If NUM_CPUS = NUM_CPU_CORES → SMT/HT disabled



SQL> !cat /proc/cpuinfo | egrep "processor|cpu cores|siblings|physical id" 

physical id     : 0
siblings        : 4
cpu cores       : 4
processor       : 1

cpu cores = physical cores per socket
siblings = logical CPUs per socket

If siblings > cpu cores → SMT/HT enabled
If siblings == cpu cores → SMT/HT disabled




Is CPU issue ? troubleshooting workflow for oracle database performance issue with automation

✅ ✅ 1. CPU Troubleshooting Framework (Like iostat for CPU) Use: vmstat 2 5 or top 📊 ✅ 2. CPU Metrics Explained (vmstat / top) us sy id wa s...