✅ Oracle Database Production Hardening Checklist (RHEL)
1️⃣ OS & Kernel Hardening
🔒 OS Configuration
- Dedicated server / VM for Oracle only
- Correct RHEL version certified for Oracle
- Minimal packages installed (no GUI, dev tools)
- NTP / chrony configured and synced
⚙ Kernel Parameters
Verify:
sysctl -a 2>/dev/null | grep -E "shm|sem|fs\.aio|max_map_count"
Key settings:
- kernel.shmmni, shmmax, shmall
- kernel.sem sized for workload
- fs.aio-max-nr sufficient
- vm.swappiness = 1
- vm.zone_reclaim_mode = 0
✅ Persist in /etc/sysctl.conf (or a file under /etc/sysctl.d/) and reload with sysctl --system
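The verification step above can be scripted so it fails loudly instead of relying on a visual scan. A minimal sketch: `check_sysctl` is a hypothetical helper name, and the only expected values used are the two fixed ones from this checklist (vm.swappiness = 1, vm.zone_reclaim_mode = 0). The shm and sem values depend on your memory size and workload, so they are not hard-coded here.

```shell
# check_sysctl KEY EXPECTED
# Reads "key = value" lines (the format printed by `sysctl -a`) from stdin
# and reports whether KEY has the EXPECTED value.
# Hypothetical helper for illustration; not part of any Oracle or RHEL tool.
check_sysctl() {
  local key="$1" expected="$2" actual
  # Pick the value of the first line whose key matches exactly.
  actual=$(awk -F' = ' -v k="$key" '$1 == k { print $2; exit }')
  if [ -z "$actual" ]; then
    echo "MISSING $key"
    return 2
  elif [ "$actual" = "$expected" ]; then
    echo "OK $key = $actual"
  else
    echo "FAIL $key = $actual (expected $expected)"
    return 1
  fi
}
```

Usage on a live host would look like `sysctl -a 2>/dev/null | check_sysctl vm.swappiness 1`. The non-zero return code makes it usable as a gate in a pre-go-live script.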
❌ Transparent Huge Pages (THP)
- THP set to never
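To confirm the THP setting, read the sysfs file, where the kernel marks the active mode with square brackets. The sketch below assumes the standard path on RHEL; `thp_mode` is a hypothetical helper name, and the optional file argument exists only so the function can be tried against a sample file.

```shell
# thp_mode [FILE]
# Prints the active THP mode, i.e. the word shown in [brackets].
# FILE defaults to the usual sysfs location (an assumption; verify on your host).
thp_mode() {
  local f="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
  sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$f"
}
```

A check such as `[ "$(thp_mode)" = never ] || echo "THP is not disabled"` fits into a health script. To make the setting survive reboots, the `transparent_hugepage=never` kernel boot parameter is the usual route.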
2️⃣ CPU & NUMA Hardening
🧠 NUMA Validation
- NUMA detected and considered
- Oracle memory interleaving enabled
- No CPU pinning unless justified
Recommended:
numactl --interleave=all
🔥 CPU Oversubscription
- No CPU quota throttling (VM / container)
- vCPU ≤ physical CPU
- Hyper‑threading understood and planned
3️⃣ Memory Hardening
📦 SGA & PGA
- SGA < per‑NUMA node memory
- PGA target sized (avoid PGA spills)
- HugePages configured (if applicable)
Check HugePages:
grep Huge /proc/meminfo
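When HugePages are used, the pool has to be at least large enough to hold the whole SGA. A rough sizing sketch follows, assuming a single instance and a 2048 kB page size unless you pass another value; `hugepages_needed` is a hypothetical helper and the result is a lower bound, not an Oracle-endorsed figure.

```shell
# hugepages_needed SGA_MB [HUGEPAGE_KB]
# Number of huge pages required to hold an SGA of SGA_MB megabytes, rounded up.
# HUGEPAGE_KB defaults to 2048; confirm with: grep Hugepagesize /proc/meminfo
hugepages_needed() {
  local sga_mb="$1" page_kb="${2:-2048}"
  echo $(( ($sga_mb * 1024 + $page_kb - 1) / $page_kb ))
}
```

For example, `hugepages_needed 16384` gives the page count for a 16 GB SGA. Compare it with HugePages_Total from the grep above, and set vm.nr_hugepages accordingly if the pool is too small.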
🚨 Swap
- Swap enabled but minimal
- No Oracle paging to swap
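A quick way to watch the swap items is to compute how much swap is in use from /proc/meminfo. This is a sketch with a hypothetical helper name, `swap_used_kb`. It reports system-wide usage, so a non-zero value only tells you that something was swapped out, not that Oracle was.

```shell
# swap_used_kb [FILE]
# Prints SwapTotal - SwapFree in kB, read from a /proc/meminfo style file.
swap_used_kb() {
  local f="${1:-/proc/meminfo}"
  awk '$1 == "SwapTotal:" { total = $2 }
       $1 == "SwapFree:"  { free  = $2 }
       END { print total - free }' "$f"
}
```

If the number grows over time, the si/so columns of `vmstat 1` show whether paging is happening right now.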
4️⃣ Storage & IO Hardening
💾 Disk Separation (Mandatory)
- Datafiles
- Redo logs
- Archive logs / FRA
- Temp
- Backups
No sharing of redo + data.
⏱ IO Latency Targets
| IO Type | Target |
|---|---|
| Redo writes | < 5 ms |
| Data reads | < 15 ms |
| Temp IO | < 20 ms |
Verify:
iostat -xm 1 5
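The latency targets in the table can be checked mechanically against the iostat output. The sketch below assumes a sysstat version that prints separate r_await and w_await columns; `slow_devices` is a hypothetical helper, and mapping devices to redo, data, or temp is left to you.

```shell
# slow_devices THRESHOLD_MS
# Reads `iostat -x` output on stdin and prints "device r_await w_await" for
# every device whose read or write await exceeds THRESHOLD_MS.
# Columns are located by header name because their positions differ between
# sysstat versions.
slow_devices() {
  awk -v limit="$1" '
    $1 ~ /^Device/ {              # header row: remember column positions
      r = w = 0
      for (i = 1; i <= NF; i++) {
        if ($i == "r_await") r = i
        if ($i == "w_await") w = i
      }
      next
    }
    NF == 0 { r = w = 0; next }   # a blank line ends the device table
    r && w && ($r + 0 > limit + 0 || $w + 0 > limit + 0) {
      print $1, $r, $w
    }
  '
}
```

For example, `iostat -xm 1 5 | slow_devices 5` lists devices that miss the redo target. Keep in mind that the first iostat report covers the period since boot, so the later samples are the ones that reflect current load.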
🧩 ASM (If Used)
- Diskgroup redundancy defined
- Rebalance power controlled
- No mixed latency tiers in same DG
5️⃣ Oracle Parameter Hardening
✅ Mandatory Parameters
Ensure:
- open_cursors ≥ 2–3× app need
- processes sized for peak
- sessions = processes * 1.5
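The sessions rule of thumb is simple arithmetic, and scripting it avoids rounding mistakes when processes is odd. A sketch with a hypothetical helper name, `derive_sessions`, follows. It implements only the 1.5× rule from this checklist, so treat the result as a floor and check the SESSIONS reference entry for the derivation your release uses by default.

```shell
# derive_sessions PROCESSES
# sessions = processes * 1.5, rounded up to the next integer.
derive_sessions() {
  echo $(( ($1 * 3 + 1) / 2 ))
}
```

For instance, `derive_sessions 500` yields the sessions value to pair with processes = 500.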
⚡ Performance Safety
- cursor_sharing = FORCE (only if needed)
- result_cache_mode evaluated
- optimizer_adaptive_features controlled (12.1 only; split into optimizer_adaptive_plans and optimizer_adaptive_statistics from 12.2)
- parallel_degree_policy understood
6️⃣ Security & Access Hardening
🔐 OS Level
- Oracle user non‑login shell (if allowed)
- SSH key‑based access
- No password reuse
- Root access logged
🔑 Database Level
- Password profiles enforced
- Default accounts locked
- Strong SYS password
- ADMIN roles minimized
7️⃣ Resource Management (Very Important)
🎛 Oracle Resource Manager
- Enabled in production
- Separate OLTP / Batch consumers
- CPU runaway prevention
8️⃣ Backup & Recovery Hardening
🛡 RMAN
- Daily incremental
- Weekly full
- Archive log backups
- Controlfile autobackup ON
🔁 Restore Testing
- Restore tested quarterly
- PITR validated
- Backup success monitored
9️⃣ Monitoring & Alerting
📊 OS Monitoring
- CPU, load
- Memory pressure
- IO latency
- Filesystem usage
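Filesystem usage is the easiest of these to alert on without a monitoring product. A minimal sketch follows, assuming POSIX `df -P` output and mount points without spaces; `fs_over_threshold` is a hypothetical helper name and the threshold is yours to choose.

```shell
# fs_over_threshold PCT
# Reads `df -P` output on stdin and prints "mountpoint use%" for every
# filesystem that is at or above PCT percent used.
fs_over_threshold() {
  awk -v limit="$1" 'NR > 1 {     # skip the header row
    use = $5
    sub(/%/, "", use)             # "91%" -> "91"
    if (use + 0 >= limit + 0) print $6, $5
  }'
}
```

Run as `df -P | fs_over_threshold 85` from cron, or wire it into whatever alerting you already have; empty output means nothing crossed the threshold.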
🧠 Oracle Monitoring
- AWR enabled
- ASH accessible
- Tablespace growth alerts
- Session thresholds
🔍 Baselines
- CPU baseline captured
- IO latency baseline captured
- AWR baseline saved post go‑live
🔟 Patch & Lifecycle Management
🧩 Oracle
- Quarterly RU applied
- OPatch version updated
- Patch rollback plan ready
🧩 OS
- Kernel patches tested
- Reboot procedure documented
- Firmware audited (if bare metal)
1️⃣1️⃣ High Availability & DR
- Data Guard configured (if required)
- DG lag alerts
- Switchover tested
- DR RTO/RPO documented
1️⃣2️⃣ Documentation & Audit Readiness
- SOPs (CPU, IO, Outage)
- RCA template
- Architecture diagram
- Capacity forecast
- CMDB updated
✅ Final “GO‑LIVE” Gate Criteria
✔ Stable CPU baseline
✔ Disk latency within SLA
✔ RMAN recoverable
✔ Resource controls enabled
✔ Security controls enforced
📌 Pro Tip (Real‑World)
Most production outages happen due to lack of resource controls, not lack of hardware.