Oracle Exadata administration requires specialized monitoring of both database servers and storage cells. Daily tasks focus on ensuring high availability, maintaining cell health, and proactively reviewing hardware alerts.
- Check overall cell health and status:
  dcli -g cell_group "cellcli -e list cell detail"
- Check hardware alerts (critical or active):
  cellcli -e list alerthistory where severity = 'critical'
- List physical and grid disk status:
  cellcli -e list griddisk
  cellcli -e list physicaldisk
- Monitor Flash Cache usage and efficiency:
  cellcli -e list flashcache detail
- Check IO statistics: run iostat -x, or query cell metrics, for example:
  cellcli -e list metriccurrent where name like '.*_IO.*'
- Monitor Clusterware status:
  crsctl check cluster
  crsctl check cssd
- Review database and Exadata wait events in AWR reports. Look for Exadata-specific waits such as "cell single block physical read".
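The checks above lend themselves to a small wrapper script. The following is a minimal sketch, not an Oracle-supplied tool: the function names cell_checks, dbnode_checks, and run_checks are hypothetical, and it assumes the CellCLI commands are run on a storage cell and the crsctl commands on a database server with the Grid Infrastructure environment set.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of any Oracle tooling.

# CellCLI checks from the list above; run them on a storage cell.
cell_checks() {
  cat <<'EOF'
cellcli -e list cell detail
cellcli -e list alerthistory where severity = 'critical'
cellcli -e list griddisk
cellcli -e list physicaldisk
cellcli -e list flashcache detail
EOF
}

# Clusterware checks from the list above; run them on a database server.
dbnode_checks() {
  cat <<'EOF'
crsctl check cluster
crsctl check cssd
EOF
}

# Read one command per line from stdin, run each one, and return the
# number of commands that exited with a non-zero status.
run_checks() {
  local cmd failures=0
  while IFS= read -r cmd; do
    [ -z "$cmd" ] && continue
    echo "== $cmd"
    # </dev/null keeps the command from consuming the remaining input lines.
    if ! bash -c "$cmd" </dev/null; then
      echo "!! FAILED: $cmd" >&2
      failures=$((failures + 1))
    fi
  done
  return "$failures"
}
```

On a storage cell this would be used as `cell_checks | run_checks`, and on a database server as `dbnode_checks | run_checks`; the exit status is the number of failed commands, which makes it easy to hook into cron or a monitoring agent.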
How to get hardware alerts in Exadata
- Oracle Enterprise Manager (OEM): Review hardware status on the Exadata Database Machine Home page, where active incidents are outlined in red. Configure Incident Rules to send alerts via email or SNMP.
- Exadata Storage Server (Cells): Use the LIST ALERTHISTORY command in CellCLI. You can configure email notifications directly from the cell.
- Database Servers (Compute Nodes): Use DBMCLI to review alerts and set up SMTP/SNMP notifications.
- Oracle Auto Service Request (ASR): Register your Exadata to auto-open Service Requests with Oracle Support for critical hardware faults.
When a hardware alert (HALRT-*) fires, take the following steps to evaluate the severity and plan the fix:
- Review open faults from the ILOM with show /SP/faultmgmt.
- Collect a diagpack if requested by Oracle.
- Use CellCLI to identify the failed drive, wait for Automatic Storage Management (ASM) to rebalance, and pull the drive once the blue "OK to remove" LED illuminates. All storage devices are hot-pluggable and can be replaced without powering down the server.
Log in to the storage cell (as celladmin or root) and verify the alert history to locate the failing disk:
CellCLI> LIST PHYSICALDISK WHERE status != 'normal' DETAIL
Note: Make a note of the name (e.g., 28:5) and slotNumber.
3. Drop the Disk for Replacement
If you are running Oracle Exadata System Software Release 21.2.0 or newer, use the following command to drop the physical disk while maintaining redundancy:
CellCLI> ALTER PHYSICALDISK <disk_name> DROP FOR REPLACEMENT MAINTAIN REDUNDANCY NOWAIT
Wait for a storage server alert confirming the disk is dropped and data has successfully rebalanced before proceeding.
4. Physically Replace the Disk
- Locate the physical server (a white locator LED will be illuminated on the front of the chassis).
- Identify the disk itself (an amber "Fault" LED will be lit).
- Wait for the blue "OK to remove" LED to light up before pulling the drive.
- Press the disk ejection lever, pull out the failed drive, and slide the new drive into the chassis until it locks in place.
5. Verify the Replacement
Once the drive is inserted, the new disk will be automatically detected and configured. Run the following to confirm it is back to normal:
CellCLI> LIST PHYSICALDISK WHERE name = <disk_name> ATTRIBUTES status
Full procedural breakdowns for varying disk types (e.g., hard disks, flash disks, or M.2 system disks) are available in the Oracle Exadata Maintenance Guide.
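The drop and verify steps can be sketched as a pair of shell helpers. This is an illustration under stated assumptions, not the documented procedure: the function names are hypothetical, the release string is passed in by the caller, the fallback to a plain DROP FOR REPLACEMENT on releases older than 21.2.0 is an assumption not taken from the text above, and the version comparison relies on GNU sort -V.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative. Nothing here runs at load time.

# True if version $1 is greater than or equal to version $2 (GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the CellCLI drop command for a disk, given the cell software release.
# Assumption: releases before 21.2.0 use the form without MAINTAIN REDUNDANCY.
drop_command() {
  local disk="$1" release="$2"
  if version_ge "$release" "21.2.0"; then
    echo "ALTER PHYSICALDISK $disk DROP FOR REPLACEMENT MAINTAIN REDUNDANCY NOWAIT"
  else
    echo "ALTER PHYSICALDISK $disk DROP FOR REPLACEMENT"
  fi
}

# Print the status attribute of one physical disk (run on the storage cell).
disk_status() {
  cellcli -e "list physicaldisk where name = '$1' attributes status"
}

# Poll until the disk status is exactly "normal".
# Arguments: disk name, number of checks (default 30), delay in seconds (default 60).
wait_for_disk_normal() {
  local disk="$1" tries="${2:-30}" delay="${3:-60}" i=1
  while [ "$i" -le "$tries" ]; do
    # Trim whitespace and require an exact match, so that a status such as
    # "normal - dropped for replacement" does not count as healthy.
    if disk_status "$disk" | awk '{ $1 = $1 } $0 == "normal" { ok = 1 } END { exit !ok }'; then
      echo "disk $disk is normal"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "disk $disk not normal after $tries checks" >&2
  return 1
}
```

A typical use would be `cellcli -e "$(drop_command 28:5 <release>)"` before the physical swap and `wait_for_disk_normal 28:5` after it. Printing the command rather than running it directly leaves room for a human review before anything is dropped.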
How to get hardware alerts in Exadata and Oracle Support
To automatically receive Exadata hardware alerts and have them routed to Oracle Support, you must configure Oracle ASR (Auto Service Request). ASR automatically logs a Service Request (SR) with Oracle Support for specific hardware faults, while configuring Exadata Alert Notifications keeps your team informed.
- Install ASR Manager: Deploy the ASR Manager software on a standalone server external to your Exadata rack.
- Enable Telemetry: Configure your Exadata Database Servers and Storage Servers to send telemetry and traps to the ASR Manager.
- Activate ASR: Register and activate your ASR assets through the My Oracle Support portal to link them with your Customer Support Identifier (CSI).
- Storage Servers: Log in via CellCLI and use the ALTER CELL command to define your SMTP mail server, from/to addresses, and notification policy.
- Database Servers: Set up hardware fault alerts through the Integrated Lights Out Manager (ILOM) so you receive email or SNMP traps directly when a component fails.
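As an illustration of the ALTER CELL step for storage servers, the sketch below builds the command string from caller-supplied values instead of running it. The function name smtp_alter_cell is hypothetical and every host name and address in the usage line is a placeholder; the attribute names (smtpServer, smtpFromAddr, smtpToAddr, notificationMethod, notificationPolicy) should be checked against the CellCLI reference for your release.

```shell
#!/usr/bin/env bash
# Sketch only: prints an ALTER CELL command for mail notifications.
# Arguments: SMTP server, from address, to address, optional notification policy.
smtp_alter_cell() {
  local server="$1" from_addr="$2" to_addr="$3" policy="${4:-critical,warning,clear}"
  printf "ALTER CELL smtpServer='%s', smtpFromAddr='%s', smtpToAddr='%s', notificationMethod='mail', notificationPolicy='%s'\n" \
    "$server" "$from_addr" "$to_addr" "$policy"
}
```

On a cell, the output could be passed to CellCLI, for example `cellcli -e "$(smtp_alter_cell mail.example.com exadata@example.com dba@example.com)"`, followed by `cellcli -e alter cell validate mail` to send a test message.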
- Discover Targets: Use the Exadata plug-in within OEM to discover and promote all database nodes, storage cells, InfiniBand switches, and PDUs.
- View Hardware Incidents: Navigate to the Database Machine Home Page to view the schematic layout. Components with active incidents (like a faulty disk or fan) will be outlined in red.
- Set up Incident Rules: Configure Incident Rules in OEM to forward hardware alerts to your internal ticketing systems and alert designated administrators.
- Storage Cell Alerts: Run LIST ALERTHISTORY or LIST ALERTDEFINITION using CellCLI to review the storage cell alert history and alert definitions.
- Compute Node Alerts: Query the standard alert.log or review the Oracle ILOM Event Log via SSH.
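For the manual checks, it can help to keep the per-component commands in one place. The lookup below is a small sketch with a hypothetical function name; it only prints the command to run. The cell and dbnode entries are typed at a shell prompt on the respective server, while the ilom entry is meant for the ILOM prompt after logging in over SSH.

```shell
#!/usr/bin/env bash
# Sketch only: map a component type to the command that lists its alerts.
alert_command() {
  case "$1" in
    cell)   echo "cellcli -e list alerthistory" ;;   # storage cell
    dbnode) echo "dbmcli -e list alerthistory" ;;    # database server
    ilom)   echo "show /SP/logs/event/list" ;;       # ILOM event log
    *)      echo "unknown component: $1" >&2; return 2 ;;
  esac
}
```

For example, `alert_command cell` prints the CellCLI command, and an unrecognised component name returns exit status 2 so a calling script can stop early.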
- Oracle Exadata Platform: Official portal covering on-premises, OCI, and multi-cloud Exadata platforms.
- Oracle Exadata Database Machine - Operational Best Practices: Comprehensive PDF detailing tuning, isolation, and day-to-day operations.
- Exadata Performance and AWR: Guide to reading Exadata-specific statistics in Automatic Workload Repository (AWR) reports.
- Monitoring Oracle Exadata MOS Notes: Official Oracle blog tracking critical My Oracle Support (MOS) articles and patching guidelines.
For daily tasks, it is very helpful to have a collection of Exadata MOS notes at hand. The following notes are more or less my "Favorites" from MOS.
Information Center
1306791.2 – "Oracle Exadata Database Machine"
Master Note
888828.1 – Exadata Database Machine and Exadata Storage Server Supported Versions
Best Practices
757552.1 – Oracle Exadata Best Practices
1274318.1 – Oracle Sun Database Machine Setup/Configuration Best Practices
1244344.1 – Exadata Starter Kit
Operation Tasks
1473002.1 – Using dbserver_backup.sh to backup compute nodes
1538068.1 – Remove partition if dbserver_backup.sh fails
1428394.1 – Password stuff (pam_tally2)
1093890.1 – Shutdown and startup Exadata and Compute nodes on rack
1446274.1 – ILOM command reference (startup and shutdown Exadata from ILOM)
1520896.1 – DBFS Configuration Health Check
1054431.1 – Configure DBFS on Exadata Checklist
1553103.1 – latest dbnodeupdate.sh note
401749.1 – Shell Script to Calculate Values Recommended Linux HugePages / HugeTLB Configuration
…
Cell-Storage Server
1921528.1 – SRDC – EEST Storage Cell General Issues
1306635.1 – Replacement of flash – how to check firmware and status. Resetting status
1188080.1 – Steps to shut down or reboot an Exadata storage cell without affecting ASM
1477020.1 – Exadata: ASM Diskgroup Showing Status Of _DROPPED_… After Storage Maintenance
761868.1 – Oracle Exadata Diagnostic Information required for Disk Failures and some other Hardware issues
Patching
1262380.1 – Master note on Exadata patching
1473002.1 – Using ULN to install server patches with YUM
1545789.1 – ISO install Cheat Sheets
1136544.1 – Relinking notes
1553103.1 – Exadata Database Server Patching using the DB Node Update Utility
Software Specific Release Notes
1537407.1 – Oracle 12c