EasyReliableDBA: How to administer and patch Oracle Exadata Cloud Infrastructure (ExaCS) and ExaDB-D

Managing and patching Oracle Exadata Database Service on Dedicated Infrastructure (ExaDB-D/ExaCS) is a shared responsibility: Oracle manages the physical infrastructure up to and including the hypervisor, and you manage the software stack from the guest VM up. The OCI Console simplifies patching with rolling updates that keep databases available while each node is patched in turn.

Responsibility Matrix

Oracle: Maintains physical hardware, network fabric, PDUs, switches, and hypervisors.
Customer: Responsible for patching the Guest VM OS, Grid Infrastructure (GI), Database Homes, and the databases themselves.

1. Patching the Guest VM OS

Oracle manages the physical Exadata infrastructure, but you are responsible for patching the guest VM operating system.

Via OCI Console: Go to the VM Cluster details, navigate to Updates (OS), select an available update, click Run Precheck, and once it succeeds, click Apply Exadata OS Image Update.

Via CLI: Use the patchmgr utility on a designated Exadata compute node to drive operating system updates across all nodes in the cluster.

2. Patching Oracle Grid Infrastructure (GI)

You must keep your Grid Infrastructure updated to match the latest quarterly Release Update (RU) requirements.

Via OCI Console: Under your VM Cluster details in the OCI Console, click View Patches next to the Updates Available field. Select a patch, run a Precheck to validate prerequisites, and click Update Grid Infrastructure. The patch is applied to one node at a time (rolling) to prevent downtime.

3. Patching the Oracle Database

Exadata databases are patched by applying Release Updates (RUs) using out-of-place patching, meaning you create or provision a new patched Oracle Home and then move the database to it.

Via OCI Console: Go to your Database Homes list, choose the database home, and review available updates. Select your target image, run a Precheck, and then move the database to the new patched home.

Via CLI: You can use dbaascli patch db apply to apply quarterly database patches, or use exadbcpatchmulti to handle multiple database patch operations directly from the command line.
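As a hedged sketch of how the CLI route could be scripted, the Python helper below only assembles the precheck and apply invocations quoted in this article; it does not run anything. The function name patch_db_commands and the sample patch ID are illustrative assumptions, and the exact options should be confirmed against the dbaascli help output for your tooling version.

```python
import shlex


def patch_db_commands(patch_id: str) -> list[str]:
    """Return the precheck and apply command lines, in the order they should run."""
    if not patch_id.strip():
        raise ValueError("patch_id must not be empty")
    quoted = shlex.quote(patch_id)  # guard against shell metacharacters
    return [
        f"dbaascli patch db precheck --patchid {quoted}",
        f"dbaascli patch db apply --patchid {quoted}",
    ]


if __name__ == "__main__":
    # "12345678" is a made-up patch ID used only for illustration.
    for command in patch_db_commands("12345678"):
        print(command)
```

Keeping the precheck first in the returned list makes it harder to apply a patch that has not been validated.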

4. Key Administration Commands

For command-line administration, use the dbaascli utility on the compute nodes.

Check available database patches:
dbaascli patch db get_list

Run a precheck before applying a patch:
dbaascli patch db precheck --patchid <patch_id>

Apply a database patch (out-of-place or in-place):
dbaascli patch db apply --patchid <patch_id>

OS / Grid Infrastructure Command Line: Use patchmgr from a driving node to orchestrate updates across compute nodes.

5. Best Practices

Always back up your databases before executing any patching operations.
Follow the N-3 versioning rule: run the current major version (N) or one of the three major versions before it (N-1 through N-3).
Oracle automatically schedules infrastructure maintenance (quarterly updates plus monthly security maintenance), normally in off-peak windows. Check the Maintenance Preferences in your Exadata Infrastructure settings to choose rolling versus non-rolling maintenance and to reschedule if needed.
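The N-3 rule is simple arithmetic, so a small check can sit in a pre-patch script. This is only a sketch: the function name is hypothetical and the version numbers in the comment are placeholders, not real release numbers.

```python
def within_n_minus_3(installed_major: int, latest_major: int) -> bool:
    """True when installed_major is one of latest_major .. latest_major - 3."""
    if installed_major > latest_major:
        raise ValueError("installed version cannot be newer than the latest release")
    return latest_major - installed_major <= 3


if __name__ == "__main__":
    # Placeholder numbers: with N = 25, versions 22 through 25 pass the rule.
    print(within_n_minus_3(22, 25))
    print(within_n_minus_3(21, 25))
```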

Question: Do we patch the storage servers on ExaCS and ExaDB-D?

No, you do not manually patch the Exadata Storage Servers (cells) on ExaCS (Exadata Cloud Service) or ExaDB-D (Exadata Database Service on Dedicated Infrastructure). Oracle manages storage server patching and updates through cloud automation.

Here is how maintenance responsibilities are divided:

Oracle-Managed Components: Oracle manages everything at and below the hypervisor. This includes the Storage Servers, hardware, firmware, and network fabric. Oracle updates these components in the background using rolling updates to ensure zero database downtime.
Customer-Managed Components: You are responsible for patching the software layers you control above the hypervisor. This includes:

Database Homes (Oracle Database software).
Grid Infrastructure (Clusterware and ASM).
Guest VM Operating System (the OS on your compute nodes)
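For runbooks, the split above can be captured as a simple lookup. This is a minimal sketch: the component labels are free-form strings chosen for illustration, and the function name who_patches is an assumption, not part of any Oracle tool.

```python
ORACLE = "Oracle"
CUSTOMER = "Customer"

# Ownership as described in this article; the keys are illustrative labels.
RESPONSIBILITY = {
    "storage servers": ORACLE,
    "hardware": ORACLE,
    "firmware": ORACLE,
    "network fabric": ORACLE,
    "hypervisor": ORACLE,
    "guest vm os": CUSTOMER,
    "grid infrastructure": CUSTOMER,
    "database homes": CUSTOMER,
}


def who_patches(component: str) -> str:
    """Return the party responsible for patching the named component."""
    key = component.strip().lower()
    if key not in RESPONSIBILITY:
        raise KeyError(f"unknown component: {component!r}")
    return RESPONSIBILITY[key]


if __name__ == "__main__":
    print(who_patches("Storage Servers"))
    print(who_patches("Grid Infrastructure"))
```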

How Storage Server Patching Works:

Rolling Execution: Updates to storage servers are applied in a rolling manner. Thanks to ASM High Redundancy and Exadata software design, this happens without impacting database or application availability.
Scheduling: While Oracle controls the underlying infrastructure patches, you can define maintenance windows and schedule your infrastructure updates using the Oracle Cloud Infrastructure Console

Question: What are the common issues when patching ExaCS and ExaDB-D?

Patch failures on Oracle Exadata Cloud Service (ExaCS) and ExaDB-D usually stem from outdated cloud tooling, lack of network connectivity to the Object Store, or improper cluster states. Addressing pre-checks and dependency conflicts early prevents most interruptions.

The most common patching issues and how to tackle them include:

1. Outdated Cloud Tooling

The Issue: Attempting to patch the Grid Infrastructure (GI) or Database Homes without first updating the dbaascli cloud tooling.
The Fix: Ensure all Exadata database nodes run the same, most current version of cloud tooling before initiating any upgrade sequence.

2. Object Store Connectivity

The Issue: The virtual machine cannot reach the Oracle Cloud Infrastructure Object Store. This often happens if the service gateway or static route is misconfigured, causing patch downloads to stall.
The Fix: Verify your VCN route tables and ensure a static route exists for Object Storage on each compute node.

3. Database State and Custom Configurations

The Issue: Patching may fail if instances are down, ASM is not running properly, or custom wallet/listener files do not match across cluster nodes.
The Fix: Ensure the database instance status is Open and active on all nodes before starting operations. Temporarily restore standard configuration files (like custom wallets) if the patch process trips on them.

4. File System Space Constraints

The Issue: Insufficient disk space on the /u01 or /u02 partitions causes pre-checks to fail.
The Fix: Clear out old trace files, log files, or obsolete backups prior to patching
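A quick free-space check before the pre-check can save a failed run. The sketch below uses only the Python standard library; the 15 GiB threshold is an arbitrary illustrative value rather than an Oracle requirement, and the function names are assumptions.

```python
import os
import shutil


def free_gib(path: str) -> float:
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def low_space_paths(paths, min_free_gib):
    """Return the existing paths whose filesystem has less than min_free_gib free."""
    return [p for p in paths if os.path.exists(p) and free_gib(p) < min_free_gib]


if __name__ == "__main__":
    # 15 GiB is an arbitrary illustrative threshold, not an Oracle requirement.
    print(low_space_paths(["/u01", "/u02"], min_free_gib=15))
```

Paths that do not exist on the host are skipped, so the same script can run on nodes with different mount layouts.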

5. Custom OS Package Conflicts

The Issue: If you installed non-Exadata RPMs (extra OS packages) on the Guest VMs, the pre-check may flag conflicts with Oracle-installed RPMs.
The Fix: Resolve the RPM dependencies or uninstall the conflicting non-Exadata packages before trying the Guest VM upgrade again

Cloud Tooling and Administration (dbaascli)

dbaascli patch tools list: Displays the currently installed cloud tooling version and checks if any updates are available for your system.
dbaascli admin showLatestStackVersion: Returns the version number of the latest available dbaastools RPM stack update.
Context note: These commands are run as the root user after connecting to a compute node as the opc user

System Architecture and Environment Files

/var/opt/oracle/misc/platforminfo: A system file containing the deployment type identifier. On Exadata Cloud Service (ExaCS) or ExaDB-D, querying this file will return EXACS or EXACC (Cloud@Customer).
/usr/local/bin/imageinfo: An Exadata utility script used to generate a summary of the release versions and statuses of software, OS, and firmware components on your Exadata compute or storage nodes

1. Software Image Management

dbaascli cswLib listLocal: Lists the database software images and versions locally available in your environment for patching or provisioning

2. Encryption & Wallet Commands

dbaascli tde status --dbname <dbname>: Checks the status of the Transparent Data Encryption (TDE) keystore (open, closed, auto-login).
dbaascli database verify_wallet --dbname <dbname>: Validates the integrity and accessibility of the database wallet

3. Database Metadata & Administration

dbaascli database getDetails --dbname <dbname>: Returns specific configuration and operational details for the specified database

4. Backup Operations & Troubleshooting

dbaascli database backup --dbname <dbname> --getSchedules: Displays the configured automated backup schedules.
dbaascli database backup --getConfig --dbName <dbname> --configFile /tmp/<dbname>_cfg.txt: Exports current backup configuration parameters to a text file for review or editing.
dbaascli database backup --dbname <dbname> --list: Lists all available backups taken for the database.
dbaascli database backup --dbName <dbname> --showHistory --all: Displays a comprehensive, historical log of all backup jobs.
dbaascli database backup --dbname <dbname> --status --uuid <uuid>: Checks the status of a specific, previously run backup job using its UUID.
dbaascli database backup --getLatestBackupJob --dbname <dbname>: Fetches the job details and status of the most recent backup execution

The GetExaWatcherResults.sh command extracts ExaWatcher performance data on Oracle Exadata servers. It gathers detailed OS metrics (like CPU, memory, and network) between your specified timestamps, and compiles them into a compressed archive (e.g., .zip or .tar.gz)

What happens next?

Locate the Output: The generated archive is typically saved in the current directory or a designated directory (e.g., /opt/oracle.ExaWatcher/archive/).
Reviewing the Data: The archive contains CSV/raw data files for OS tools like iostat, mpstat, and vmstat, along with a small subset of pre-built visual charts

ExaWatcher report

To collect data between a specific start and end date and time, for example:

# ./GetExaWatcherResults.sh --from 01/25/2025_13:00:00 --to 01/25/2025_14:00:00

Use the tfactl diagcollect command with the -node flag to target specific nodes. By default, TFA collects data for the past 12 hours and from all nodes. To pinpoint the collection, restrict it to the exact nodes and timeframe when the issue occurred

Useful Parameters for Targeting:

Target Nodes: Specify -node local for just the server you are on, or -node node1,node2 for a comma-separated list.
Time Range: Use -from and -to for an exact window, or -last <n>h|d to gather logs for the past n hours or days (e.g., -last 4h).
Specific Component: Add flags like -crs, -asm, or -database <db_name> to restrict collection to those specific sub-systems

TFA report collection from nodes that experienced the issue.

./tfactl diagcollect -from "Feb/05/2025 02:00:00" -to "Feb/05/2025 07:00:00"
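The two tools expect different timestamp layouts, which is an easy thing to get wrong at 3 a.m. The sketch below builds both argument lists from Python datetime values, using the formats shown in the examples in this article. The function names are assumptions, nothing is executed, and %b yields English month abbreviations only under the default C/English locale.

```python
from datetime import datetime

TFA_FORMAT = "%b/%d/%Y %H:%M:%S"         # e.g. Feb/05/2025 02:00:00
EXAWATCHER_FORMAT = "%m/%d/%Y_%H:%M:%S"  # e.g. 01/25/2025_13:00:00


def _check_window(start: datetime, end: datetime) -> None:
    if end <= start:
        raise ValueError("end must be later than start")


def tfactl_args(start: datetime, end: datetime, nodes=None) -> list[str]:
    """Argument list for a time-boxed tfactl diagcollect, optionally per node."""
    _check_window(start, end)
    args = ["tfactl", "diagcollect",
            "-from", start.strftime(TFA_FORMAT),
            "-to", end.strftime(TFA_FORMAT)]
    if nodes:
        args += ["-node", ",".join(nodes)]
    return args


def exawatcher_args(start: datetime, end: datetime) -> list[str]:
    """Argument list for GetExaWatcherResults.sh over the same kind of window."""
    _check_window(start, end)
    return ["./GetExaWatcherResults.sh",
            "--from", start.strftime(EXAWATCHER_FORMAT),
            "--to", end.strftime(EXAWATCHER_FORMAT)]


if __name__ == "__main__":
    window = (datetime(2025, 2, 5, 2, 0, 0), datetime(2025, 2, 5, 7, 0, 0))
    print(tfactl_args(*window, nodes=["node1", "node2"]))
    print(exawatcher_args(*window))
```

Returning a list rather than a single string means the result can be handed to subprocess.run without extra shell quoting.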

For more details

Troubleshooting Exadata Cloud Infrastructure Systems

Using the dbaascli Utility on Exadata Cloud Infrastructure

Patching and Updating an Exadata Cloud Infrastructure System Manually

Question: How do you administer Oracle Exadata Cloud Infrastructure (ExaCS) and ExaDB-D?

Administering Oracle Exadata Cloud Infrastructure (ExaCS) and Exadata Database Service on Dedicated Infrastructure (ExaDB-D) combines Oracle Cloud Infrastructure (OCI) management with standard Exadata database administration. You manage physical and virtual infrastructure via the OCI Console, while using traditional commands (e.g., srvctl, SQL*Plus, or dcli) for databases.

1. Administer Infrastructure via OCI Console

Compute & VM Clusters: Use the OCI Console to provision and scale Exadata VM Clusters, manage database homes, and allocate CPU/RAM.
Patching & Updates: OCI handles the lifecycle management for Grid Infrastructure and database software images. You can schedule automated updates or trigger them manually via the console.
Storage Configuration: Manage your Exadata storage, adjust Exadata I/O Resource Management (IORM), and monitor metrics directly through the OCI dashboards.
Backups: Configure automated OCI-managed backups (which offload directly to Object Storage) and manage retention policies at the VM cluster level

2. Administer Databases & Grid Infrastructure

Command Line Utilities: For Grid Infrastructure operations, connect via SSH to the compute nodes and use utilities like srvctl (for database and service management), crsctl (for clusterware), and dcli (for executing commands across all compute nodes).
Database Operations: Continue using native Oracle tools like RMAN for backups, Data Pump for migrations, and SQL*Plus/SQLcl for typical database administration.
Data Guard: Set up, monitor, and failover/switchover Oracle Data Guard configurations—including multi-standby deployments—either using OCI automation in the console or via traditional DGMGRL commands

3. Identity, Access, and Security

Compartments & Policies: Control who can view and modify Exadata resources by configuring specific IAM policies and organizing your infrastructure into Compartments.
Encryption & Keystores: Manage your master encryption keys natively using OCI Vault, or integrate with Oracle Data Safe to manage user security and auditing

Question: What will you check and analyze on Exadata through AWR?

On Exadata, an AWR report integrates database-tier metrics with low-level storage cell statistics, offering deep visibility into hardware and software performance. To analyze the system effectively, check and evaluate the following key Exadata-specific areas in the report.

1. Exadata Storage Server Health & Configuration

Exadata Configuration Differences: Checks for mismatched hardware or software releases across your storage servers (cells). Discrepancies can lead to unpredictable I/O behavior.
Server Health Report: Reviews disk statuses and validates that no grid disks or cell disks are unexpectedly offline, which would reduce your total available I/O bandwidth.

2. Smart Scan & Offload Efficiency

Offload Efficiency Percentages: Analyzes how many I/O operations are being offloaded to the storage cells. Low offload statistics usually mean the database is pulling raw data blocks instead of leveraging Exadata Smart Scans for filtering.
Storage Index Usage: Checks the number of "Smart IO bytes saved by storage index." Higher savings mean Exadata is successfully skipping reading data blocks that do not meet your query criteria, saving I/O resources

3. Smart Flash Cache & Flash Log Performance

Flash Cache Hit Ratios: Assesses read requests satisfied by flash rather than traditional hard disks. You can review cache usage broken down by workload type (OLTP, Scan, Keep).
Smart Flash Log Statistics: Ensures that log file parallel write operations are being accelerated by flash. You should check for "Flash Log Skips" and redo write latency histograms to identify high-latency I/O outliers

4. I/O Resource Management (IORM)

Top Databases by I/O Requests: Shows which databases or workloads on the Exadata machine are consuming the bulk of the I/O throughput.
IORM Wait Time: Evaluates queue times for flash and disk devices. If queue times are high (e.g., greater than 5-10 ms), IORM plans may need tuning to prevent noisy neighbors from impacting critical databases

5. I/O Outlier Analysis

Exadata Outlier Summary: Exadata typically distributes I/O requests evenly across all cells. AWR's outlier analysis pinpoints which individual cell servers, grid disks, or host HBAs are experiencing disproportionately high latency or service times compared to the rest of the storage grid.
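To illustrate the idea behind outlier analysis (not AWR's actual algorithm, which this article does not describe), the sketch below flags cells whose latency is well above the median of the grid. The factor of 2 and the sample latencies are assumptions chosen for illustration.

```python
from statistics import median


def latency_outliers(latency_ms: dict[str, float], factor: float = 2.0) -> list[str]:
    """Return cells whose latency exceeds factor x the median across all cells."""
    if not latency_ms:
        return []
    baseline = median(latency_ms.values())
    return sorted(cell for cell, ms in latency_ms.items() if ms > factor * baseline)


if __name__ == "__main__":
    # Made-up per-cell average latencies in milliseconds.
    sample = {"cell01": 0.4, "cell02": 0.5, "cell03": 2.1}
    print(latency_outliers(sample))
```

Comparing against the median rather than the mean keeps one very slow cell from dragging the baseline up and hiding itself.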

Question: What will you check and analyze on Exadata X8M through AWR?

Analyzing an Exadata X8M AWR report requires looking beyond standard database wait events. You need to investigate the specialized RDMA over Converged Ethernet (RoCE) network, Smart Flash Cache, and Intel Optane Persistent Memory (PMEM).

Because Exadata uses a scale-out storage grid, the AWR report consolidates and surfaces crucial metrics in the Exadata Statistics section.

1. PMEM Cache & Commit Accelerator (The X8M Advantage)

Exadata X8M leverages Intel Optane PMEM to bypass the standard network and storage software layers

Smart PMEM Read & Write Latency: Look for wait times in microseconds (µs) rather than milliseconds (ms). If PMEM read/write events show elevated times, investigate network interconnect or hardware issues.

Log File Sync Waits: In X8M, the PMEM Commit Accelerator logs commits directly to PMEM. Verify that log file sync and cell single block physical read wait times drop dramatically compared to traditional all-flash architectures

2. Exadata Smart Flash Cache Efficiency

Check if your most active data is sitting in flash.

Flash Cache Hit Ratios: Review the Flash Cache User Reads and User Writes sections. High percentages of unoptimized read requests (reads that hit spinning hard disks) indicate your working set outgrows the flash cache.

Write-Back vs Write-Through: Look for the ratio of First Writes to Overwrites. High overwrites indicate your flash cache is absorbing I/O effectively, dramatically saving disk write operations.

3. Smart Scan & Offload Efficiency

Smart Scans reduce the volume of data traveling across the Exadata network by filtering rows and columns at the storage server layer

Interconnect vs Eligible I/O: Compare cell physical IO interconnect bytes returned by smart scan to cell physical IO bytes eligible for predicate offload. A large disparity proves that predicate filtering (column/row pruning) is working well.

Storage Index Savings: Review the IO Saved by Storage Index metrics. If savings are consistently low, queries are not skipping unnecessary Exadata I/O regions efficiently, which points to tuning opportunities on table clustering or data types
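The comparison between eligible and returned bytes reduces to one percentage. A minimal sketch, assuming you have already read the two statistics named above (cell physical IO bytes eligible for predicate offload, and cell physical IO interconnect bytes returned by smart scan) from AWR or v$sysstat; the function name and sample figures are illustrative.

```python
def interconnect_savings_pct(eligible_bytes: int, returned_bytes: int) -> float:
    """Percent of offload-eligible bytes that never crossed the interconnect."""
    if eligible_bytes <= 0:
        raise ValueError("eligible_bytes must be positive")
    return 100.0 * (eligible_bytes - returned_bytes) / eligible_bytes


if __name__ == "__main__":
    # Made-up figures: 1 TiB eligible for offload, 256 GiB returned by smart scan.
    print(interconnect_savings_pct(1024 * 2**30, 256 * 2**30))
```

A value close to 100 means most of the filtering happened on the cells; a value near zero means the cells shipped almost everything they read.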

4. Wait Events (Foreground & Background)

Correlate traditional database wait times with Exadata-specific hardware events

Cell Single Block Physical Read / Multiblock Physical Read: Analyze the average wait time for these events. High times typically mean you are hitting spinning disks instead of the PMEM or Flash cache tiers.

Reliable Message: In Exadata, this event indicates internal cell communications or cluster channel syncs. Spikes here may indicate network contention or RoCE adapter congestion

5. IORM (I/O Resource Management)

IORM Wait Events: Verify that no specific database or pluggable database (PDB) is being excessively throttled. Look at IORM transient bottleneck and db file sequential read waits to ensure your consumer groups are properly prioritized.

6. Health & Configuration Checks

Offline Disks: The Exadata Health Report section automatically alerts you if grid disks or cell disks are offline or degraded. Every offline disk reduces the I/O bandwidth and redundancy available to the database.
Storage Server Software Version: Ensure Exadata server versions are uniform across all storage cells to prevent mismatched offload algorithms or software limitations

Question: What will you check and analyze on Exadata X8M if the database hangs?

If an Exadata X8M database hangs, first determine whether it is a cluster-wide issue or an isolated database slowdown. Focus on cluster metrics, interconnect/network health (key to X8M's RoCE architecture), and storage cell bottlenecks.

Review the following components systematically:

1. Database & Compute Nodes

Clusterware & High Availability: Run crsctl check cluster to see if the cluster is healthy. In X8M, hardware-based RDMA immediately catches severe node freezes; verify that the node hasn't been evicted.
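Active Session History (ASH): Because a hung database may not accept normal connections, connect with sqlplus -prelim / as sysdba and use oradebug (oradebug setmypid followed by oradebug hanganalyze), or use Real-Time ADDM, to identify the blocking sessions and the predominant wait classes (e.g., Cluster, Concurrency, System I/O). If you can still query the database, an ASH report shows the same wait classes over time.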
Crucial Wait Events: Look for cell single block physical read (indicates storage tier bottlenecks) or log file sync (indicates log write issues).

2. Exadata X8M Network Fabric (RoCE)

The X8M replaces traditional interconnects with RDMA over Converged Ethernet (RoCE), making network latency the primary suspect

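Switch Health: Check the RoCE switch ports for errors, drops, or packet discards using the switch's native commands. On ExaCS/ExaDB-D the switches are part of the Oracle-managed infrastructure, so suspected fabric problems are raised with Oracle Support rather than inspected directly.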

Interconnect Waits: Check for spikes in gcs drm freeze or gc cr block busy waits, which point to node-to-node communication stalls

3. Exadata Storage Cells

If the database isn't fully locked but queries are hanging, the issue may stem from the storage tier

Smart PMEM Log: In X8M, redo logs are written via RDMA directly to Persistent Memory (PMEM). Check v$sysstat or v$log to ensure PMEM commits are completing properly.

Storage CPU Utilization: Log into the cell servers and, in cellcli, run list metriccurrent where objectType='CELL' and name='CL_CPUT' to ensure the cell server CPUs are not saturated.

Disk Queuing: Check for high I/O latencies with list metriccurrent where objectType='CELLDISK' and name like 'CD_IO_TM_.*_RQ' to see whether particular disks are causing long waits.

4. Diagnostics & Logs

Exachk: Run the Oracle Exachk health check tool to identify known hardware or software configuration bugs (e.g., node evictions not properly resetting).
ADRCI: Execute adrci and check the alert logs to capture specific incident numbers and trace files

Question: What will you check and analyze on Exadata X8M if the database hangs, and with which commands?

When an Oracle Exadata database hangs, troubleshooting requires looking past the database layer down into the Exadata-specific hardware (storage servers and the RoCE network fabric).

A robust troubleshooting workflow requires checking the following areas using specific tools and commands:

1. Database Tier Checks

If you can still connect to the database (even via sqlplus -prelim / as sysdba), investigate where sessions are spending their time.

Identify Critical Wait Events: Check for Exadata-specific wait events (e.g., cell single block physical read, cell smart table scan).

sql

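-- Non-idle waits per event across the sessions that are waiting right now
SELECT event,
       COUNT(*)             AS sessions,
       SUM(seconds_in_wait) AS total_seconds_in_wait
FROM   v$session_wait
WHERE  wait_class <> 'Idle'
GROUP  BY event
ORDER  BY total_seconds_in_wait DESC;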

Look for Contention: Check for enqueue or locking issues.

Real-Time ADDM: If the database is completely unresponsive, use Real-Time ADDM to diagnose the hang without logging in.

2. Storage Cell Tier Checks (Storage Servers, PMEM & Smart Logging)

Exadata Smart PMEM Cache: Exadata X8M uses Persistent Memory (PMEM) and RoCE (RDMA over Converged Ethernet) to bypass traditional OS I/O stacks. Check for waits specifically related to the RDMA path: cell single block physical read: pmem cache or cell single block physical read: xrmem cache.
Cell Server (CellSRV) Metrics: Use cellcli on the Exadata storage servers to check for slow disk response times or interface issues on the RoCE network fabric.

If the database waits indicate I/O or cell-related issues, log into a compute node and use dcli to run cellcli commands against the Exadata Storage Servers.

Check Storage Server Health:

bash

dcli -c cell01,cell02,cell03 "cellcli -e list alerthistory"

Verify Disk/Cell Health: Look for offline or critical disks or flash drives

bash
dcli -c cell01,cell02,cell03 "cellcli -e list griddisk attributes name,status"
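With many cells the griddisk listing gets long, so it helps to filter it. The sketch below parses text in the shape dcli normally prints (each line prefixed with the cell name and a colon) and returns only the grid disks that are not active. The sample output is made up for illustration, and the function name is an assumption.

```python
def inactive_griddisks(dcli_output: str) -> list[tuple[str, str, str]]:
    """Return (cell, griddisk, status) for every grid disk not reported as active."""
    problems = []
    for line in dcli_output.splitlines():
        cell, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 2:
            continue  # skip blank or unexpected lines
        name, status = fields[0], fields[1]
        if status.lower() != "active":
            problems.append((cell.strip(), name, status))
    return problems


# Illustrative sample only; real grid disk names and spacing will differ.
SAMPLE = """\
cell01: DATAC1_CD_00_cell01   active
cell01: DATAC1_CD_01_cell01   active
cell02: DATAC1_CD_00_cell02   inactive
"""

if __name__ == "__main__":
    for cell, disk, status in inactive_griddisks(SAMPLE):
        print(f"{cell}: {disk} is {status}")
```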


Examine Quarantines: Check whether Exadata has quarantined any faulty offload operations, which might be forcing the database into slow single-block reads.

bash

dcli -c cell01,cell02,cell03 "cellcli -e list quarantine"

3. Exadata X8M Persistent Memory (PMEM)

A key feature of the Exadata X8M is its RoCE network and its NVDIMM/PMEM (Persistent Memory) write accelerators. If these stall, commit processing slows to a crawl.

Verify PMEM State: Use the cellcli tool to ensure the NVDIMM hardware and PMEM controllers are operating normally.

bash

dcli -c cell01,cell02,cell03 "cellcli -e list pmemcache detail"

4. Fabric / Network (RoCE) Checks
Exadata X8M uses RDMA over Converged Ethernet (RoCE) for cluster communication and storage access. A network degradation here will look like a database hang

Check RoCE Switches: From the compute node, verify that there are no packet drops or latency spikes across the X8M network fabric.

5. Operating System / Hardware Layer

Log in to ILOM: If the storage or database server host OS completely fails to respond, query the server's Integrated Lights Out Manager (ILOM) for hardware-level faults or system freezes.

bash

show /SP/logs/event/list

Questions to narrow down the cause:

Which wait events dominate (e.g., cell single block physical read)?
Do the storage cell alert logs show errors around the time of the hang?
Is this a total database freeze or a severe performance slowdown?

Review Log and Trace Files

If the database is completely hung and you cannot run SQL, use Real-Time ADDM to analyze the hang from outside the database. Then, check:

Alert Log: Located in diag/rdbms/.../trace/alert_<SID>.log. Look for "LGWR is taking too long" warnings.
LGWR Trace Files: Check for I/O errors or timeouts in the storage layer.
Cell Server Logs: On the storage cells, use the cellcli tool to check LIST METRICCURRENT for flash or PMEM health alerts

If your Exadata X8M database is hanging, immediately run oradebug hanganalyze and check the alert log for critical events like ORA-00060 (deadlock), memory leaks, or storage-offline events.

For a methodical, Exadata-specific approach, analyze the following components in order to pinpoint the bottleneck:

1. The Alert Log & Trace Files

ORA- Errors: Look for errors such as ORA-04031 (shared pool exhaustion) and for "checkpoint not complete" messages, which appear in the alert log without an ORA- number.
Exadata Cell Alerts: Check for storage-related hardware errors, flash cache failures, or quorum disk drops.
Trace File Analyzer (TFA): Run tfactl diagcollect -last 1h to grab all relevant logs across the grid infrastructure.

2. Foreground & Background Wait Events

Review V$SESSION_WAIT and V$SYSTEM_EVENT to see where the system is blocked

Log File Sync / Log File Parallel Write: These indicate log writer (LGWR) stalls. In an Exadata X8M, which features RDMA and Smart PMEM Log, high wait times indicate a failure in the persistent memory tier, interconnect networking, or cluster lock contention.

Cell Single Block Physical Read: If latency is excessively high, it means reads are bypassing the flash cache and hitting the slower HDDs

3. Exadata-Specific Metrics (using cellcli)

Log onto your storage cells (via dcli or cellcli) to verify hardware and cache integrity

PMEM / XRMEM Cache: Verify the Smart PMEM cache status. Issues here can stall the database.

Flash Log Stalls: Verify that flashlog performance is healthy, as it directly impacts commit processing.

I/O Resource Management (IORM): Check if an IORM plan or category is causing a specific database/PDB to suffer from I/O starvation

4. Grid Infrastructure & Clusterware

Hung Cluster: Run crsctl check cluster -all to ensure the cluster nodes are communicating.
Interconnect: A hang is often triggered by network drops. Review the interconnect (private network) for packet drops or latency issues. Exadata X8M uses RoCE (RDMA over Converged Ethernet); check for switch port flaps.

5. CPU & Memory

Check operating system stats (top, vmstat) to see if you are facing CPU starvation (run queue length, the r column) or memory swapping (the si/so columns).
In the database, check for latch free or buffer busy waits (due to unoptimized SQL or heavy concurrency).
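As a rough illustration of the operating system check, the sketch below parses one Linux vmstat sample and flags a run queue longer than the CPU count or any swap activity (Linux vmstat reports swap-in and swap-out in the si and so columns). The sample text, the CPU count of 16, and the function names are assumptions for illustration.

```python
def parse_vmstat(text: str) -> dict[str, int]:
    """Map vmstat column names to the values on the last sample line."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    header = next(ln for ln in lines if ln and ln[0] == "r")
    return dict(zip(header, map(int, lines[-1])))


def pressure_flags(sample: dict[str, int], cpu_count: int) -> list[str]:
    """Return human-readable warnings for CPU and memory pressure."""
    flags = []
    if sample["r"] > cpu_count:
        flags.append("run queue exceeds CPU count")
    if sample["si"] > 0 or sample["so"] > 0:
        flags.append("swapping in progress")
    return flags


# Made-up sample in the layout printed by Linux vmstat.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
40  2 524288  81234  10240 204800  120  300   500  1200 9000 15000 85 10  3  2  0
"""

if __name__ == "__main__":
    print(pressure_flags(parse_vmstat(SAMPLE), cpu_count=16))
```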

Questions to narrow down the cause:

Which wait events are currently highest in V$SESSION?
Were any ORA- errors written to the alert log just before the freeze began?
Is this a single-instance database or a RAC (Real Application Clusters) environment?

1. Alert Log and Trace File Locations

Review the database and cluster diagnostic logs to trace the root cause:

Database Alert Log: Usually located in $ORACLE_BASE/diag/rdbms/{DB_NAME}/{SID}/trace/alert_{SID}.log.
Hang Manager Logs: Look for messages containing ORA-32701 or dia0 background process trace files in $ORACLE_BASE/diag/rdbms/{DB_NAME}/{SID}/incident/incdir_*.
Exadata Cell Alert Log: Verify Exadata storage server health by checking /opt/oracle/cell/log/diag/asm/cell/{cell_name}/trace/alert.log.

2. What to Check and Analyze in the Logs

Scan the alert logs for specific signatures during the time of the hang:

Log Write/Commit Bottlenecks: Look for checkpoint not complete or LGWR wait for redo copy messages, which could point to I/O stalls.
PMEM / RoCE Issues: The Exadata X8M relies on Smart PMEM (Persistent Memory) for fast commits and RDMA over Converged Ethernet. Check for errors related to PMEM hardware faults or network fabric stalls.
OOM (Out of Memory): Look for memory allocation failures or ORA-04030 / ORA-04031 errors

3. Deeper Diagnostic Actions

If the alert log points to a hang, use database-level diagnostics to extract specific data:

System State Dump: Execute oradebug dump systemstate 266 to get a precise snapshot of all processes and what they are waiting for.
Hang Analyzer: Run oradebug hanganalyze 3 to identify the blocking and waiting process chains.
AWR & ASH: If you can still log in, generate an AWR report or query the Active Session History (ASH) to review the top wait events

Question: How do you enable Oracle Data Guard on Exadata Cloud Infrastructure (ExaCS) and ExaDB-D?

To enable Oracle Data Guard on Exadata Cloud Infrastructure (ExaCS) and Exadata Database Service on Dedicated Infrastructure (ExaDB-D), you can use the Oracle Cloud Infrastructure (OCI) Console. The process involves selecting your primary database and adding a standby database to create a Data Guard association or group.

Steps to Enable Data Guard

Navigate to the Primary Database:
- Open the OCI navigation menu and go to Oracle Database, then select Exadata on Oracle Public Cloud (ExaDB-D) or Exadata Cloud@Customer (ExaDB-C@C).
- Select the Compartment and the VM Cluster containing the primary database.
- Click the name of the specific Database you want to protect.

Add a Standby Database:
- Under the Resources section on the left, click Data Guard Associations (or Data Guard Group for newer versions like 19c+).
- Click Add Standby or Enable Data Guard.

Configure the Standby Settings:
- Select Peer VM Cluster: Choose the target region, availability domain, and the destination VM Cluster where the standby will reside.
- Data Guard Type: Select either Data Guard (standard) or Active Data Guard (requires additional licensing for features like real-time query).
- Protection Mode: Choose Maximum Performance (asynchronous) or Maximum Availability (synchronous).
- Database Credentials: Enter the SYS password for the primary database to authorize the creation.

Finalize and Monitor:
- (Optional but recommended) Click Run Precheck to ensure the environment is ready before proceeding.
- Click Add Standby or Enable Data Guard to start the provisioning process.
- Monitor the progress via the Work Requests page. Once completed, the database role will reflect its new status (Primary or Standby).

Key Requirements & Best Practices

Infrastructure: For maximum fault isolation, configure the standby on a different Exadata Infrastructure than the primary.
Software Versions: Both the primary and standby VM Clusters must have identical DBaaS Tools and Agent versions.
Network: Ensure proper security rules are in place to allow network communication between the primary and standby client subnets
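Two of the requirements above are easy to verify before opening the console. The sketch below is a hypothetical pre-flight check: the ClusterInfo class, its field names, and the sample values are all assumptions for illustration, and you would fill them in from your own inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterInfo:
    name: str
    infrastructure: str  # Exadata Infrastructure the VM cluster runs on
    tools_version: str   # DBaaS tools version string reported on the cluster


def data_guard_warnings(primary: ClusterInfo, standby: ClusterInfo) -> list[str]:
    """Return warnings for the requirements listed in this article."""
    warnings = []
    if primary.tools_version != standby.tools_version:
        warnings.append("DBaaS tools versions differ between primary and standby")
    if primary.infrastructure == standby.infrastructure:
        warnings.append("standby shares the primary's Exadata Infrastructure")
    return warnings


if __name__ == "__main__":
    # Made-up names and version strings.
    primary = ClusterInfo("vmc-prod", "exa-infra-1", "24.1.1")
    standby = ClusterInfo("vmc-dr", "exa-infra-1", "23.4.1")
    for warning in data_guard_warnings(primary, standby):
        print("WARNING:", warning)
```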

For more details

https://docs.oracle.com/iaas/exadatacloud/exacs/using-data-guard-with-exacc.html

Set up Oracle Data Guard Configuration between Databases on Oracle Exadata Database Service on Dedicated Infrastructure and Oracle Exadata Database Service on Exascale Infrastructure

Question: What are the wait events gc cr block 2-way, gc current block 2-way, and gc cr block busy?

In Oracle RAC databases, the gc cr block 2-way event signifies that a Consistent Read (CR) block requested by one instance was transferred directly from another instance over the cluster interconnect. Exactly two nodes are involved (the requestor and the holder), and the transfer costs one round trip: the request goes out and the block comes back.

While 2-way transfers are the most efficient form of Cache Fusion, high waits for this event point to heavy inter-instance block contention.

To effectively troubleshoot and reduce these waits:

Identify the Object: Run an AWR (Automatic Workload Repository) report and check the "Segments by CR Blocks Received" section. Pinpointing the exact table or index causing the block transfers is step one.
Application Partitioning: Segregate workloads so that sessions modifying data (DML) run on the same instance that queries (SELECT) that same data, localizing block access and eliminating cross-node chatter.
Re-evaluate Index Usage: Frequent full-table scans or heavy index maintenance can trigger high block transfers. Optimize queries to use more localized or partitioned data access paths.
Optimize Interconnect: Ensure your cluster interconnect network is fast, reliable, and not acting as a bottleneck.

The Oracle RAC wait event gc current block 2-way occurs when a session requests a data block in "Current" (DML/Exclusive) mode and the block is shipped directly from a remote instance in one round trip of two messages (Requesting Instance → Holding Instance → Requesting Instance).

Meaning & Context

Current Mode: Indicates a request for the current block data (typically for DML like UPDATE, INSERT, DELETE, or SELECT FOR UPDATE) rather than a Consistent Read (CR) snapshot.
2-Way Transfer: The block is found in the cache of exactly one other remote instance and is sent directly over the interconnect. No third master instance is required for the transfer.
Normal Operation: In an Oracle RAC environment, block transfers are standard. This wait event alone does not indicate a problem unless the average wait time or the total number of waits is high enough to degrade performance.

How to Diagnose and Tune

If this wait event is causing performance bottlenecks, it usually points to data/index contention across your cluster nodes. You can address it using the following steps:

Identify the Hot Objects: Use V$ACTIVE_SESSION_HISTORY or the Oracle AWR Report to find the specific segments (tables/indexes) associated with the waits.

Reduce Index Contention: Heavy INSERT activity on sequence-generated keys concentrates new entries in the right-most leaf block of the index, which becomes a "hot" block requested by every instance. Consider using partitioned indexes or a larger sequence cache (e.g., CACHE 1000 NOORDER).

Tune Block Density: Increase PCTFREE on tables with high concurrency to reduce the number of rows per block. This helps minimize multiple instances hitting the exact same physical block simultaneously.

Review Cluster Interconnect: If the wait times are high, check the network infrastructure. Ensure your private interconnect is on a dedicated, high-bandwidth (10GbE or higher) network and verify that no network packets are being dropped.
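Whether these waits are actually high is easier to judge from the average latency per wait than from raw totals. The following is a minimal Python sketch, assuming two samples of the TOTAL_WAITS and TIME_WAITED_MICRO columns of V$SYSTEM_EVENT have already been fetched for the event. The function name and the sample numbers are hypothetical, and no threshold is implied.

```python
def avg_wait_ms(first, second):
    """Average latency in milliseconds for waits completed between two samples.

    Each sample is a (total_waits, time_waited_micro) tuple, as read from
    V$SYSTEM_EVENT for one event such as 'gc current block 2-way'.
    """
    waits = second[0] - first[0]
    micros = second[1] - first[1]
    if waits <= 0:
        return 0.0  # no new waits in the interval (or the counters were reset)
    return micros / waits / 1000.0


# Hypothetical samples taken a few minutes apart.
before = (1_000_000, 450_000_000)
after = (1_200_000, 570_000_000)
print(f"{avg_wait_ms(before, after):.2f} ms per wait")
```

Comparing this figure across intervals shows whether latency rises with load, which would point toward the interconnect rather than toward a single hot object.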

gc cr block busy :

The gc cr block busy event is an Oracle RAC (Real Application Clusters) wait event indicating that a session requested a consistent read (CR) block, but the block transfer between instances was delayed. It usually means there is high contention for a "hot block" across nodes.

Why Does It Happen?

Remote Pinning: The remote instance holding the block is actively modifying it (e.g., locking, updating) or has not yet finished writing its redo logs for that transaction.
Log Flush Delays: The transfer is held up because the holding instance cannot write to the online redo logs quickly enough.
Contention: Multiple instances are requesting the same block simultaneously.

How to Diagnose & Fix

Identify the Hot Block: Use the V$SESSION_WAIT view to find the file and block number causing the waits (using parameters P1 and P2).
Find the Object: Map the file/block to a specific database table or index using DBA_EXTENTS.
Tune the Application:
- Optimize SQL queries to reduce large full table or index scans that span across nodes.
- If a single block holds too many small rows (e.g., sequence generators), consider increasing the cache size or partitioning the table.
Check I/O Performance: Review your redo log write times. If you see high log file sync waits alongside this event, the disk group holding your redo logs may be experiencing I/O bottlenecks.
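The file/block to object mapping described above can be sketched as follows. This is an illustrative Python fragment, assuming P1 and P2 from V$SESSION_WAIT carry the file number and block number for this event. The helper name and sample values are hypothetical, and the SQL text would be executed with whatever Oracle client you already use.

```python
# Query text with bind variables. DBA_EXTENTS lists one row per extent, so the
# block belongs to the extent whose range [block_id, block_id + blocks - 1] covers it.
EXTENT_LOOKUP_SQL = """
SELECT owner, segment_name, segment_type
  FROM dba_extents
 WHERE file_id = :file_id
   AND :block_id BETWEEN block_id AND block_id + blocks - 1
"""


def extent_lookup_binds(p1, p2):
    """Turn P1 (file#) and P2 (block#) from V$SESSION_WAIT into bind values."""
    return {"file_id": int(p1), "block_id": int(p2)}


binds = extent_lookup_binds(12, 34567)  # hypothetical P1/P2 values
print(EXTENT_LOOKUP_SQL.strip())
print(binds)
```

Queries against DBA_EXTENTS can be slow on large databases, so it is usually worth running the lookup once per hot block rather than in a loop.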

Command used

SRDC - Exadata Generic Required Diagnostic Data Collection for RMAN Duplicate (Doc ID 2658991.1)

=================

Please Upload a text file with the output for the following as root user.

Replace <dbname> with the database name having issue:

curl -v -X HEAD -u '<username>':'<passwd>' bkup_oss_url

curl -v -u '<username>':'<passwd>' -s http://169.254.169.254/opc/v1/instance/ | egrep -v "user_data|ssh_authorized_keys|timeCreated"

rpm -qa | egrep -i 'dbaastools|dbaastools_exa|dcs' | cut -d- -f1

rpm -qa | grep dbaas

rpm -qa --last |egrep 'dbcs|dbaas|dtrs|dcs'

dbaascli patch tools list

dbaascli admin showLatestStackVersion

/var/opt/oracle/misc/platforminfo

/usr/local/bin/imageinfo

Upload /var/opt/oracle/creg/<dbname>.ini

hostname -f

dbaascli cswLib listLocal

dbaascli tde status --dbname <dbname>

dbaascli database verify_wallet --dbname <dbname>

dbaascli database getDetails --dbname <dbname>

dbaascli database backup --dbname <dbname> --getSchedules

dbaascli database backup --getConfig --dbName <dbname> --configFile /tmp/<dbname>_cfg.txt

dbaascli database backup --dbname <dbname> --list

dbaascli database backup --dbName <dbname> --showHistory --all

dbaascli database backup --dbname <dbname> --status --uuid <uuid from above for the failed job>

dbaascli database backup --getLatestBackupJob --dbname <dbname>

Collect logs specific to database and covering issue time as below

dbaascli diag collect --startTime <Format: YYYY-MM-DDTHH24:MM:SS> --endTime <Format: YYYY-MM-DDTHH24:MM:SS> --dbNames <dbname>

The command needs to be executed as the "root" user. Identify the timestamp of the failure and collect 2 hours before and 2 hours after covering the issue timeframe.
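The collection window described above can be computed rather than worked out by hand. The following is a minimal Python sketch, assuming the failure timestamp is already known. The function name is hypothetical, and the output format follows the YYYY-MM-DDTHH24:MM:SS pattern shown in the command.

```python
from datetime import datetime, timedelta


def diag_window(failure_time, hours=2):
    """Return (start, end) strings covering `hours` before and after the failure."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    start = failure_time - timedelta(hours=hours)
    end = failure_time + timedelta(hours=hours)
    return start.strftime(fmt), end.strftime(fmt)


# Example failure time; replace with the timestamp from your own logs.
start, end = diag_window(datetime(2025, 4, 10, 2, 16, 4))
print(f"dbaascli diag collect --startTime {start} --endTime {end} --dbNames <dbname>")
```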

References :

SRDC - Exadata Cloud Mandatory Data Collection for Backup Cloud Services (Backup, Restore, Recovery) Issues (Doc ID 2886934.1)

Please zip & upload the logs

1) /var/opt/oracle/log/<dbname>

2) /opt/oracle/dcs/log/

3) /var/opt/oracle/log/dtrs/

===============================================================

TFA report collection from nodes that experienced the issue.

./tfactl diagcollect -from "Feb/05/2025 02:00:00" -to "Feb/05/2025 07:00:00"

sosreport

as root

#sosreport

Exawatcher report

To collect from/to a certain date and time:

example:

# ./GetExaWatcherResults.sh --from 01/25/2025_13:00:00 --to 01/25/2025_14:00:00

exacli cloud_user_syd1990clu03173@100.107.0.9> list ALERTHISTORY

See exacli_cell_ALERTHISTORY.txt for full details

=====================================================================

What is the status of the database right now? Is it accessible to you?

Provide below

srvctl config database -d <db_unique_name>

srvctl status database -d <db_unique_name>

Please do below on the standby database

srvctl stop database -d FMWROOT_iad1dg

srvctl start database -d FMWROOT_iad1dg

Then retry the precheck

Does patching still fail with the same error as below, even after stopping and starting via srvctl?

DCS-10061:Database FMWROOT is not running. Database is not running on node : ocivpsysofmw151

Also provide below

crsctl stat res -t

Looks like patch is applied on one of the nodes already

Please provide below from both nodes

opatch lsinv -detail

If node 1 and node 2 are already patched, why is the installed Grid version showing 19.23 instead of 19.25?

[root@ocivpsysofmw151 ~]# dbcli describe-component

System Version
---------------
25.1.1.0.0

Component                                Installed Version    Available Version
---------------------------------------- -------------------- --------------------
GI                                       19.23.0.0.0          19.26.0.0
DB                                       19.23.0.0.0          19.26.0.0

[root@ocivpsysofmw152 ~]# dbcli describe-component

System Version
---------------
25.1.1.0.0

Component                                Installed Version    Available Version
---------------------------------------- -------------------- --------------------
GI                                       19.23.0.0.0          19.26.0.0
DB                                       19.23.0.0.0          19.26.0.0
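When this output has to be compared across several nodes, a small parser can flag the components that still lag behind. The following is a minimal Python sketch, assuming the three-column layout shown in the describe-component output. The function names are hypothetical, and rows whose version columns are not numeric are skipped.

```python
def version_tuple(text, width=5):
    """Convert '19.26.0.0' to (19, 26, 0, 0, 0) so short and long forms compare equally."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def pending_updates(output):
    """Return {component: (installed, available)} where available is newer."""
    pending = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # title lines, blank lines, the system version
        name, installed, available = fields
        if not (installed[0].isdigit() and available[0].isdigit()):
            continue  # separator row or a non-numeric status value
        if version_tuple(available) > version_tuple(installed):
            pending[name] = (installed, available)
    return pending


sample = """
Component Installed Version Available Version
---------------------------------------- -------------------- --------------------
GI 19.23.0.0.0 19.26.0.0
DB 19.23.0.0.0 19.26.0.0
"""
print(pending_updates(sample))
```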

Please get me the output of below from both nodes

sudo su - grid

$ORACLE_HOME/OPatch/opatch lspatches

I can see you executed :

/opt/oracle/dcs/bin/dbcli update-server -p -v 19.25.0.0 -l

Please run the commands below on the second node, as the previous command was missing one 0 in the version string

sudo su -

dbcli update-server -p -v 19.25.0.0.0 -l

dbcli describe-job -i <Prechecks_job_ID>

If successful, run

dbcli update-server -v 19.26.0.0.0 -l

dbcli describe-job -i <job_ID>
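A truncated version string like the one above is easy to catch before the command is submitted. The following is a minimal Python sketch, assuming (as the note above implies) that the version passed with -v should have five dot-separated numeric components. The function name is hypothetical.

```python
import re

# Five numeric components separated by dots, e.g. 19.25.0.0.0
FIVE_PART = re.compile(r"\d+(\.\d+){4}")


def is_full_version(version):
    """True when the string has exactly five dot-separated numeric components."""
    return FIVE_PART.fullmatch(version) is not None


for candidate in ("19.25.0.0", "19.25.0.0.0"):
    print(candidate, "ok" if is_full_version(candidate) else "incomplete")
```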

Is that job from when you ran the command below?

dbcli update-server -v 19.25.0.0.0 -l

If yes, please provide

Job log

/opt/oracle/dcs/log/jobs/<JOBID>.log

Database alert log

/u01/app/oracle/diag/rdbms/$ORACLE_UNQNAME/$ORACLE_SID/trace/alert_<sid>.log

==================Exadata =======================

ACTION PLAN

------------------------

Please check if CRS is up and running on all nodes. Execute below commands in all nodes and update us.

# <GI_HOME>/bin/crsctl check crs

# <GI_HOME>/bin/crsctl check cluster -all

# <GI_HOME>/bin/crsctl stat res -t

# <GI_HOME>/bin/crsctl query css votedisk

Please execute below command in all nodes and upload the file "crsctl_stat_<Host Name>.out"

# <GI_HOME>/bin/crsctl stat res -t > crsctl_stat_<Host Name>.out

Please run tfactl with the "-all" argument to gather a diagnostic collection from the database nodes covering the time when the issue occurred.

Autonomous Health Framework (AHF) - Including TFA and ORAchk/EXAchk (Doc ID 2550798.1)

Use TFA Collector - Tool for Enhanced Diagnostic Gathering (Doc ID 1513912.1)

Under GRID HOME/tfa/bin

run as root

# /opt/oracle.ahf/tfa/bin/tfactl diagcollect -all -noclassify -node local -from "2024-07-30 16:00:00" -to "2024-07-30 18:30:00"

***Please adjust the -from and -to times to cover 4 hours before and 4 hours after your problem window. The -from time must be earlier than the -to time.***
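Because a reversed time range is an easy mistake to make when editing these commands, the window can be checked before tfactl is run. The following is a minimal Python sketch, assuming the two timestamp styles that appear in this post ("2024-07-30 16:00:00" and "Feb/05/2025 02:00:00") and an English locale for month abbreviations. The function names are hypothetical.

```python
from datetime import datetime

# The two timestamp styles used in the tfactl examples in this post.
FORMATS = ("%Y-%m-%d %H:%M:%S", "%b/%d/%Y %H:%M:%S")


def parse_ts(text):
    """Parse a timestamp in either supported style, or raise ValueError."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {text!r}")


def window_is_valid(start, end):
    """True when the -from value is strictly earlier than the -to value."""
    return parse_ts(start) < parse_ts(end)


print(window_is_valid("Feb/05/2025 02:00:00", "Feb/05/2025 07:00:00"))
print(window_is_valid("2024-07-30 16:00:00", "2024-07-29 18:30:00"))
```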

Action Plan

===========

1. Please try to cleanup socket files and try to start the CRS

---

A. Stop the CRS and all related resources on problem node:

# crsctl stop crs -f

# ps -ef | grep d.bin

> If any "d.bin" processes remain running from the GRID_HOME, kill them:

# kill -9 <d.bin_pid>

B. Remove the file in the "/etc/oracle/maps" location:

# rm -rf /etc/oracle/maps/*

C. Remove the socket files:

# rm -rf /var/tmp/.oracle/*

D. Remove the "gipc" files in the location "/u01/app/grid/crsdata/*/output/"

# rm -rf /u01/app/grid/crsdata/node1/output/*

E. Start CRS

# crsctl start crs

---

2. Then upload TFA from both the nodes covering the time of CRS startup.

To Collect TFA

==============

# /opt/oracle.ahf/tfa/bin/tfactl diagcollect -all -noclassify -node local -from "2024-07-30 16:00:00" -to "2024-07-30 18:30:00"

***Please adjust the -from and -to times to cover 4 hours before and 4 hours after your problem window. The -from time must be earlier than the -to time.***

Hi,

Change the file ownership as below.

# chown grid:dbmusers /etc/oracle/cell/network-config/cellinit.ora

Retry CRS startup. Let me know the outcome.

Please provide me output of below command in text file for review.

$ cluvfy comp software -n all -verbose

Execute this command as grid user.

Thanks,

Kindly follow the document below and restore the permissions from a good node.

Script to capture and restore file permission in a directory (for eg. ORACLE_HOME) (Doc ID 1515018.1)

Kindly follow the document below to reset the file permissions:

How to check and fix file permissions on Grid Infrastructure environment (Doc ID 1931142.1)

1. Stop CRS on the problem node

crsctl stop crs -f

2. Reset the permissions of all files and directories under Oracle <GRID_HOME>.

For 12c and above:

For clustered Grid Infrastructure, as root user

# cd <GRID_HOME>/crs/install/

# ./rootcrs.sh -init

3. Start CRS

crsctl start crs -wait

Please share the results. If it doesn't work, we will need to restore the permissions from a good node.

Kindly provide output of below from both the nodes.

ls -lrt /u01/app/19.0.0.0/grid/lib/libserver19.a

Can you please remove all the socket files and reboot the problematic node?

Kindly set the permissions with chmod 755 /u01/app/19.0.0.0/grid/lib/libserver19.a and share the complete output of the commands below:

# crsctl stop crs -f

# crsctl start crs -wait

=============================================

1. When did the problem start?

The problem occurred only once, on April 10th. After a restart the process ran fine, and it has run every day since the incident without any errors.

2. Did this work before?

This job worked before and, as mentioned, has been working fine since the problem. The error occurred in only one execution, but unless we know the cause, we can't prevent the same error in the future.

3. How often is it reoccurring?

The job has been executed daily during the last month, and only the April 10th run failed.

4. Have you performed any recent activity?

No activity performed.

Please share below details

Question: Please select your Oracle Database version.

19.19.0

Question: What is your Tenancy OCID?

Question: What is your Database System OCID?

The database is in an Exadata VM cluster with OCID:

ocid1.cloudvmcluster.oc1.eu-frankfurt-1.antheljrkbqa6viaypdjrfcozvb6zrkdvxdes7gi433zdqcw7lemkut3ypzq

Question: What is the region where the Database System was created?

Availability domain: jyBc:EU-FRANKFURT-1-AD-1

Question: Please provide a summary of the issue faced.

As described in this SR, we have a daily job that executes a database duplicate, and the April 10th run failed with these errors:

RMAN-00571: ===========================================================

RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============

RMAN-00571: ===========================================================

RMAN-03002: failure of Duplicate Db command at 04/10/2025 02:16:04

RMAN-05501: aborting duplication of target database

RMAN-06136: Oracle error from auxiliary database: ORA-06550: line 1, column 42:

PLS-00553: character set name is not recognized

ORA-06550: line 0, column 0:

PL/SQL: Compilation unit analysis terminated

After re-executing the process, it finished OK.

We need to know the cause of the error to try to prevent it from happening again.

You can see the attached log of the database duplicate and the alert log of the target database.

Duplicate database is executed with this command:

dbaascli database delete --dbname CDELSM2P > $LOG_SQL/delete_duplicate_$FECHA.log

dbaascli database duplicate --dbName CDELSM2P --dbUniqueName CDELSM2P_x73_fra --sourceDBConnectionString oc-pro-exa-04-clu-02-hdfdi-scan.prodb03.ocpro.oraclevcn.com:1521/s_delta_smile_clone_pro.prodb03.ocpro.oraclevcn.com --oracleHome /u02/app/oracle/product/19.0.0.0/dbhome_1 --sourceDBTDEWalletLocation /home/oracle/scripts/CDELSM2P/duplicate/ewallet.p12 --sourceDBTdeConfigMethod FILE --tdeConfigMethod FILE --rmanParallelism 64 --rmanSectionSizeInGB 64 --waitForCompletion false < /home/oracle/scripts/CDELSM2P/duplicate/pwd_duplicate > $REPLICA_DIR/duplicate.lck

1) Can you please confirm whether the environment/database is OCI, OCI Classic, Exadata, Autonomous, or On-Prem?

It is Exadata.

2) If it is an OCI Base Database, can you please share the output of the commands below:

ssh to DB node

sudo su -

hostname

[root@oc-pro-exa-03-rep-03-ugwlh1 ~]# hostname

oc-pro-exa-03-rep-03-ugwlh1

date

[root@oc-pro-exa-03-rep-03-ugwlh1 ~]# date

Mon Apr 14 18:44:23 CEST 2025

uptime

[root@oc-pro-exa-03-rep-03-ugwlh1 ~]# uptime

18:44:35 up 95 days, 4:07, 5 users, load average: 1.68, 1.95, 2.06

last|grep reboot

[root@oc-pro-exa-03-rep-03-ugwlh1 ~]# last|grep reboot

df -h

[root@oc-pro-exa-03-rep-03-ugwlh1 ~]# df -h

Filesystem Size Used Avail Use% Mounted on

devtmpfs 63G 0 63G 0% /dev

tmpfs 126G 2.3G 124G 2% /dev/shm

tmpfs 63G 9.9M 63G 1% /run

tmpfs 63G 0 63G 0% /sys/fs/cgroup

/dev/mapper/VGExaDb-LVDbSys1 15G 8.6G 6.4G 58% /

/dev/mapper/VGExaDb-LVDbKdump 20G 175M 20G 1% /crashfiles

/dev/mapper/VGExaDbDisk.u01.20.img-LVDBDisk 20G 4.8G 16G 24% /u01

/dev/mapper/VGExaDbDisk.grid19.0.0.0.241015.img-LVDBDisk 50G 12G 39G 24% /u01/app/19.0.0.0/grid

/dev/mapper/VGExaDb-LVDbVar1 20G 2.9G 18G 15% /var

/dev/mapper/VGExaDb-LVDbTmp 3.0G 67M 2.9G 3% /tmp

/dev/sda1 412M 118M 295M 29% /boot

/dev/mapper/VGExaDb-LVDbVarLog 18G 1.3G 17G 7% /var/log

/dev/mapper/VGExaDb-LVDbVarLogAudit 3.0G 173M 2.8G 6% /var/log/audit

/dev/mapper/VGExaDbDisk.u02_extra.img-LVDBDisk 124G 69G 49G 59% /u02

/dev/mapper/VGExaDb-LVDbHome 4.0G 84M 4.0G 3% /home

oc-pro-mt-com-01.proappfss.ocpro.oraclevcn.com:/oc-pro-oem-exa-03-rep-03-ugwlh1-fss 8.0E 2.8G 8.0E 1% /u01/app/oracle/product/13.1.0

/dev/asm/acfsvol01-39 720G 25G 696G 4% /acfs01

tmpfs 13G 0 13G 0% /run/user/2000

oc-pro-mt-com-01.proappfss.ocpro.oraclevcn.com:/oc-pro-fss-migraciones-01 8.0E 6.4T 8.0E 1% /oc-pro-fss-migraciones-01

tmpfs 13G 0 13G 0% /run/user/1000

tmpfs 13G 0 13G 0% /run/user/1001

free -h

[root@oc-pro-exa-03-rep-03-ugwlh1 ~]# free -h

total used free shared buff/cache available

Mem: 125Gi 86Gi 24Gi 2.1Gi 14Gi 30Gi

Swap: 15Gi 2.4Gi 13Gi

hostnamectl

[root@oc-pro-exa-03-rep-03-ugwlh1 ~]# hostnamectl

Static hostname: oc-pro-exa-03-rep-03-ugwlh1

Icon name: computer-vm

Chassis: vm

Machine ID: 61913792fabd4df9a340bd5be6dad5cb

Boot ID: 83e3117624444056bb8fa545d5886819

Virtualization: kvm

Operating System: Oracle Linux Server 8.10

CPE OS Name: cpe:/o:oracle:linux:8:10:server

Kernel: Linux 5.4.17-2136.330.7.5.el8uek.x86_64

Architecture: x86-64

ps -fel | egrep "smon|tns" | sort -k 15

[root@oc-pro-exa-03-rep-03-ugwlh1 ~]# ps -fel | egrep "smon|tns" | sort -k 15

4 S root 10431 1 4 30 - - 433249 hrtime Mar24 ? 21:07:47 /u01/app/19.0.0.0/grid/bin/osysmond.bin

0 S grid 49886 1 0 80 0 - 67943 ep_pol Jan09 ? 00:36:42 /u01/app/19.0.0.0/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit

0 S grid 50261 1 0 80 0 - 68744 ep_pol Jan09 ? 00:40:02 /u01/app/19.0.0.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit

0 S grid 70119 1 0 80 0 - 68560 ep_pol Jan09 ? 00:11:10 /u01/app/19.0.0.0/grid/bin/tnslsnr LISTENER_SCAN2 -no_crs_notify -inherit

0 S grid 70113 1 0 80 0 - 68533 ep_pol Jan09 ? 00:11:09 /u01/app/19.0.0.0/grid/bin/tnslsnr LISTENER_SCAN3 -no_crs_notify -inherit

1 I root 37 2 0 60 -20 - 0 rescue Jan09 ? 00:00:00 [netns]

0 S grid 43896 1 0 80 0 - 903727 do_sem Jan09 ? 00:02:26 asm_smon_+ASM1

0 S root 67545 48036 0 80 0 - 2321 pipe_w 18:46 pts/3 00:00:00 grep -E --color=auto smon|tns

0 S oracle 48508 1 0 80 0 - 1841535 do_sem 06:35 ? 00:00:03 ora_smon_CDELSM2P1

0 S oracle 134699 1 0 80 0 - 1834853 do_sem Jan13 ? 00:04:20 ora_smon_CDELTA3P1

0 S oracle 128789 1 0 80 0 - 1899898 do_sem Jan13 ? 00:04:21 ora_smon_CDELTA4P1

cat /etc/oracle-release

[root@oc-pro-exa-03-rep-03-ugwlh1 ~]# cat /etc/oracle-release

Oracle Linux Server release 8.10

uname -a

[root@oc-pro-exa-03-rep-03-ugwlh1 ~]# uname -a

Linux oc-pro-exa-03-rep-03-ugwlh1 5.4.17-2136.330.7.5.el8uek.x86_64 #3 SMP Mon May 27 12:51:19 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux

cd /opt/oracle/dcs/bin

./dbcli list-dbhomes

N/A exadata

/opt/oracle/dcs/bin/dbcli describe-component

N/A exadata

./dbcli list-databases -j

N/A exadata

/opt/oracle/dcs/bin/dbcli list-pdbs -i $(/opt/oracle/dcs/bin/dbcli list-databases|awk 'NR==4 {print $1}')

N/A exadata

dbcli list-pdbs -i 512eb207-b4ff-4145-83e6-0212e08d8f3e

N/A exadata

dbcli describe-pdb -i 512eb207-b4ff-4145-83e6-0212e08d8f3e -n <PDB_NAME>

N/A exadata

./dbcli describe-database -in <db_name>

N/A exadata

/opt/oracle/dcs/bin/dbcli list-jobs -f `date --date='-3 day' '+%Y-%m-%d'`

N/A exadata

dbcli list-jobs|grep -i <dbname>

N/A exadata

Note the last job ID listed with a status other than Success.

With the job ID you noted above, check the details of that job:

/opt/oracle/dcs/bin/dbcli list-jobs | grep 'Failure'

dbcli describe-job -i <id of failed job>

dbcli describe-job -i <job_ID> -j

N/A exadata

# /opt/oracle/dcs/bin/dbcli describe-job -i <failed_job_id> -l Verbose

Share log file ==> /opt/oracle/dcs/log/jobs/<failed_job_id>.log

/opt/oracle/dcs/log/dcs-agent.log

/opt/oracle/dcs/log/dcs-agent-debug.0.0.log

export DEVMODE=true

dbcli list-dbrs

dbcli list-vmshapes

sudo su - grid

crsctl check crs

[grid@oc-pro-exa-03-rep-03-ugwlh1 ~]$ crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

crsctl stat res -t

[grid@oc-pro-exa-03-rep-03-ugwlh1 ~]$ crsctl stat res -t

--------------------------------------------------------------------------------

Name Target State Server State details

--------------------------------------------------------------------------------

Local Resources

--------------------------------------------------------------------------------

ora.DATAC2.ACFSVOL01.advm

ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.LISTENER.lsnr

ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.chad

ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.datac2.acfsvol01.acfs

ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlmounted on /acfs01,S

h1 TABLE

ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlmounted on /acfs01,S

h2 TABLE

ora.net1.network

ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.ons

ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.proxy_advm

ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

2 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.DATAC2.dg(ora.asmgroup)

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

2 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.LISTENER_SCAN1.lsnr

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.LISTENER_SCAN2.lsnr

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.LISTENER_SCAN3.lsnr

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.RECOC2.dg(ora.asmgroup)

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

2 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.asm(ora.asmgroup)

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlStarted,STABLE

2 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlStarted,STABLE

ora.asmnet1.asmnetwork(ora.asmgroup)

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

2 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.ccosmm2p_xxx_fra.db

1 OFFLINE OFFLINE STABLE

2 OFFLINE OFFLINE STABLE

ora.cdelsm2p_x73_fra.cdelsm2p_pdelsmip.paas.oracle.com.svc

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

2 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.cdelsm2p_x73_fra.db

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlOpen,HOME=/u02/app/o

h1 racle/product/19.0.0

.0/dbhome_1,STABLE

2 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlOpen,HOME=/u02/app/o

h2 racle/product/19.0.0

.0/dbhome_1,STABLE

ora.cdelsm2p_x73_fra.s_delta_smile_batch_des.svc

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

2 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.cdelsm2p_x73_fra.s_delta_smile_online_des.svc

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

2 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.cdelta3p_5fc_fra.cdelta3p_pdelta3p.paas.oracle.com.svc

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

2 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.cdelta3p_5fc_fra.db

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlOpen,HOME=/u02/app/o

h1 racle/product/19.0.0

.0/dbhome_2,STABLE

2 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlOpen,HOME=/u02/app/o

h2 racle/product/19.0.0

.0/dbhome_2,STABLE

ora.cdelta3p_5fc_fra.s_delta_bi_pro.svc

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

2 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.cdelta4p_86p_fra.cdelta4p_pdelta4p.paas.oracle.com.svc

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

2 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.cdelta4p_86p_fra.db

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlOpen,HOME=/u02/app/o

h1 racle/product/19.0.0

.0/dbhome_2,STABLE

2 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlOpen,HOME=/u02/app/o

h2 racle/product/19.0.0

.0/dbhome_2,STABLE

ora.cdelta4p_86p_fra.s_delta_batch_des.svc

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

2 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.cdelta4p_86p_fra.s_delta_online_des.svc

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

2 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.cvu

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.oc-pro-exa-03-rep-03-ugwlh1.vip

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.oc-pro-exa-03-rep-03-ugwlh2.vip

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.qosmserver

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.scan1.vip

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.scan2.vip

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

ora.scan3.vip

1 ONLINE ONLINE oc-pro-exa-03-rep-03-ugwlSTABLE

--------------------------------------------------------------------------------

sudo su - oracle

ps -ef|grep pmon

[oracle@oc-pro-exa-03-rep-03-ugwlh1 ~]$ ps -ef|grep pmon

grid 43585 1 0 Jan09 ? 00:07:38 asm_pmon_+ASM1

oracle 48281 1 0 06:35 ? 00:00:02 ora_pmon_CDELSM2P1

grid 50695 1 0 Jan09 ? 00:07:59 apx_pmon_+APX1

oracle 97283 95270 0 18:49 pts/4 00:00:00 grep --color=auto pmon

oracle 128610 1 0 Jan13 ? 00:09:41 ora_pmon_CDELTA4P1

oracle 132336 1 0 Jan13 ? 00:09:23 ora_pmon_CDELTA3P1

srvctl status database -d $ORACLE_UNQNAME

[oracle@oc-pro-exa-03-rep-03-ugwlh1 ~]$ srvctl status database -d $ORACLE_UNQNAME

Instance CDELSM2P1 is running on node oc-pro-exa-03-rep-03-ugwlh1

Instance CDELSM2P2 is running on node oc-pro-exa-03-rep-03-ugwlh2

srvctl config database -d $ORACLE_UNQNAME

[oracle@oc-pro-exa-03-rep-03-ugwlh1 ~]$ srvctl config database -d $ORACLE_UNQNAME

Database unique name: CDELSM2P_x73_fra

Database name: CDELSM2P

Oracle home: /u02/app/oracle/product/19.0.0.0/dbhome_1

Oracle user: oracle

Spfile: +DATAC2/CDELSM2P_X73_FRA/PARAMETERFILE/spfile.336.1198390307

Password file: +DATAC2/CDELSM2P_X73_FRA/PASSWORD/pwdcdelsm2p_x73_fra.320.1198375667

Domain: prodb.ocpro.oraclevcn.com

Start options: open

Stop options: immediate

Database role: PRIMARY

Management policy: AUTOMATIC

Server pools:

Disk Groups: DATAC2,RECOC2

Mount point paths:

Services: CDELSM2P_PDELSMIP.paas.oracle.com,s_delta_smile_batch_des,s_delta_smile_online_des

Type: RAC

Start concurrency:

Stop concurrency:

OSDBA group: dba

OSOPER group: racoper

Database instances: CDELSM2P1,CDELSM2P2

Configured nodes: oc-pro-exa-03-rep-03-ugwlh1,oc-pro-exa-03-rep-03-ugwlh2

CSS critical: no

CPU count: 0

Memory target: 0

Maximum memory: 0

Default network number for database services:

Database is administrator managed

sqlplus / as sysdba

SET LINESIZE 200

alter session set NLS_DATE_FORMAT = 'DD-MON-YY HH24:MI:SS';

SELECT NAME,OPEN_MODE,PROTECTION_MODE,PROTECTION_LEVEL,DATABASE_ROLE,DB_UNIQUE_NAME,PRIMARY_DB_UNIQUE_NAME,CON_ID FROM V$DATABASE;

select instance_name,version,startup_time from v$instance;

select banner_full from v$version;

show pdbs;

SQL>

NAME OPEN_MODE PROTECTION_MODE PROTECTION_LEVEL DATABASE_ROLE DB_UNIQUE_NAME PRIMARY_DB_UNIQUE_NAME CON_ID

--------- -------------------- -------------------- -------------------- ---------------- ------------------------------ ------------------------------ ----------

CDELSM2P READ WRITE MAXIMUM PERFORMANCE UNPROTECTED PRIMARY CDELSM2P_x73_fra 0

SQL>

INSTANCE_NAME VERSION STARTUP_TIME

---------------- ----------------- ------------------

CDELSM2P1 19.0.0.0.0 14-APR-25 06:35:38

SQL>

BANNER_FULL

----------------------------------------------------------------------------------------------------------------------------------------------------------------

Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production

Version 19.19.0.0.0

SQL>

CON_ID CON_NAME OPEN MODE RESTRICTED

---------- ------------------------------ ---------- ----------

2 PDB$SEED READ ONLY NO

3 PDELSMIP READ WRITE NO

EasyReliableDBA

Monday, 25 May 2026

