EasyReliableDBA: How to resolve MRP stuck issues on a physical standby database

Standby Redo Logs (SRLs) are exclusively and strictly mandatory if you are running in Maximum Protection or Maximum Availability Data Guard protection modes, or if you want to use the Real-Time Apply feature

In Oracle Data Guard, Real-Time Apply applies redo directly from the Standby Redo Log (SRL) file as it is being written. During a switchover, this ensures the standby is fully synchronized with the primary before role reversal, reducing switchover downtime to a minimum

Mandatory Scenarios for SRLs

Data Guard Protection Modes: Required for SYNC redo transport, which underpins both Maximum Protection and Maximum Availability.
Real-Time Apply: Mandatory if you wish to apply redo as it streams into the standby (bypassing the need to wait for the primary to fill and archive log files).
Active Data Guard: Highly recommended (and effectively required) to leverage real-time reporting on a physical standby without la

Non-Mandatory Scenarios

SRLs are not strictly required if your Data Guard is configured in Maximum Performance mode with asynchronous (ASYNC) transport. In this setup, the redo stream goes into standard archived redo logs first, which are then shipped and applied later. However, even in this mode, SRLs are highly recommended to prevent data loss and support smoother role transitions.

Key Rules for SRL Configuration

Size: Must be identically sized to your primary database's Online Redo Logs.
Count: You must configure at least $(N + 1)$ SRL groups per thread, where $N$ is the number of online redo log groups on the primar

Real-time apply processes redo logs directly from the standby redo log file as they are being filled, reducing data loss and switchover times. Archive apply waits until a redo file is completely filled and archived before applying it

How to Enable Real-Time Apply

Real-time apply requires Standby Redo Logs (SRLs) to be configured on your standby database

Verify SRLs exist: Ensure standby redo logs exist and are sized correctly:
SELECT GROUP#, THREAD#, BYTES FROM V$STANDBY_LOG;

Start the apply service: On your physical standby database, execute

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

(Note: The USING CURRENT LOGFILE clause is deprecated as of Oracle 12c and is no longer necessary).

Check status: Check your alert log or query the V$MANAGED_STANDBY view. If Real-Time Apply is actively working, the MRP0 process will display a status of APPLYING_LOG with CURRENT or valid sequence numbers.

SELECT PROCESS, STATUS, SEQUENCE# FROM V$MANAGED_STANDBY;

Real-Time Apply

What it is: Apply services actively read and apply redo changes directly from the Standby Redo Log file while it is still being filled by the RFS (Remote File Server) process.
How it applies: The MRP (Managed Recovery Process) or LMS (Logical Standby) applies transactions commit-by-commit to memory and disk simultaneously.
Benefits: Synchronizes the primary and standby databases in near real-time. It significantly decreases switchover/failover times since the data is already applied

Archive Log Apply

What it is: The default application method. Apply services wait for the Standby Redo Log file to be completely filled and archived to disk as an archived redo log before applying it.
How it applies: It reads the completed archive log file sequentially. This is often viewed as a "bulk operation" compared to the incremental, commit-by-commit nature of real-time apply.
Benefits: Acts as a built-in safety net. If an erroneous TRUNCATE or DROP TABLE command is executed on the primary, you have a window to pause the archive log apply and recover the lost data before it hits the standby

Resolve MRP (Managed Recovery Process) stuck issues on an Oracle Physical Standby database by canceling the current recovery session, identifying and transferring any missing archive logs or datafiles, and then restarting managed recovery

Step-by-Step Resolution

Cancel the Current Recovery
Stop the MRP process to safely make adjustments:

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

Identify the Archive Gap
Check the V$ARCHIVE_GAP view to determine which logs are missing between the primary and the standby databases:

SELECT * FROM V$ARCHIVE_GAP;

Resolve the Gap

Manually copy the missing archive logs from the primary server to the standby destination.
If they are missing, register the logs on the standby database

ALTER DATABASE REGISTER LOGFILE '<path_to_archived_log>';

Check for Missing Datafiles
MRP may stall if a tablespace was added on the primary but the physical file is missing on the standby. Query the standby for errors:

SELECT FILE#, ERROR FROM V$RECOVER_FILE;

If errors are returned (e.g., ORA-01111), verify STANDBY_FILE_MANAGEMENT is set to AUTO.

Restart MRP
Resume the managed recovery process so the database applies the remaining redo logs:

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

Verify the Apply Lag

Confirm that the standby database is processing the backlog by checking the apply lag status:

SELECT THREAD#, SEQUENCE#, APPLIED FROM V$ARCHIVED_LOG;

For more Details

https://support.oracle.com/knowledgefs?docId=1221163

Please use one of the following method to find the archive log file that MRP sticks at:

a. Please query v$managed_standby from the standby database if MRP is started.

% ps -ef |grep -i mrp

SQL>select process, thread#, sequence#, status from v$managed_standby where process='MRP0';

PROCESS THREAD# SEQUENCE# STATUS

-------- --------- ---------- ------------

MRP0 1 548 WAIT_FOR_GAP

Note: Starting from 12.2 use V$DATAGUARD_PROCESS view instead of v$managed_standby

b. Stop the managed recovery and start the manual recovery.

SQL>recover managed standby database cancel;

SQL>recover automatic standby database;

If you use data guard broker, you need to do these from either DGMGRL or grid control.

For 11g data guard broker,

DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET STATE='APPLY-OFF';

DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET STATE='APPLY-ON';

If your standby is a RAC database, then

DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET STATE='APPLY-ON' WITH APPLY INSTANCE = <instance-name>;

<instance-name> is the name of the instance you want to become the apply instance.

For 10g Data Guard broker,

DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET STATE='LOG-APPLY-OFF'; DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET STATE='ONLINE';

If your standby is a RAC database, then

DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET STATE='ONLINE' WITH APPLY INSTANCE = <instance-name>;

To change the state of the standby database from grid control for 10g and 11g standby databases, follow these steps:

1. From the Data Guard Overview page, select the standby database you want to change.

2. Click Edit to go to the Edit Properties page.

3. Select Log Apply Off (or Online).

4. Click Apply.

5. When the process completes, a message indicating success is returned.

For 9i Data Guard Broker,

DGMGRL> ALTER RESOURCE '<resource name>' ON SITE '<site name>' SET STATE='PHYSICAL-APPLY-READY';

DGMGRL> ALTER RESOURCE '<resource name>' ON SITE '<site name>' SET STATE='PHYSICAL-APPLY-ON';

To change the state of the standby database from grid control for 9i standby database, follow these steps:

1. In the navigator tree, select the standby database resource.

2. In the right-hand property sheet, click Set State.

3. Click Apply Off (or Online).

4. Click OK.

c. Use the lowest checkpoint_change# in the data file header to find the archive log that the recovery needs to recover from:

SQL>select min(fhscn),fhrba_Seq SEQUENCE from x$kcvfh group by fhrba_Seq;

2. Check whether the archive log file you find from step 1 exists on the standby site in the location defined by the standby init parameter standby_archive_dest or log_archive_dest_1 if standby_archive_dest is not defined on the standby.

If so, then please compare whether it has the same size on the standby as it is on the primary by cksum command. For example

% cksum <archive log file full path/file name>

Note: if cksum is not available on your unix platform please see the document below which provides more commands, such as, SHA-1 and MD5 checksum utilities.

Note 549617.1 How To Verify The Integrity Of A Patch/Software Download?

You could also verify whether the archive log file is corrupted on either standby or primary site by

SQL>alter system dump logfile '<full path/archive log file name>' validate;

If you get the SQL prompt back without error, then the archive log file is not corrupted.

3. If either the archive log file from step 1 doesn't exist on the standby or it is corrupted, then you would need to get a new copy from the primary site. Please refer to the document below on how to resolve a gap manually.

Note 1537316.1 Data Guard Gap Detection and Resolution Possibilities

4. If the archive log file exists on the standby site and is not corrupted, then please check whether it is registered to the standby controlfile by querying v$archived_log. For example,

SQL>select name from v$archived_log where (thread#=1 and sequence#=192917) or (thread#=2 and sequence#=26903);

If the last applied sequence# for thread 1 is 192916 and the last applied sequence# for thread 2 is 26902. Or the thread# and the sequence# are what you find from step 1 which the MRP sticks at.

If the archive log file names come back from the query above, then they are registered to the standby controlfile. Otherwise, the standby controlfile doesn't aware of them.

2. Please use the queries below to identify the current sequence# on the primary for each thread, the last received sequence# on the standby and the last applied sequence# on the standby for

each thread so that you will know whether the standby is in sync with the primary, whether there is a transport lag or an apply lag.

Primary: SQL > select thread#, max(sequence#) "Last Primary Seq Generated"

from v$archived_log val, v$database vdb

where val.resetlogs_change# = vdb.resetlogs_change#

group by thread# order by 1;

<standby>: SQL > select thread#, max(sequence#) "Last Standby Seq Received"

from v$archived_log val, v$database vdb

where val.resetlogs_change# = vdb.resetlogs_change#

group by thread# order by 1;

<standby>: SQL > select thread#, max(sequence#) "Last Standby Seq Applied"

from v$archived_log val, v$database vdb

where val.resetlogs_change# = vdb.resetlogs_change#

and applied='YES'

group by thread# order by 1;

=======================================================================

Please see the following Causes and Solutions for the MRP stuck issues on a physical standby database. If the MRP stucking issues still couldn't be resolved, then you could use RMAN incremental backup method to roll forward your physical standby database.

Note: You couldn't use the RMAN incremental backup method to roll forward a logical standby database.

Note 290817.1 Rolling a Standby Forward using an RMAN Incremental Backup in 9i

Note 836986.1 Steps to perform for Rolling forward a standby database using RMAN incremental backup when primary and standby are in ASM filesystem

SOLUTION

Cause 1 : Log transport issues.

MRP sticks on the standby because the primary has trouble to ship redo to the standby. You could check the status of remote archive destination on the primary database. For example,

SQL> SELECT DESTINATION, STATUS, ERROR FROM V$ARCHIVE_DEST WHERE DEST_ID=2;

It could give you various errors and you would need to resolve the log transport errors.

I would only illustrate very common errors related with password file issues here, such as ora-1031, ora-1017, ora-16191, ora-12154 etc.

Solution 1: Please use cksum command to verify the size of password file on the primary and the standby sites, and make sure SEC_CASE_SENSITIVE_LOGON is set to false for 11g or above databases.

- use cksum command to verify the size of password file on the primary and the standby sites

% cd $ORACLE_HOME/dbs

% cksum <password file name> /* cksum command is only available on the UNIX platform */

Note: if cksum is not available on your unix platform please see the document below which provides more commands, such as, SHA-1 and MD5 checksum utilities.

Note 549617.1 How To Verify The Integrity Of A Patch/Software Download?

If the sizes are different, you could create a new password file on one primary node and then copy it to other primary nodes and standby nodes.

UNIX

orapwd file=$ORACLE_HOME/dbs/orapw<local ORACLE_SID> password=<sys password> entries=5

Windows

orapwd file=$ORACLE_HOME/database/PWD<local ORACLE_SID>.ora password=<sys password> entries=5

Note: if the instance name on the standby database is different from the primary instance name, then their password file names are different.

You could check your local database instance name by

SQL>show parameter instance_name;

- Please make sure SEC_CASE_SENSITIVE_LOGON init parameter is set to false on both the primary and the standby databases for 11g or above databases.

If you have to set SEC_CASE_SENSITIVE_LOGON to true for business security reason, then you must create your password file with ignorecase=y attribute. For example,

orapwd file=$ORACLE_HOME/dbs/orapw<ORACLE_SID> password=<sys password> ignorecase=y entries=5

Please see Note 799353.1 - How to Resolve Error in Remote Archiving for more about log shipping issues.

Adding an new Standby fails with error Ora-12154: TNS:could not resolve the connect identifier specified (Doc ID 1240558.1)

Cause 2 : Firewall caused partial archive log transferred.

The contents of the mrp trace file show:

Media Recovery Log /<path>/<datafile_name>.dbf

ORA-00332: archived log is too small - may be incompletely archived

ORA-00334: archived log: '/<path>/<datafile_name>.dbf'

Background Media Recovery terminated with error 332

ORA-00332: archived log is too small - may be incompletely archived

ORA-00334: archived log: '/<path>/<datafile_name>.dbf'

----- Redo read statistics for thread 1 -----

The other common error is ORA-3135 and recommend you to check this cause and solution.

ORA-03135: connection lost contact when shipping redo log to standby database

ORA-03135: connection lost contact

ARC2: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (3135)

Solution 2: Please make sure the following firewall features are disabled.

- SQLNet fixup protocol

- Deep Packet Inspection (DPI)

- SQLNet packet inspection

- SQL Fixup

- SQL ALG (Juniper firewall)

The wording and features can vary by vendor but all the above have some impact on some packets (not all packets are affected). Some firewalls can have an adverse effect on certain SQL packets transported across them (again, some not all).

You could then resolve the gap by following Note 1537316.1 Data Guard Gap Detection and Resolution Possibilities.

Please see the following articles related to ora-3135.

Note 404724.1 ORA-03135 when connecting to Database

Note 739522.1 ORA - 03135 connection lost contact while shipping from Primary Server to Standby server

Note 730066.1 Diagnosis of ORA-3135/ORA-3136 Connection Timeouts when the Fault is in the Database

Note 787354.1 Troubleshooting ORA - 3135 Connection Lost Contact

Cause 3 : ARC process on the primary that is responsible for the heartbeat sticks on the network due to bug 6113783

You could see different symptoms. For example, you get FAL archive error as the primary arch process sticks and it doesn't ship the missing archive log files to the standby. The worst case you see is all arch processes stick and no one does local archiving and all online redo log files are full and that causes the primary database hangs. Please see the following errors related to this cause.

FAL[client]: Failed to request gap sequence

GAP - thread 1 sequence 2262-2266

DBID 2591313681 branch 686760021

FAL[client]: All defined FAL servers have been attempted.

SQL> alter session set nls_date_format='YYYY-MON-DD HH24:MI:SS';

SQL> select message, timestamp

from v$dataguard_status

where severity in ('Error','Fatal')

order by timestamp;

MESSAGE

-------------------------------------------------------------------------------

TIMESTAMP

--------------------------

Error 12154 received logging on to the standby

OCT-12-2010 15:55:45

PING[ARC1]: Heartbeat failed to connect to standby '<standby_name>'. Error is 12154.

OCT-12-2010 15:55:45

Solution 3: The bug 6113783 was fixed in 11.2.0.2. You could workaround the issue by killing the arch processes on the primary database.

This won't harm your primary database at all as arch processes will be respawned automatically immediately by Oracle.

% ps -ef |grep -i arc

% kill -9 <ospid of arc process> <ospid of arc process> <ospid of arc process>...

NOTE: If you are on Windows Platform you can use either 'orakill'- command or any OS-Tool to kill the Windows Processes/Threads

Cause 4. The standby recovery asked for old sequence # that had been applied to the standby.

a. This is caused by bug 6029179 that was fixed in 10.2.0.4.1.

Bug 6029179 Abstract: MRP WAITING FOR LOGS THAT ARE ALREADY APPLIED (with RAC primary)

b. This could be caused by ARC process that is responsible for standby heartbeat sticking on

network as well due to Bug 6113783

Solution 4 : Apply the fix of bug 6029179 or kill the heartbeat arch process of each primary instance if the primary is a RAC database.

a. Apply the fix of Bug 6029179 that was fixed in 10.2.0.4.1.

b. You could identify the heartbeat arch process from the primary alert log file. Look for

the last start up information in the alert log file. For example,

Starting ORACLE instance (normal)

...

processes = 1800

.....

ARC4 started with pid=27, OS id=8805

ARC2: Archival started

ARC3: Archival started

ARC3: Becoming the 'no FAL' ARCH

ARC3: Becoming the 'no SRL' ARCH

ARC2: Becoming the heartbeat ARCH

...........

Completed: ALTER DATABASE OPEN

So ARC2 is the heartbeat arch process for this startup. If you checked the previous startup information,

it might be a different arch process. Then you could find the ospid of arc2 process and kill it.

% ps -ef |grep -i arc2

% kill -9 <ospid of arc2>

NOTE: If you are on Windows Platform you can use either 'orakill'- command or any OS-Tool to kill the Windows Processes/Threads

Please do this for each primary instance if the primary database is a RAC database.

c. The worst case would be to use RMAN incremental backup method to roll forward your standby database.

If your primary database is small, you could simply take a full database backup from the primary and restore it to the standby server to refresh or recreate your physical standby database.

Cause 5 Recover from the wrong location.

a. The init parameter standby_archive_dest is not defined on the standby database for 10g or lower version. The default location is $ORACLE_HOME/dbs.

b. You use RMAN or manual method to restore or transfer archive log files to a different location

than the local archive location log_archive_dest_1 or standby_archive_dest defined on the standby.

Solution 5 Specify standby_archive_dest the same location as log_archive_dest_1 on the physical standby. Use from 'location' attribute in the manual recovery clause.

a. Always specify standby_archive_dest the same location as the local archive destination log_archive_dest_1 for 10g or lower version physical standby database.

It is obsoleted in 11g.

b. When archive log files are located in a different location than the recovery expects, then you could use from 'location' attribute in the manual recovery clause. For example,

SQL>alter database recover managed standby database cancel;

SQL>alter database recover automatic from '/<different_location>/' standby database;

<different_location> is where your archive log files are located.

Cause 6: All standby redo log files are active on the standby database.

So that the standby redo log files couldn't receive new redo from the primary database.

SQL> select group#,thread#,bytes/1024/1024 as mbytes,used,archived,status

from v$standby_log

GROUP# THREAD# MBYTES USED ARC STATUS

---------- ---------- ---------- ---------- --- ----------

15 2 100 33292288 NO ACTIVE

16 2 100 3323904 NO ACTIVE

17 1 100 98733056 NO ACTIVE

18 2 100 3813376 NO ACTIVE

19 2 100 4763648 NO ACTIVE

...........

38 1 100 98072576 NO ACTIVE

39 2 100 7388672 NO ACTIVE

40 2 100 4899328 NO ACTIVE

Standby alert log file showed:

ORA-16014: log 15 sequence# 13234 not archived, no available destinations

ORA-00312: online log 15 thread 2: '+<diskgroup1>/<db_name>/onlinelog/group_15.457.

682446507'

ORA-00312: online log 15 thread 2: '+D<diskgroup2>/<db_name>/onlinelog/group_15.456.

682446509'

ORA-00312: online log 15 thread 2: '+<diskgroup3>/<db_name>/onlinelog/group_15.490.

682446509'

Solution 6: Please make sure you have enough space in the archive location.

log_archive_dest_1 is defined with proper valid_for values and db_unique_name.

standby_archive_dest is specified properly.

a. Please make sure you have enough space in the local archive destination

specified by log_archive_dest_1 so that the standby redo log files could be

archived successfully.

Please also make sure you have enough space in the location specified by

standby_archive_dest so that the primary arch process could ship archive log files

to it successfully.

When log_archive_dest_1='LOCATION=USE_DB_RECOVERY_FILE_DEST...', you would

need to check the location specified by init parameter db_recovery_file_dest.

For example:

log_archive_dest_1='LOCATION=USE_DB_RECOVERY_FILE_DEST valid_for=(ALL_LOGFILE,ALL_ROLES)

db_unique_name=<standby_name>'

db_recovery_file_dest=+<diskgroup1>

db_recovery_file_dest_size=214748364800

b. Please make sure the local archive destination log_archive_dest_1 is defined

with VALID_FOR=(ALL_LOGFILES,ALL_ROLES) and DB_UNIQUE_NAME=<local db DB_UNIQUE_NAME>.

For example:

OracleÂ® Data Guard Concepts and Administration

3 Creating a Physical Standby Database

LOG_ARCHIVE_DEST_1='LOCATION=/<path>/<standby_name>/ VALID_FOR=(ALL_LOGFILES,ALL_ROLES)

DB_UNIQUE_NAME=<local database DB_UNIQUE_NAME>'

c. Please specify standby_archive_dest properly on the standby database. For example,

STANDBY_ARCHIVE_DEST=/<path>/<db_name>/

When LOG_ARCHIVE_DEST_1='LOCATION=USE_DB_RECOVERY_FILE_DEST VALID_FOR=(ALL_LOGFILES,ALL_ROLES)

DB_UNIQUE_NAME=<local database DB_UNIQUE_NAME>', you specify STANDBY_ARCHIVE_DEST='LOCATION=USE_DB_RECOVERY_FILE_DEST'.

The init parameter standby_archive_dest is deprecated in 11g.

You could use alter system command to modify or define LOG_ARCHIVE_DEST_1 or STANDBY_ARCHIVE_DEST.

If your database is a RAC database, please use attribute sid='*' so that the change will be

made to all RAC instances for the database. For example,

SQL>alter system set log_archive_dest_1='LOCATION=USE_DB_RECOVERY_FILE_DEST

valid_for=(ALL_LOGFILES,ALL_ROLES) db_unique_name=<standby_name>' scope=both sid='*';

If you use data guard broker, then you need to shutdown the broker first before you issue the command above from sqlplus. For example,

SQL>alter system set dg_Broker_start=false sid='*' scope=both;

Then after the change, you could start the broker by

SQL>alter system set dg_Broker_start=true sid='*' scope=both;

SQL>alter system set standby_archive_dest = 'LOCATION=USE_DB_RECOVERY_FILE_DEST' scope=both sid='*';

When 'scope' is not specified, the default value is 'both'.

If you use broker, you could modify the StandbyArchiveLocation property as below:

DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET PROPERTY 'StandbyArchiveLocation'='LOCATION=USE_DB_RECOVERY_FILE_DEST';

You would need to issue the command on all standby instances if the standby database is a RAC database.

d. Please make sure the standby redo log file size on the standby database

are the same size as the primary online redo log file size and you have one more

standby redo log group on the standby than the primary online redo log group for each thread.

For example, if you have 2 threads on the primary database and three online redo log groups

for each thread on the primary, then you would need 4 standby redo log groups for thread 1

and 4 standby redo log groups for thread 2 on the standby database no matter how many instance

you have on the standby database.

In case of switchover and the database roles change, the same principle applies. For example,

if your current standby database has only one instance, and three online redo log groups for it,

then you would need only 4 standby redo log groups for thread 1 on the current primary database

no matter how many instances/threads it has.

Reference:

MAA - Creating a RAC Physical Standby for a RAC Primary (Doc ID 380449.1)

e. Please clear all standby redo log groups by

SQL> alter database clear standby logfile group <#>;

Cause 7 : Partial archive log file is applied on the standby database.

The standby alert log file showed:

Fri Jan 9 20:11:18 2009

MRP0: Background Managed Standby Recovery process started (<standby_name>)

Managed Standby Recovery not using Real Time Apply

parallel recovery started with 7 processes

Media Recovery Log /<path>/<db_name>-656901558185964.arc

Fri Jan 9 20:11:24 2009

Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM

SESSION

Fri Jan 9 20:11:24 2009

Incomplete Recovery applied until change 14025537844

Fri Jan 9 20:11:24 2009

MRP0: Media Recovery Complete (<standby_name>)

Fri Jan 9 20:11:26 2009

MRP0: Background Media Recovery process shutdown (<standby_name>)

You could find which archive log file contains the change# 14025537844 by querying v$archived_log.

SQL>select thread#, sequence#, name, first_change#, next_Change#, deleted, status from v$archived_log where 14025537844 between first_change# and next_Change#;

Solution 7 Use RMAN incremental backup method to roll forward your physical standby database.

If your primary database is small, you could simply take a full database backup from the primary

and restore it to the standby server to refresh or recreate your physical standby database.

To prevent this happening again, please perform the following before you cancel the managed recovery

on the physical standby database or shutdown the physical standby database.

a. Defer the remote log transport on the primary database. For example, if log_archive_dest_2 is the remote archive destination,

SQL>alter system set log_archive_dest_state_2=defer; (for single instance primary database)

SQL>alter system set log_archive_dest_state_2=defer sid='*'; (for RAC primary database)

b. Please make sure the standby recovers the last received archive log file on the standby.

You can go to the location defined by log_archive_dest_1 or standby_archive_dest to see the most recent received archive log file.

Then check the latest applied sequence on the standby by the query below:

SQL>select thread#, max(sequence#) from v$archived_log where applied='YES' group by thread#;

Cause 8 : The archive log files were transferred to the standby manually, not through data guard log transport service.

So that they are not registered to the standby control file and MRP (Managed Recovery Process) couldn't recognize them.

For example, they are transferred from the primary by ftp, or scp, etc. or rman backup them and restores them to the standby.

The standby alert log file showed:

Media Recovery Waiting for thread 1 sequence 13615 (in transit)

Solution 8 Register those archive log files or use manual recovery.

a. Register those manually transferred archive log files and then use managed recovery to recover them.

SQL> alter database recover managed standby database cancel;

SQL> alter database register logfile '<full path/archive log file name>';

or you could use one rman command to catalog all the archive logs that were manually transferred and needed to be added to the controlfile at the standby site.

rman> catalog start with 'PATH_TO_ARCHIVELOGS/';

SQL> alter database recover managed standby database disconnect;

b. If you have lots of archive log files to register, then you could simply use manual recovery without registering them.

Make sure the archive log files are located in the location specified by log_archive_dest_1 or standby_archive_dest.

If not, then you would need to use from 'location' attribute in the manual recovery clause.

SQL> alter database recover managed standby database cancel;

SQL> alter database recover automatic standby database;

SQL> alter database recover automatic from '/<path>/' standby database;

/<path>/ is the location where your archive log files are located, which is different from the local archive destination or the standby archive destination.

Cause 9 : The archive log files are deleted from the primary before they are shipped and applied to the standby.

You use RMAN to do backup and archive log files are deleted after backup before they are shipped and applied to the standby.

Solution 9 Restore archive log files from backup, register them and use managed recovery to recover them; or use manual recovery without registering them.

Please refer to the Solution 8 for details.

If you couldn't find the archive log files anywhere, then you have to use RMAN incremental backup to roll forward your physical standby database.

If your primary database is small, you could simply take a full database backup from the primary

and restore it to the standby server to refresh or recreate your physical standby database.

To prevent this happens again, please reconsider your RMAN deletion policy.

If your archive log files are configured to be in a flash recovery area, then you could configure RMAN deletion policy to APPLIED ON STANDBY so that the archive log files are deleted after they are applied to the standby database.

Please refer to the document below for details:

Configure RMAN to purge archivelogs after applied on standby (Doc ID 728053.1)

http://download.oracle.com/docs/cd/B19306_01/server.102/b14239/rman.htm#SBYDB00750

OracleÂ® Data Guard Concepts and Administration

10g Release 2 (10.2)

10.3.4 Deletion Policy for Archived Redo Log Files In Flash Recovery Areas

http://download.oracle.com/docs/cd/B28359_01/server.111/b28294/rman.htm#BAJHHAEB

OracleÂ® Data Guard Concepts and Administration

11g Release 1 (11.1)

11.3.2 RMAN Configurations at the Primary Database

11.3.3 RMAN Configurations at a Standby Database Where Backups are Performed

11.3.4 RMAN Configurations at a Standby Where Backups Are Not Performed

http://download.oracle.com/docs/cd/E11882_01/server.112/e17022/rman.htm#SBYDB4854

OracleÂ® Data Guard Concepts and Administration

11g Release 2 (11.2)

11.3.2 RMAN Configurations at the Primary Database

11.3.3 RMAN Configurations at a Standby Database Where Backups are Performed

11.3.4 RMAN Configurations at a Standby Where Backups Are Not Performed

Cause 10 : New datafiles are added to the primary, but they are not added to the standby automatically.

This could be caused by standby_file_managment=manual on the standby database.

Sometimes the new datafiles failed to be added to the standby database automatically even though

standby_file_managment was set to AUTO.

You could see various errors. Please refer to the documents below:

Note 743101.1 ORA-01110 ORA-1122 ORA-01251 When Restoring With a Backup from a Standby DB

Note 304488.1 Using standby_file_management with Raw Devices

Note 759181.1 Dataguard Physical Standby occurs ORA-600 [2130] after added new node to primary RAC database.

Note 549078.1 ORA-600 [25016] Errors Trying To Recover Managed Standby Database

Bug 3596262 Add DATAFILE on primary can cause standby to raise ORA-1274

ORA-1274 Encountered on Physical Standby After Adding Datafile to Primary (Doc ID 388659.1)

ORA-01110: data file 7: '+<diskgroup>/<path>/<datafile_name>.300.740842175'

ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 142995

ORA-00600: internal error code, arguments: [6101], [0], [40], [128], [], [], [], [], [], [], [], []

Slave exiting with ORA-10562 exception

ORA-10562: Error occurred while applying redo to data block (file# 7, block# 237743)

ORA-10564: tablespace <tablespace_name>

ORA-01110: data file 7: '+<diskgroup>/<path>/<datafile_name>.300.740842175

Please work with Oracle Corruption team on the root cause of datafile corruptions. Please refer to the

following documents:

Prevention, Detection and Repair of Database Corruption (Doc ID 76375.1)

Best Practices for Corruption Detection, Prevention, and Automatic Repair - in a Data Guard Configuration (Doc ID 1302539.1)

Solution 10 Add the new datafiles to the standby database manually.

1) Please take a hot backup of new datafiles from the primary database.

2) Create a new standby controlfile from the primary database by

SQL>alter database create standby controlfile as '/tmp/controlf.ctl';

If datafiles are on ASM, please follow the note below and you could ignore the rest of steps:

Note 734862.1 Step By Step Guide On How To Recreate Standby Control File

When Datafiles Are On ASM And Using Oracle Managed Files

Or you could modify the wrong datafile name in the standby controlfile by alter database rename command. For example,

SQL> ALTER DATABASE RENAME FILE '<path/UNNAMED00003>' to '<path/real datafile name>';

3) If the new datafile location on the primary is different from the standby, please make sure

db_file_name_convert init parameter is set on the standby database.

Note 47325.1 Init.ora Parameter "DB_FILE_NAME_CONVERT" Reference Note

If db_file_name_convert init parameter has already been set, then you could ignore this step.

4) Cancel the managed recovery

SQL>alter database recover managed standby database cancel;

5) set standby_file_management=manual on the standby database and shutdown the standby database.

SQL>alter system set standby_file_management=manual sid='*';

SQL>shutdown immediate;

6) Copy the hot backup of the new datafiles and the new standby controlfile to the standby.

Please make sure the controlfiles are located in the right location with right names

according to the init parameter control_files. Please make sure the copied datafiles are

located in the right location as well according to name from v$datafile.

7) startup the standby database in mount mode and set standby_file_management=auto.

SQL>startup mount;

SQL>alter system set standby_file_management=auto sid='*';

8) Start the managed recovery.

SQL>alter database recover managed standby database disconnect;

Cause 11 : MRP can stall after control file contention at the standby.

MRP is waiting for a log that has already been shipped but does not appear in v$archived_logs and there are ORA-16146 errors reported in the standby alert log.

Solution 11 This is fixed in 10.2.0.4. The workaround is to restart the standby.

Reference: Note 6004936.8 Bug 6004936 Standby apply stops after ORA-16146

You can directly participate in the Discussion about this Note below. The Frame is the interactive live Discussion - not a Screenshot ;-)

Cause 12 : Cancelling managed recovery hangs.

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

DGMGRL>EDIT DATABASE '<standby db_unique_name>' SET STATE='APPLY-OFF';

The above commands hang.

Solution 12 Let's shutdown abort the standby, startup mount, then have a clean shutdown, check if there are server processes left over for the standby instance.

- Let's shutdown abort the standby database.

SQL>shutdown abort;

SQL>startup mount;

- Then have a clean shutdown of the standby database.

SQL>shutdown immediate;

- Then check if there is any oracle process left over for the standby instance.

ps -ef |grep -i <standby instance_name>

- If so, then kill the left over process(es).

kill -9 <ospid>

- If not, then startup the standby and recovery.

SQL>startup mount;

SQL>recover automatic standby database;

After recovering the available archive log files on the standby, you could enter CANCEL at the prompt, and then start managed recovery.

If you use broker, then you issue the commands from dgmgrl. For example,

% dgmgrl /* from the standby server */

DGMGRL> startup mount;

DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET STATE='APPLY-ON';

EasyReliableDBA

Friday, 12 May 2023

How to resolve MRP stuck issues on a physical standby database

No comments:

Post a Comment

Search This Blog