Monday, 7 November 2022

Logs are not shipped to the physical standby database in oracle

 Logs are not shipped to the physical standby database in oracle 


May encounter different types of errors depending on the situations:


  • Logs are not shipped at log switch or fal archiving fails
  • Value of applied column on the primary is not consistent as that on the standby
  • RMAN does not delete logs and results in error RMAN-08137
  • In some rare cases ORA-12154 may show up in the primary alert log
  • Other possible errors ORA-3113, ORA-3135
  • WARN: ARCH: Terminating pid <PID> hung on an I/O operation krsv_proc_kill: Killing 1 processes (Process by index)


CHANGES

Problem usually occurs after OS or network errors, or restarting the primary or standby instance or reboot the primary or standby node that abruptly crashes log shipping between the primary and standby


CAUSE

ARCx processes on the primary stuck on the network forever or that are responsible to update the APPLIED column get stuck and can not recover themselves.

Additionally these processes that may be used for local and remote archiving, heartbeat and FAL fetching logs on the primary.

So when they are all stuck and reach the maximum number of values specified in log_archive_max_processes, they can cause ambiguous errors as shown above.


The worst case would be all ARCx processes on the primary are stuck and they couldn't do local archiving, so that all online redo log files are full which causes the primary database hangs.

This may be due to standby db crash, network errors or some abrupt outage on the standby or primary.

The other common cause is the firewall.


SOLUTION

ARCx processes on the primary need to be restarted.

Assuming that log transport from the primary is configured by log_archive_dest_2.

Please perform the following:


1) If the Data Guard Broker is running, disable Data Guard Broker on both primary and standby:

SQL> alter system set dg_broker_start=FALSE;


2) On the Primary Database:

- Set log transport state to DEFER status:

SQL> alter system set log_archive_dest_state_2='defer';

SQL> alter system switch logfile;

- Reset log_archive_dest_2

SQL> show parameter log_archive_dest_2   -------------------> Note this value

SQL> alter system set log_archive_dest_2 = '';

- Switch logfiles on the Primary

SQL> alter system switch logfile;


3) On the Standby Database:


- Cancel Managed Recovery

SQL> alter database recover managed standby database cancel;

- Shutdown the Standby Database

SQL> shutdown immediate


4) On the Primary: kill the ARCx Processes and the Database will respawn them automatically immediately without harming it.


ps -ef | grep -i arc

kill -9 <ospid of ARC process> <another ospid of ARC process> ...


5) On standby server, startup Standby Database and resume Managed Recovery


SQL> startup mount;

SQL> alter database recover managed standby database [using current logfile] disconnect;


6) Re-enable Log Transport Services on the Primary:

SQL>Alter system set log_archive_dest_2='Service=..............' ;        Set log_archive_dest_2 (Which was reset in step 2) to its original value

SQL> alter system set log_archive_dest_state_2='enable' ;


At this point all the ARCx processes should be up and running on the Primary.


7) Re-enable the Data Guard Broker for both, Primary and Standby if applicable:

SQL> alter system set dg_broker_start=true;

8) Please work with your Network Administrator to make sure the following Firewall Features are disabled.


  • SQLNet fixup protocol
  • Deep Packet Inspection (DPI)
  • SQLNet packet inspection
  • SQL Fixup
  • SQL ALG (Juniper firewall)

NOTE : To enable SQLNET tracing refer,


step by Step Method to Enable SQLNET(Server Tracing) Dynamically on Dataguard (Doc ID 2010476.1) 


The wording and features can vary by vendor but all the above have some impact on some packets (not all packets are affected).

Some firewalls can have an adverse effect on certain SQL packets transported across them (again, some not all).

In some rare cases if the above procedure doesn't work, you have to restart the primary instance to make the log shipping resuming again.


No comments:

Post a Comment