Friday 26 May 2023

Parameter check for Primary Database and Standby Database during switch over


NOTE: Running DGDiagnostic.sh on both the primary and the standby runs the appropriate diag script for each database, collects the sqlnet files, and runs basic network configuration checks.


Run this on Primary Database


 REM srdc_DGPrimary_diag.sql - collect primary database information in dataguard environment

define SRDCNAME='DG_PRIM_DIAG'

set markup html on spool on


set TERMOUT off;

COLUMN SRDCSPOOLNAME NOPRINT NEW_VALUE SRDCSPOOLNAME

select 'SRDC_'||upper('&&SRDCNAME')||'_'||upper(value)||'_'||

      to_char(sysdate,'YYYYMMDD_HH24MISS') SRDCSPOOLNAME from v$parameter where lower(name)='instance_name';


REM

spool &&SRDCSPOOLNAME..htm

Set heading off;


select '+----------------------------------------------------+' from dual

union all

select '| Script version:  '||'25-Aug-2021' from dual

union all

select '| Diagnostic-Name: '||'&&SRDCNAME' from dual

union all

select '| Timestamp: '|| to_char(systimestamp,'YYYY-MM-DD HH24:MI:SS TZH:TZM') from dual

union all

select '| Machine: '||host_name from v$instance

union all

select '| Version: '||version from v$instance

union all

select '| DBName: '||name from v$database

union all

select '| Instance: '||instance_name from v$instance

union all

select '+----------------------------------------------------+' from dual

/


set heading on;

set echo off

set feedback off

column timecol new_value timestamp

column spool_extension new_value suffix

column output new_value dbname

set linesize 2000

set pagesize 50000

set numformat 999999999999999

set trim on

set trims on

set markup html on

set markup html entmap off

set feedback on


ALTER SESSION SET nls_date_format = 'DD-MON-YYYY HH24:MI:SS';

SELECT sysdate time FROM dual;

select decode(count(cell_path),0,'Non-Exadata','Exadata') "System" from v$cell;    


set echo on


-- In the following output the DATABASE_ROLE should be PRIMARY as that is what this script is intended to be run on.

-- PLATFORM_ID should match the PLATFORM_ID of the standby(s) or conform to the supported options in

-- Note: 413484.1 Data Guard Support for Heterogeneous Primary and Physical Standbys in Same Data Guard Configuration

-- Note: 1085687.1 Data Guard Support for Heterogeneous Primary and Logical Standbys in Same Data Guard Configuration

-- OPEN_MODE should be READ WRITE.

-- LOG_MODE should be ARCHIVELOG.

-- FLASHBACK can be YES (recommended) or NO.

-- If PROTECTION_LEVEL is different from PROTECTION_MODE then for some reason the mode listed in PROTECTION_MODE experienced a need to downgrade.

-- Once the error condition has been corrected the PROTECTION_LEVEL should match the PROTECTION_MODE after the next log switch;


SELECT database_role role, name, db_unique_name, checkpoint_change#, current_scn, resetlogs_change#,  platform_id, open_mode, log_mode, flashback_on, protection_mode, protection_level FROM v$database;

SELECT dbid, name, cdb from v$database;


-- Check whether this is Enterprise or Standard Edition

SELECT BANNER from v$version where BANNER like '%Oracle%';


-- FORCE_LOGGING is not mandatory but is recommended.

-- REMOTE_ARCHIVE should be ENABLE.

-- SUPPLEMENTAL_LOG_DATA_PK and SUPPLEMENTAL_LOG_DATA_UI must be enabled if the standby associated with this primary is a logical standby.

-- During normal operations it is acceptable for SWITCHOVER_STATUS to be SESSIONS ACTIVE or TO STANDBY.

-- DATAGUARD_BROKER can be ENABLED (recommended) or DISABLED.


column force_logging format a13 tru

column remote_archive format a14 tru

column supplemental_log_data_pk format a24 tru

column supplemental_log_data_ui format a24 tru

column dataguard_broker format a16 tru


SELECT force_logging, remote_archive, supplemental_log_data_pk, supplemental_log_data_ui, switchover_status, standby_became_primary_scn, dataguard_broker FROM v$database;


-- The following query gives us information about catpatch. From this we can tell if the catalog version doesn't match the image version it was started with.


column version format a10 tru

SELECT version, modified, status FROM dba_registry WHERE comp_id = 'CATPROC';


-- Check how many threads are enabled and started for this database. If the number of instances below does not match then not all instances are up.


SELECT thread#, instance, status FROM v$thread;


-- The number of instances returned below is the number currently running.  If it does not match the number returned in Threads above then not all instances are up.

-- VERSION should match the version from CATPROC above.

-- ARCHIVER can be (STOPPED | STARTED | FAILED). FAILED means that the archiver failed to archive a log last time, but will try again within 5 minutes.

-- LOG_SWITCH_WAIT is the ARCHIVE LOG/CLEAR LOG/CHECKPOINT event that log switching is waiting for.

-- Note that if ALTER SYSTEM SWITCH LOGFILE is hung, but there is room in the current online redo log, then the value is NULL.


column host_name format a32 wrap

SELECT thread#, instance_name, host_name, version, archiver, log_switch_wait, startup_time FROM gv$instance ORDER BY thread#;


--  Password file Users

col USERNAME format a30

SELECT USERNAME,SYSDBA,SYSOPER,SYSASM,U.ACCOUNT_STATUS "ACCOUNT_STATUS",U.AUTHENTICATION_TYPE "AUTHENTICATION_TYPE" FROM V$PWFILE_USERS JOIN DBA_USERS U USING (USERNAME);


select file_name, format, is_asm, con_id from v$passwordfile_info;

-- Check incarnation of the database, the standby and primary must have the same CURRENT incarnation:

select incarnation#, resetlogs_change#, resetlogs_time, prior_resetlogs_change#, prior_resetlogs_time,

status, resetlogs_id, prior_incarnation#, flashback_database_allowed from gv$database_incarnation;


-- Number of log switches in less than 20min in last 24hours

 SELECT a.LOG_SWITCHES_UNDER_20_MINS "LOG_SWITCHES_UNDER_20_MINS", b.LOG_SWITCHES_OVER_20_MINS "LOG_SWITCHES_OVER_20_MINS" from

(select count(*) "LOG_SWITCHES_UNDER_20_MINS" from v$archived_log join v$archive_dest using (dest_id) where target='PRIMARY' AND destination IS NOT NULL and first_time > (sysdate -1) and (next_time-first_time) < 1/72) a ,

(select count(*) "LOG_SWITCHES_OVER_20_MINS" from v$archived_log join v$archive_dest using (dest_id) where target='PRIMARY' AND destination IS NOT NULL and first_time > (sysdate -1) and (next_time-first_time) > 1/72) b;

-- Details of time slots where the number of log switches is more than 4 per hour in the last 7 days

select Day||':00-59' "Day",LOG_SWITCHES_UNDER_20_MINS from ( with agg as( select  to_char(First_time,'MON DD,YYYY HH24') Day,count(*) "LOG_SWITCHES_UNDER_20_MINS",dense_rank() over (partition by to_char(First_time,'MON DD,YYYY HH24') order by 2 desc) as rank  from v$archived_log join v$archive_dest using (dest_id) where target='PRIMARY' AND destination IS NOT NULL

and (next_time-first_time) <= 1/72 and First_time > (sysdate-7)

group by to_char(First_time,'MON DD,YYYY HH24')

)

select Day,LOG_SWITCHES_UNDER_20_MINS from agg order by 1 desc ) where  LOG_SWITCHES_UNDER_20_MINS >4;


-- History: log switch details of the last 7 days

SELECT Day,LOG_SWITCHES_UNDER_20_MINS,LOG_SWITCHES_OVER_20_MINS from (

select  to_char(First_time,'MON DD,YYYY') Day,count(*) "LOG_SWITCHES_UNDER_20_MINS" from v$archived_log join v$archive_dest using (dest_id) where target='PRIMARY' AND destination IS NOT NULL and (next_time-first_time) <= 1/72 and First_time > (sysdate-7) group by to_char(First_time,'MON DD,YYYY')) a  join (select to_char(First_time,'MON DD,YYYY') Day,count(*) "LOG_SWITCHES_OVER_20_MINS" from v$archived_log join v$archive_dest using (dest_id) where target='PRIMARY' AND destination IS NOT NULL and (next_time-first_time) > 1/72 and First_time > (sysdate-7) group by to_char(First_time,'MON DD,YYYY')) b using (Day)

order by Day desc;


-- Check the max redo generation speed per day,

-- and the average of the top 30 redo rates, with a recommended ORL size so that a log switch happens no more often than every 20 minutes

select time "Day" ,round(max(value/1024),2) "Max_Redo_KB/s",round(avg(value)/1024,2) "Avg_Redo(top30_mins)_KB/s",round(avg(value)*60*20/1024,0) "Recommended_ORL_size_KB"

from ( with agg as (

select to_char(BEGIN_TIME,'DD/MON/YYYY') time,value, dense_rank() over (partition by to_char(BEGIN_TIME,'DD/MON/YYYY') order by value desc) as rank from dba_hist_sysmetric_history where metric_name = 'Redo Generated Per Sec' and BEGIN_TIME > sysdate - 7) select time,value from agg where agg.rank <31

) group by time order by 1 desc;
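
-- As a worked example of the ORL recommendation above (hypothetical numbers): if the average of the
-- top 30 redo rates is 1024 KB/s, the recommended ORL size is 1024 * 60 * 20 = 1,228,800 KB (about 1.2 GB),
-- i.e. twenty minutes' worth of redo at that rate.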


column  minutes  format a12

SELECT (CASE WHEN bucket = 1 THEN '<= ' || TO_CHAR(bucket * 5) WHEN (bucket >1 AND bucket < 9) THEN TO_CHAR(bucket * 5 - 4) || ' TO ' || TO_CHAR(bucket * 5) WHEN bucket > 8 THEN '>= ' || TO_CHAR(bucket * 5 - 4) END) "MINUTES", switches "LOG_SWITCHES" FROM (SELECT bucket , COUNT(b.bucket) SWITCHES FROM (SELECT WIDTH_BUCKET(ROUND((b.first_time - a.first_time) * 1440), 0, 40, 8) bucket FROM v$archived_log a, v$archived_log b WHERE a.sequence# + 1 = b.sequence# AND a.dest_id = b.dest_id  AND a.thread# = b.thread#  AND a.dest_id = (SELECT MIN(dest_id) FROM gv$archive_dest WHERE target = 'PRIMARY' AND destination IS NOT NULL)) b GROUP BY bucket ORDER BY bucket);


-- Check the number and size of online redo logs on each thread.


SELECT thread#, group#, sequence#, bytes, archived ,status, blocksize FROM v$log ORDER BY thread#, group#;


-- To get the online redo log file location 


SELECT thread#, group#, sequence#, bytes, archived ,l.status, blocksize, member FROM v$log l join v$logfile f using (GROUP#) ORDER BY thread#, group#;


-- The following query is run to see if standby redo logs have been created in preparation for switchover.

-- The standby redo logs should be the same size as the online redo logs. There should be (( # of online logs per thread + 1) * # of threads) standby redo logs.

-- A value of 0 for the thread# means the log has never been allocated.


SELECT thread#, group#, sequence#, bytes, archived, status, blocksize FROM v$standby_log order by thread#, group#;
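
-- The expected count from the formula above can also be computed directly. The following query is an
-- illustrative addition (not part of the original SRDC script); it assumes each thread has the same
-- number of online logs and compares the recommended SRL count against the actual count:

select (select count(*) * (max(cnt) + 1) from (select thread#, count(*) cnt from v$log group by thread#)) "RECOMMENDED_SRL_COUNT", (select count(*) from v$standby_log) "ACTUAL_SRL_COUNT" from dual;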


-- This query provides the number of standby redo logs, per instance


column count(*) new_value count

column CHECK_NAME format a40

select inst_id, dest_id, standby_logfile_count from gv$archive_dest_status where status <> 'INACTIVE' and db_unique_name <> 'NONE' order by dest_id, inst_id;


-- This query produces a list of defined archive destinations. It shows if they are enabled, what process is servicing that destination,

-- if the destination is local or remote.


column destination format a35 wrap

column process format a7

column ID format 99

column mid format 99

SELECT thread#, dest_id, destination, target, schedule, process FROM gv$archive_dest gvad, gv$instance gvi WHERE gvad.inst_id = gvi.inst_id AND destination is NOT NULL ORDER BY thread#, dest_id;


-- This select will give further detail on the destinations as to what options have been set.

-- Register indicates whether or not the archived redo log is registered in the remote destination control file.


set numwidth 8

column archiver format a8

column affirm format a6

column error format a55 wrap

column register format a8


SELECT thread#, dest_id, gvad.archiver, transmit_mode, affirm, async_blocks, net_timeout, max_failure, delay_mins, reopen_secs reopen, register, binding FROM gv$archive_dest gvad, gv$instance gvi WHERE gvad.inst_id = gvi.inst_id AND destination is NOT NULL ORDER BY thread#, dest_id;


-- This shows if redo transport is configured

col name format a20

col value format a2000

column CHECK_NAME format a40

select name, inst_id, value from gv$parameter where name like 'log_archive_dest_%' and upper(value) like '%SERVICE%' order by inst_id;


-- The following select will show any errors that occurred the last time an attempt was made to archive to the destination.

-- If ERROR is blank and status is VALID then the archive completed correctly.

SELECT thread#, dest_id, gvad.status, error, fail_sequence FROM gv$archive_dest gvad, gv$instance gvi WHERE gvad.inst_id = gvi.inst_id AND destination is NOT NULL ORDER BY thread#, dest_id;


-- The query below will determine if any error conditions have been reached by querying the v$dataguard_status view (view only available in 9.2.0 and above).


column message format a80

SELECT instance, thread#, name, pid, action, client_role, client_pid, sequence#, block#, dest_id

from v$dataguard_process order by thread#, pid;


SELECT gvi.thread#, timestamp, message FROM gv$dataguard_status gvds, gv$instance gvi WHERE gvds.inst_id = gvi.inst_id AND severity in ('Error','Fatal') ORDER BY timestamp, thread#;


-- Query v$managed_standby to see the status of processes involved in the shipping redo on this system.

-- Does not include processes needed to apply redo.


SELECT inst_id, thread#, process, pid, status, client_process, client_pid, sequence#, block#, active_agents, known_agents FROM gv$managed_standby ORDER BY thread#, pid;


-- The following query will determine the current sequence number and the last sequence archived.

-- If you are remotely archiving using the LGWR process then the archived sequence should be one higher than the current sequence.

-- If remotely archiving using the ARCH process then the archived sequence should be equal to the current sequence.

-- The applied sequence information is updated at log switch time.

-- The "Last Applied" value should be checked with the actual last log applied at the standby, only the standby is guaranteed to be correct.


SELECT cu.thread#, cu.dest_id, la.lastarchived "Last Archived", cu.currentsequence "Current Sequence", appl.lastapplied "Last Applied" FROM (select gvi.thread#, gvd.dest_id, MAX(gvd.log_sequence) currentsequence FROM gv$archive_dest gvd, gv$instance gvi WHERE gvd.status = 'VALID' AND gvi.inst_id = gvd.inst_id GROUP BY thread#, dest_id) cu, (SELECT thread#, dest_id, MAX(sequence#) lastarchived FROM gv$archived_log WHERE resetlogs_change# = (SELECT resetlogs_change# FROM v$database) AND archived = 'YES' GROUP BY thread#, dest_id) la, (SELECT thread#, dest_id, MAX(sequence#) lastapplied FROM gv$archived_log WHERE resetlogs_change# = (SELECT resetlogs_change# FROM v$database) AND applied = 'YES' GROUP BY thread#, dest_id) appl WHERE cu.thread# = la.thread# AND cu.thread# = appl.thread# AND cu.dest_id = la.dest_id AND cu.dest_id = appl.dest_id ORDER BY 1, 2;


-- The following select will attempt to gather as much information as possible from the standby.

-- Standby redo logs are not supported with Logical Standby until Version 10.1.

-- The ARCHIVED_SEQUENCE# from a logical standby is the sequence# created by the apply, not the sequence# sent from the primary.


set numwidth 8

column dest_id format 99

column Active format 99


SELECT dest_id, database_mode, recovery_mode, protection_mode, standby_logfile_count, standby_logfile_active FROM v$archive_dest_status WHERE destination IS NOT NULL;


-- Non-default init parameters. For a RAC DB Thread# = * means the value is the same for all threads (SID=*)

-- Threads with different values are shown with their individual thread# and values.


column num noprint

SELECT num, '*' "THREAD#", name, value FROM v$PARAMETER WHERE NUM IN (SELECT num FROM v$parameter WHERE (isdefault = 'FALSE' OR ismodified <> 'FALSE') AND name NOT LIKE 'nls%'

MINUS

SELECT num FROM gv$parameter gvp, gv$instance gvi WHERE num IN (SELECT DISTINCT gvpa.num FROM gv$parameter gvpa, gv$parameter gvpb WHERE gvpa.num = gvpb.num AND  gvpa.value <> gvpb.value AND (gvpa.isdefault = 'FALSE' OR gvpa.ismodified <> 'FALSE') AND gvpa.name NOT LIKE 'nls%') AND gvi.inst_id = gvp.inst_id  AND (gvp.isdefault = 'FALSE' OR gvp.ismodified <> 'FALSE') AND gvp.name NOT LIKE 'nls%')

UNION

SELECT num, TO_CHAR(thread#) "THREAD#", name, value FROM gv$parameter gvp, gv$instance gvi WHERE num IN (SELECT DISTINCT gvpa.num FROM gv$parameter gvpa, gv$parameter gvpb WHERE gvpa.num = gvpb.num AND gvpa.value <> gvpb.value AND (gvpa.isdefault = 'FALSE' OR gvpa.ismodified <> 'FALSE') AND gvp.name NOT LIKE 'nls%') AND gvi.inst_id = gvp.inst_id  AND (gvp.isdefault = 'FALSE' OR gvp.ismodified <> 'FALSE') AND gvp.name NOT LIKE 'nls%' ORDER BY 1, 2;


-- LAST APPLIED ARCHIVELOG FILES

--

select dest_id,thread#,sequence# "sequence#",first_time,first_change#,next_time,next_change# from v$archived_log

where (dest_id,thread#,sequence#) in

(select dest_id,thread#,max(sequence#) from v$archived_log where dest_id in

(select to_number(ltrim(name,'log_archive_dest_')) from v$parameter where lower(name) like 'log_archive_dest%' and upper(value) like '%SERVICE=%'

minus

select dest_id from v$archive_dest where VALID_ROLE  in ('STANDBY_ROLE')) and applied='YES' group by dest_id,thread#)

and activation# in (select activation# from v$database)

and 'PRIMARY' =(select database_role from v$database)

union

select dest_id,thread#,sequence# "sequence#",first_time,first_change#,next_time,next_change# from v$archived_log

where (dest_id,thread#,sequence#) in

(select dest_id,thread#,max(sequence#) from v$archived_log where dest_id in (select dest_id from v$archive_dest where destination is not null and upper(VALID_ROLE) in ('ALL_ROLES','STANDBY_ROLE') and dest_id <> 32) and applied='YES' group by dest_id,thread#)

and activation# in (select activation# from v$database)

and 'PHYSICAL STANDBY' =(select database_role from v$database)

;


-- ARCHIVELOG APPLY RANGE

--

select b.dest_id,a.thread#,a.sequence# "latest archivelog" ,b.sequence# "latest archive applied",a.sequence# - b.sequence# "# Not Applied" from

(select dest_id,thread#,max(sequence#) sequence# from v$archived_log

where dest_id in (select to_number(ltrim(name,'log_archive_dest_')) from v$parameter where lower(name) like 'log_archive_dest%' and upper(value) like '%SERVICE=%'

minus

select dest_id from v$archive_dest where VALID_ROLE  in ('STANDBY_ROLE'))

 and applied='YES' group by dest_id,thread#) b,

(select thread#,max(sequence#) sequence#  from v$archived_log where activation# in (select activation# from v$database) group by thread#) a

where a.thread#=b.thread#

and 'PRIMARY' =(select database_role from v$database)

union

select b.dest_id,a.thread#,a.sequence# "latest archivelog" ,b.sequence# "latest archive applied",a.sequence# - b.sequence# "# Not Applied" from

(select dest_id,thread#,max(sequence#) sequence# from v$archived_log where applied='YES' and  dest_id in ( select dest_id from v$archive_dest where destination is not null and upper(VALID_ROLE) in ('ALL_ROLES','STANDBY_ROLE') and dest_id <> 32 ) group by dest_id,thread#) b,

(select thread#,max(sequence#) sequence#  from v$archived_log where activation# in (select activation# from v$database) group by thread#) a

where a.thread#=b.thread#

and 'PHYSICAL STANDBY' =(select database_role from v$database);


-- DATABASE INFORMATION

--

select file#,tablespace_name tablespace,name datafile,checkpoint_change#,checkpoint_time, resetlogs_change#, resetlogs_time,bytes from v$datafile_header order by file#;


select status,checkpoint_change#,checkpoint_time, resetlogs_change#, resetlogs_time, count(*), fuzzy from v$datafile_header

group by status,checkpoint_change#,checkpoint_time, resetlogs_change#, resetlogs_time, fuzzy order by checkpoint_change# desc;


select min(fhrba_Seq), max(fhrba_Seq) from X$KCVFH;


-- CONTAINER DATABASE INFORMATION

--

show pdbs;

select con_id,name,open_mode,restricted from v$containers;


select 'ROOT',file#,tablespace_name tablespace,name datafile,checkpoint_change#,checkpoint_time, 

resetlogs_change#, resetlogs_time,bytes, con_id from v$datafile_header h 

where h.con_id=1

union

select p.name, file#,tablespace_name tablespace,h.name datafile,checkpoint_change#,checkpoint_time, 

resetlogs_change#, resetlogs_time,bytes,h.con_id

from v$datafile_header h, v$pdbs p 

where  h.con_id=p.con_id 

order by con_id, file#;


select 'ROOT',status,checkpoint_change#,checkpoint_time, resetlogs_change#, resetlogs_time, count(*), fuzzy,h.con_id from v$datafile_header h

where h.con_id=1

group by status,checkpoint_change#,checkpoint_time, resetlogs_change#,

resetlogs_time, fuzzy,h.con_id

UNION

select p.name,status,checkpoint_change#,checkpoint_time, resetlogs_change#, resetlogs_time, count(*), fuzzy,h.con_id from v$datafile_header h, v$pdbs p

where  h.con_id=p.con_id

group by p.name, status,checkpoint_change#,checkpoint_time, resetlogs_change#,

resetlogs_time, fuzzy,h.con_id

order by con_id;


-- Get statistics of Redo generation per hour of last 3 days in descending order


select Day||':00-59' "Day" ,round(max(value/1024),2) "Max_Redo_generation_KB/s",round(avg(value/1024),2) "Avg_redo_in_hour_KB/s" from(

with agg as (

select to_char(BEGIN_TIME,'DD/MON/YYYY HH24') Day,value, dense_rank() over (partition by to_char(BEGIN_TIME,'DD/MON/YYYY HH24') order by value desc) as rank

from dba_hist_sysmetric_history where metric_name = 'Redo Generated Per Sec' and BEGIN_TIME > sysdate - 3 )

select Day,value from agg)

group by Day order by 1 desc;


-- Flashback Database and Restore Point Details 


select scn,database_incarnation#,guarantee_flashback_database,storage_size,

time,restore_point_time,preserved,name from v$restore_point;


select * from v$restore_point;

select min(first_change#),min(first_time)  from v$flashback_database_logfile;

select oldest_flashback_scn, oldest_flashback_time, retention_target from v$flashback_database_log;


-- GV$ARCHIVE_DEST full details


select * from gv$archive_dest;


-- Controlfile Details


show parameter control_file_record_keep;

select * from v$controlfile_record_section;


spool off

set markup html off entmap on

set echo on

exit


Run this on Standby Database


REM srdc_DGPhyStby_diag.sql - collect standby database information in dataguard environment

define SRDCNAME='DG_PHYSTBY_DIAG'

set markup html on spool on


set TERMOUT off;


COLUMN SRDCSPOOLNAME NOPRINT NEW_VALUE SRDCSPOOLNAME 

select 'SRDC_'||upper('&&SRDCNAME')||'_'||upper(value)||'_'||

      to_char(sysdate,'YYYYMMDD_HH24MISS') SRDCSPOOLNAME from v$parameter where lower(name)='instance_name';


REM 

spool &&SRDCSPOOLNAME..htm

Set heading off;


select '+----------------------------------------------------+' from dual 

union all 

select '| Script version:  '||'25-Aug-2021' from dual

union all

select '| Diagnostic-Name: '||'&&SRDCNAME' from dual 

union all 

select '| Timestamp: '|| to_char(systimestamp,'YYYY-MM-DD HH24:MI:SS TZH:TZM') from dual 

union all 

select '| Machine: '||host_name from v$instance 

union all 

select '| Version: '||version from v$instance 

union all 

select '| DBName: '||name from v$database 

union all 

select '| Instance: '||instance_name from v$instance 

union all 

select '+----------------------------------------------------+' from dual

/

 

Set heading on;

set linesize 2000

set pagesize 50000

set numformat 999999999999999

set trim on

set trims on

set markup html on

set markup html entmap off

set feedback on

set echo on

ALTER SESSION SET nls_date_format = 'DD-MON-YYYY HH24:MI:SS';

select sysdate from dual;


-- EXADATA

select decode(count(cell_path),0,'Non-Exadata','Exadata') "System" from v$cell;


-- MIRA: determine whether apply is running, and if so whether it is single-instance (SIRA) or multi-instance (MIRA) redo apply

select decode (count(distinct inst_id),0,'NO_MRP',1,'SIRA','MIRA') "APPLYMODE" from gv$process where upper(pname) like 'PR%';

   

-- The following select will give us the generic information about how this standby is set up.

-- The DATABASE_ROLE should be STANDBY as that is what this script is intended to be run on.

-- PLATFORM_ID should match the PLATFORM_ID of the primary or conform to the supported options in

-- Note: 413484.1 Data Guard Support for Heterogeneous Primary and Physical Standbys in Same Data Guard Configuration.

-- FLASHBACK can be YES (recommended) or NO.

-- If PROTECTION_LEVEL is different from PROTECTION_MODE then for some reason the mode listed in PROTECTION_MODE experienced a need to downgrade.

-- Once the error condition has been corrected the PROTECTION_LEVEL should match the PROTECTION_MODE after the next log switch.


SELECT database_role role, name, db_unique_name, checkpoint_change#, current_scn, resetlogs_change#,  

platform_id, open_mode, log_mode, flashback_on, protection_mode, protection_level FROM v$database;

Select dbid, name, cdb container from v$database;


-- FORCE_LOGGING is not mandatory but is recommended.

-- REMOTE_ARCHIVE should be ENABLE.

-- SUPPLEMENTAL_LOG_DATA_PK and SUPPLEMENTAL_LOG_DATA_UI must be enabled if this standby is associated with a primary that has a logical standby.

-- During normal operations it is acceptable for SWITCHOVER_STATUS to be NOT ALLOWED.

-- DATAGUARD_BROKER can be ENABLED (recommended) or DISABLED.

 

column force_logging format a13 tru

column supplemental_log_data_pk format a24 tru

column supplemental_log_data_ui format a24 tru


SELECT force_logging, remote_archive, supplemental_log_data_pk, supplemental_log_data_ui, switchover_status, standby_became_primary_scn, dataguard_broker FROM v$database;


-- Check how many threads are enabled and started for this database. If the number of instances below does not match, then not all instances are up.

-- VERSION should match the version from the primary database.

-- ARCHIVER can be (STOPPED | STARTED | FAILED). FAILED means that the archiver failed to archive a log last time, but will try again within 5 minutes.

-- LOG_SWITCH_WAIT is the ARCHIVE LOG/CLEAR LOG/CHECKPOINT event that log switching is waiting for.

-- Note that if ALTER SYSTEM SWITCH LOGFILE is hung, but there is room in the current online redo log, then the value is NULL.


column host_name format a32 wrap

SELECT thread#, instance_name, host_name, version, status, archiver, log_switch_wait, startup_time FROM gv$instance ORDER BY thread#;


--  Password file Users


col USERNAME format a30

SELECT USERNAME,SYSDBA,SYSOPER,SYSASM FROM V$PWFILE_USERS;

select file_name, format, is_asm, con_id from v$passwordfile_info;


-- Check incarnation of the database, the standby and primary must have the same CURRENT incarnation:


select incarnation#, resetlogs_change#, resetlogs_time, prior_resetlogs_change#, prior_resetlogs_time, 

status, resetlogs_id, prior_incarnation#, flashback_database_allowed from gv$database_incarnation;


-- Check the number and size of online redo logs on each thread.


SELECT thread#, group#, sequence#, bytes, archived ,l.status, blocksize, member FROM v$log l join v$logfile f using (GROUP#) ORDER BY thread#, group#;


-- The following query is run to see if standby redo logs have been created.

-- The standby redo logs should be the same size as the online redo logs.

-- There should be (( # of online logs per thread + 1) * # of threads) standby redo logs.

-- A value of 0 for the thread# means the log has never been allocated.


select thread#, group#, sequence#, bytes, archived, s.status , blocksize, member FROM v$standby_log s join v$logfile using (GROUP#) order by thread#, group#;


-- This query produces a list of defined archive destinations.

-- It shows if they are enabled, what process is servicing that destination, if the destination is local or remote, and if remote what the current mount ID is.

-- For a physical standby we should have at least one remote destination that points to the primary.

 

column destination format a35 wrap

column process format a7

column ID format 99

column mid format 99

 

select thread#,dest_id,dest_name,destination,target,gvi.status instance_status,gvad.status archive_status,process, mountid,valid_now,valid_type,valid_role,db_unique_name,error 

from gv$archive_dest gvad join gv$instance gvi using (inst_id) where gvad.status='VALID' and destination is NOT NULL order by thread#, dest_id;

 

-- If the protection mode of the standby is set to anything higher than maximum performance, then we need to make sure the remote destination that points to the primary is set with the correct options; otherwise we will have issues during switchover.

 

set numwidth 8

column archiver format a8

column ID format 99

column error format a55 wrap


SELECT thread#, dest_id, gvad.archiver, transmit_mode, affirm, async_blocks, net_timeout, delay_mins, reopen_secs reopen, register, binding FROM gv$archive_dest gvad, gv$instance gvi WHERE gvad.inst_id = gvi.inst_id AND destination is NOT NULL ORDER BY thread#, dest_id;

 

-- The query below will determine if any error conditions have been reached by querying the v$dataguard_status view (view only available in 9.2.0 and above).

 

column message format a80

select inst_id, thread#, facility, severity, dest_id, error_code, timestamp,message 

from gv$dataguard_status join gv$instance using (inst_id) 

where severity not in ('Informational') and message not like 'No SRLs created%' order by TIMESTAMP desc;


-- Query gv$managed_standby to see the status of processes involved in the shipping redo on this system.

-- Does not include processes needed to apply redo.

 

SELECT thread#, process, pid, status, client_process, client_pid, sequence#, block#, active_agents, known_agents FROM gv$managed_standby ORDER BY thread#, process;


select inst_id,thread#,sequence#,status from gv$managed_standby where process='MRP0';

SELECT instance, thread#, name, pid, action, client_role, client_pid, sequence#, block#, dest_id

from v$dataguard_process order by thread#, pid;


-- query gv$recovery_progress

select to_char(start_time,'dd-mon-yyyy HH:MI:SS') start_time, type, item, units, sofar, total, to_char(timestamp,'dd-mon-yyyy HH:MI:SS') timestamp from gv$recovery_progress where item like '%Apply Rate';



-- Verify the last sequence# received and the last sequence# applied to standby database.

SELECT al.thrd "Thread", almax "Last Seq Received", lhmax "Last Seq Applied" FROM (select thread# thrd, MAX(sequence#) almax FROM v$archived_log WHERE resetlogs_change#=(SELECT resetlogs_change# FROM v$database) GROUP BY thread#) al, (SELECT thread# thrd, MAX(sequence#) lhmax FROM v$log_history WHERE resetlogs_change#=(SELECT resetlogs_change# FROM v$database) GROUP BY thread#) lh WHERE al.thrd = lh.thrd;


-- Check the transport lag and apply lag from the V$DATAGUARD_STATS view.  This is only relevant when LGWR log transport and real time apply are in use.


SELECT * FROM v$dataguard_stats WHERE name LIKE '%lag%';


-- Check how often and how far the apply lags.


SELECT name, time, unit, count, TO_DATE(last_time_updated, 'MM/DD/YYYY HH24:MI:SS') FROM v$standby_event_histogram ORDER BY unit DESC, time;


-- The V$ARCHIVE_GAP fixed view on a physical standby database only returns the next gap that is currently blocking redo apply from continuing.

-- After resolving the identified gap and starting redo apply, query the V$ARCHIVE_GAP fixed view again on the physical standby database to determine the next gap sequence, if there is one.


SELECT * FROM v$archive_gap;


-- Check RFS PERF STATE

select event, total_waits, total_timeouts, time_waited, average_wait*10 from v$system_event where event like '%RFS%' order by 5 desc;   

  

-- Check datafile perf state

select event, total_waits, total_timeouts, time_waited, average_wait*10 from v$system_event where event like '%db file%' order by 5 desc;   

   

-- order by time_waited and exclude the idle events 

   

column event format a40 truncate

column name format a35 truncate

column opname format a35 truncate

column value format 99999999999999 

select event, total_waits, time_waited, average_wait*10 from v$system_event where time_waited > 100 and event not like 'rdbms ipc %' and event not like '%timer%'and lower(event) not like '%idle%' and lower(event) not like 'sql%net%' and event not like 'ges%' order by time_waited;


-- Non-default init parameters.

-- For a RAC DB Thread# = * means the value is the same for all threads (SID=*)

-- Threads with different values are shown with their individual thread# and values.

column num noprint

SELECT num, '*' "THREAD#", name, value FROM v$PARAMETER WHERE NUM IN (SELECT num FROM v$parameter WHERE (isdefault = 'FALSE' OR ismodified <> 'FALSE') AND name NOT LIKE 'nls%'

MINUS

SELECT num FROM gv$parameter gvp, gv$instance gvi WHERE num IN (SELECT DISTINCT gvpa.num FROM gv$parameter gvpa, gv$parameter gvpb WHERE gvpa.num = gvpb.num AND  gvpa.value <> gvpb.value AND (gvpa.isdefault = 'FALSE' OR gvpa.ismodified <> 'FALSE') AND gvpa.name NOT LIKE 'nls%') AND gvi.inst_id = gvp.inst_id  AND (gvp.isdefault = 'FALSE' OR gvp.ismodified <> 'FALSE') AND gvp.name NOT LIKE 'nls%')

UNION

SELECT num, TO_CHAR(thread#) "THREAD#", name, value FROM gv$parameter gvp, gv$instance gvi WHERE num IN (SELECT DISTINCT gvpa.num 

FROM gv$parameter gvpa, gv$parameter gvpb 

WHERE gvpa.num = gvpb.num AND gvpa.value <> gvpb.value

 AND (gvpa.isdefault = 'FALSE' OR gvpa.ismodified <> 'FALSE')

 AND gvp.name NOT LIKE 'nls%') AND gvi.inst_id = gvp.inst_id

 AND (gvp.isdefault = 'FALSE' OR gvp.ismodified <> 'FALSE') 

 AND gvp.name NOT LIKE 'nls%' ORDER BY 1, 2;


-- STANDBY LAST RECEIVED and LAST APPLIED


SELECT al.thrd "Thread", almax "Last Seq Received", lhmax "Last Seq Applied" 

FROM (select thread# thrd, MAX(sequence#) almax FROM v$archived_log WHERE 

resetlogs_change#=(SELECT resetlogs_change# FROM v$database) GROUP BY thread#) al, 

(SELECT thread# thrd, MAX(sequence#) lhmax FROM v$log_history WHERE 

resetlogs_change#=(SELECT resetlogs_change# 

FROM v$database) GROUP BY thread#) lh WHERE al.thrd = lh.thrd

and 'PHYSICAL STANDBY' =(select database_role from v$database); 


-- DATABASE INFORMATION

-- Check the state of the datafiles

-- Note: datafiles must be ONLINE 

-- Standby datafiles should never be *online* fuzzy.

select file#,tablespace_name tablespace,name datafile,checkpoint_change#,checkpoint_time, resetlogs_change#, resetlogs_time,bytes from v$datafile_header order by file#;

select status,checkpoint_change#,checkpoint_time, resetlogs_change#, resetlogs_time, count(*), fuzzy from v$datafile_header

group by status,checkpoint_change#,checkpoint_time, resetlogs_change#, resetlogs_time, fuzzy order by checkpoint_change# desc;

select min(fhrba_Seq) "Min Sequence", max(fhrba_Seq) "Max Sequence",min(fhscn) "Min SCN",max(fhscn) "Max SCN" from X$KCVFH;


-- CONTAINER DATABASE INFORMATION

--

show pdbs;

select con_id,name,open_mode,restricted from v$containers;


select 'ROOT',file#,tablespace_name tablespace,name datafile,checkpoint_change#,checkpoint_time, 

resetlogs_change#, resetlogs_time,bytes, con_id from v$datafile_header h 

where h.con_id=1

union

select p.name, file#,tablespace_name tablespace,h.name datafile,checkpoint_change#,checkpoint_time, 

resetlogs_change#, resetlogs_time,bytes,h.con_id

from v$datafile_header h, v$pdbs p 

where  h.con_id=p.con_id 

order by con_id, file#;


select 'ROOT',status,checkpoint_change#,checkpoint_time, resetlogs_change#, resetlogs_time, count(*), fuzzy,h.con_id from v$datafile_header h

where h.con_id=1

group by status,checkpoint_change#,checkpoint_time, resetlogs_change#,

resetlogs_time, fuzzy,h.con_id

UNION

select p.name,status,checkpoint_change#,checkpoint_time, resetlogs_change#, resetlogs_time, count(*), fuzzy,h.con_id from v$datafile_header h, v$pdbs p where  h.con_id=p.con_id

group by p.name, status,checkpoint_change#,checkpoint_time, resetlogs_change#,

resetlogs_time, fuzzy,h.con_id

order by con_id;


-- Flashback Database and Restore Point Details 

select scn,database_incarnation#,guarantee_flashback_database,storage_size,

time,restore_point_time,preserved,name from v$restore_point;

select * from v$restore_point;

select min(first_change#),min(first_time)  from v$flashback_database_logfile;

select oldest_flashback_scn, oldest_flashback_time, retention_target from v$flashback_database_log;


-- GV$ARCHIVE_DEST full details

select * from gv$archive_dest;


-- Controlfile Details

show parameter control_file_record_keep;

select * from v$controlfile_record_section;


spool off

set markup html off entmap on

exit



Monday 22 May 2023

How to quickly check that Database is consistent after incomplete recovery (Point in Time Recovery) before OPEN RESETLOGS

 What is the LEAST amount of recovery that has to be done before a database restored from a backup can be opened?


After performing a RESTORE / RECOVER, we need to perform a quick validation to ensure the database is consistent and ready for OPEN RESETLOGS.

This proactive check helps to prevent several issues which may appear during or after OPEN RESETLOGS.


Throughout this note we assume that you are restoring from a valid backup.

There can be more scenarios than discussed here. Please consult Oracle Support when in doubt.


SOLUTION

 

NOTE: In the images and/or the document content below, the user information and environment data used represents fictitious data from the Oracle sample schema(s), Public Documentation delivered with an Oracle database product or other training material. Any similarity to actual environments, actual persons, living or dead, is purely coincidental and not intended in any manner.

For cold/offline backups, no archivelogs/recovery is necessary. You can simply open the database with resetlogs.


However, for HOT/ONLINE backups, ALL archivelogs from backup start to backup end must be applied before the database can be opened - this is the MINIMUM amount of recovery needed.
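
If you restored a user-managed hot backup, a quick sanity check (an addition to this note, not from it) is to query V$BACKUP on the mounted database; any file still marked ACTIVE was in backup mode when the backup ended and needs redo at least through the end of the backup:

SQL> select file#, status, change#, time from v$backup where status = 'ACTIVE';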

To determine which log was current at the time the backup completed, note the COMPLETION time of the database backup - take this from the backup log.

If this is an RMAN backup you can also query the RMAN metadata. Ensure that the environment variable NLS_DATE_FORMAT is set before invoking RMAN so that timestamps as well as dates are returned:


For unix:    

% export NLS_DATE_FORMAT='dd-mon-rr hh24:mi:ss'

% rman target /


For windows: 

> set nls_date_format=dd-mon-rr:hh24:mi:ss               

> rman target /


To find your backup:

RMAN> LIST BACKUP OF DATABASE COMPLETED AFTER '<date>';

 or

RMAN> LIST BACKUP OF DATABASE COMPLETED AFTER 'sysdate-n';

Set <date> to limit the output to the backups that you are interested in, and note the completion time - for a multi-piece backup, note the completion time of the LAST backuppiece created.

Throughout this note, when running SQL queries you should set NLS_DATE_FORMAT at the session level as follows:


SQL> alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS' ;

Check 1: Checkpoint Time and Fuzziness

Objective: Verify that the datafiles are recovered to the intended point in time (PIT) and they are consistent (FUZZY=NO)


Query the current status and PIT (Point In Time up to which the datafiles have been recovered) of datafiles by reading datafile headers directly from the physical datafiles:


SQL> select fuzzy, status, error, recover, checkpoint_change#, checkpoint_time, count(*) from v$datafile_header group by fuzzy, status, error, recover, checkpoint_change#, checkpoint_time ;


FUZ STATUS  ERROR           REC CHECKPOINT_CHANGE# CHECKPOINT_TIME        COUNT(*)

--- ------- --------------- --- ------------------ -------------------- ----------

NO  ONLINE                                 5311260 31-AUG-2011 23:10:14          6

YES ONLINE                                 5311260 31-AUG-2011 23:10:14          1


a) Verify that the checkpoint_time / checkpoint_change# is in line with your intended UNTIL TIME/SCN. If not, recover the database further if you have more archived logs available.


b) If FUZZY=YES for some datafiles, more recovery is required. If the required archived logs are lost, identify such datafiles and determine whether we can take them offline. Warning: we will lose data in those datafiles if they are taken offline!


If the datafiles belong to the SYSTEM or UNDO tablespace, we MUST NOT bring such datafiles offline without proper analysis. Please consult Oracle Support for further actions.


SQL> select file#, substr(name, 1, 50), substr(tablespace_name, 1, 15), undo_opt_current_change# from v$datafile_header where fuzzy='YES' ;


     FILE# SUBSTR(NAME,1,50)                                  SUBSTR(TABLESPA UNDO_OPT_CURRENT_CHANGE#

---------- -------------------------------------------------- --------------- ------------------------

         3 /<path>/undotbs01.dbf                                   UNDOTBS1                  5117431


Occasionally, even if the tablespace name does not indicate that it is an UNDO tablespace, a non-zero value in column UNDO_OPT_CURRENT_CHANGE# indicates that the datafile contains undo segments.


To bring a datafile offline :


SQL> alter database datafile <file#> offline ;


Check 1 can be considered PASSED when :


a) Verified that all the datafiles are at the same checkpoint_time, and this is your intended point in time.

b) Fuzzy=NO for SYSTEM, UNDO and all intended datafiles. For datafiles with Fuzzy=YES, either recover them further or bring them OFFLINE if no further archived logs are available.


Check 2: Datafile Status

Objective: Verify that the files with status=RECOVER are not OFFLINE unintentionally


SQL> select status, enabled, count(*) from v$datafile group by status, enabled ;


STATUS  ENABLED      COUNT(*)

------- ---------- ----------

SYSTEM  DISABLED            1

ONLINE  READ WRITE          4

RECOVER DISABLED            2


If the files are in RECOVER status, verify if they are OFFLINE :

SQL> select file#, substr(name, 1, 50), status, error, recover from v$datafile_header ;


If you want the data for these files to be accessible, then bring them ONLINE :


SQL> alter database datafile <file#> ONLINE ;


Check 2 can be considered Passed when:


None of the intended datafiles are OFFLINE


Check 3: Absolute Fuzzy

Objective: Additional Fuzzy check (Absolute Fuzzy check)


Occasionally, it is possible to see FUZZY=NO and the same checkpoint_change# for all the intended datafiles, but OPEN RESETLOGS still fails.


eg:


SQL> select fuzzy, status, error, recover, checkpoint_change#, checkpoint_time, count(*) from v$datafile_header group by fuzzy, status, error, recover, checkpoint_change#, checkpoint_time ;


FUZ STATUS  ERROR           REC CHECKPOINT_CHANGE#      CHECKPOINT_TIME   COUNT(*)

--- ------- --------------- --- ------------------ -------------------- ----------

NO  ONLINE                                 5311260 31-AUG-2011 23:10:14          7



SQL> ALTER DATABASE OPEN RESETLOGS ;


ORA-01194: file 4 needs more recovery to be consistent

ORA-01110: data file 3: '/<path>/undotbs02.dbf'


Hence, we should perform an additional fuzzy check, known as the Absolute Fuzzy Check:

SQL> select hxfil file#, substr(hxfnm, 1, 50) name, fhscn checkpoint_change#, fhafs Absolute_Fuzzy_SCN, max(fhafs) over () Min_PIT_SCN from x$kcvfh where fhafs!=0 ;


FILE#      NAME                                               CHECKPOINT_CHANG ABSOLUTE_FUZZY_S     MIN_PIT_SCN

---------- -------------------------------------------------- ---------------- ---------------- ----------------

         4 /<path>/undotbs01.dbf                                  5311260          5311524          5311524

         6 /<path>/system01.dbf                                   5311260          5311379          5311524

 


Note: Column Min_PIT_SCN returns the same value for every row because the analytic "MAX() OVER ()" function is applied across all rows.


The above query indicates that recovery must be performed at least UNTIL SCN 5311524 to make the datafiles consistent and ready to OPEN. Since the checkpoint_change# is smaller than Min_PIT_SCN, the datafiles will ask for more recovery.
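
For illustration, using the Min_PIT_SCN of 5311524 from the sample output above (and assuming the needed archived logs are available), the additional recovery could be run as follows; RMAN's UNTIL SCN is exclusive of the SCN specified, hence the +1:

RMAN> RUN
{
 SET UNTIL SCN 5311525;
 RECOVER DATABASE;
}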


Check 3 can be considered PASSED when:


a) No rows selected from above query (i.e. Min_PIT_SCN is 0 (Zero) for all the datafiles)

b) The Min_PIT_SCN returned is less than the Checkpoint_Change#


Check 4: Archive Logs Required


Query the controlfile to find the latest archivelog required for recovery. Let's say the backup completed at 31-AUG-2011 23:20:14:


SQL> -- V$ARCHIVED_LOG

SQL> --

SQL> ALTER SESSION SET NLS_DATE_FORMAT='DD-MON-RR HH24:MI:SS';

SQL> SELECT THREAD#, SEQUENCE#, FIRST_TIME, NEXT_TIME FROM V$ARCHIVED_LOG      WHERE '31-AUG-11 23:20:14' BETWEEN FIRST_TIME AND NEXT_TIME;

If the above query does not return any rows, it may be that the information has aged out of the controlfile - run the following query against v$log_history.


SQL> -- V$LOG_HISTORY  view does not have a column NEXT_TIME

SQL> --

SQL> ALTER SESSION SET NLS_DATE_FORMAT='DD-MON-RR HH24:MI:SS';

SQL> select a.THREAD#, a.SEQUENCE#, a.FIRST_TIME

       from V$LOG_HISTORY a

      where FIRST_TIME =

         ( SELECT MAX(b.FIRST_TIME)

             FROM V$LOG_HISTORY b

            WHERE b.FIRST_TIME < to_date('31-AUG-11 23:20:14', 'DD-MON-RR HH24:MI:SS')

         ) ;

SQL>


The sequence# returned by the above query is the log sequence current at the time the backup ended - let's say sequence 530, thread 1.


For minimum recovery use: (Sequence# as returned +1 )


RMAN> RUN

{

 SET UNTIL SEQUENCE 531 THREAD 1;

 RECOVER DATABASE;

}

 

If this is a RAC implementation then use this SQL instead to query the controlfile:


SQL> SELECT THREAD#, SEQUENCE#, FIRST_CHANGE#, NEXT_CHANGE# FROM V$ARCHIVED_LOG WHERE '31-AUG-11 23:20:14' BETWEEN FIRST_TIME AND NEXT_TIME;


For minimum recovery use the log sequence and thread that has the lowest NEXT_CHANGE# returned by the above query.


Check 4 can be considered PASSED when:


All archivelogs from the start of the backup to the end of the backup are available for use during recovery


Check 5: Post OPEN RESETLOGS Activities

During OPEN RESETLOGS, monitor the alert.log for additional errors/messages. You might see some messages like the below during the dictionary check:


Dictionary check beginning

Tablespace 'TEMP' #3 found in data dictionary, <(============================== (1)

but not in the controlfile. Adding to controlfile.

Tablespace 'USERS' #4 found in data dictionary,

but not in the controlfile. Adding to controlfile.

File #4 found in data dictionary but not in controlfile.

Creating OFFLINE file 'MISSING00004' in the controlfile. <(==================== (2)

File #5 is online, but is part of an offline tablespace. <(==================== (3)

data file 5: '/<path>/example01.dbf'

File #7 found in data dictionary but not in controlfile. <(==================== (2)

Creating OFFLINE file 'MISSING00007' in the controlfile.

File #8 is offline, but is part of an online tablespace. <(==================== (4)

data file 8: '/<path>/mydata02.dbf'

File #9 is online, but is part of an offline tablespace. <(==================== (3)

data file 9: '/<path>/example02.dbf'

Dictionary check complete



We discuss the highlighted points below:


(1) Check if the temp files exist. If not, add them as per your preference: 


SQL> select file#, name from v$tempfile ;

no rows selected

SQL> select file#, name from dba_temp_files ;

no rows selected

SQL> select tablespace_name, status, contents from dba_tablespaces where contents='TEMPORARY' ;


TABLESPACE_NAME                STATUS    CONTENTS

------------------------------ --------- ---------

TEMP                           ONLINE    TEMPORARY



SQL> alter tablespace temp add tempfile '/<path>/temp01.dbf' size 10m ;

Tablespace altered.


SQL> select file#, substr(name, 1, 50), status, enabled from v$tempfile ;

FILE#    SUBSTR(NAME,1,50)                                  STATUS  ENABLED

-------- -------------------------------------------------- ------- ----------

       1 /<path>/temp01.dbf                                 ONLINE  READ WRITE


(2) It appears that the tablespace was brought offline using the "ALTER TABLESPACE USERS OFFLINE" command. So, verify whether the missing files really exist under their original names. You may need to consult your peer DBAs, or refer to the alert.log / RMAN backup log or any other information which may provide a clue about the actual file names.


If you find the files, try to rename them. If not, we can offline the datafiles or drop the associated tablespace:


SQL> select file#, status, enabled, substr(name, 1, 50) from v$datafile where name like '%MISSING%' ;

FILE#    STATUS  ENABLED    SUBSTR(NAME,1,50)

-------- ------- ---------- --------------------------------------------------

       4 OFFLINE DISABLED   /<path>/MISSING000

       7 OFFLINE DISABLED   /<path>/MISSING000



SQL> alter database datafile 4 online ;

alter database datafile 4 online

*

ERROR at line 1:

ORA-01157: cannot identify/lock data file 4 - see DBWR trace file

ORA-01111: name for data file 4 is unknown - rename to correct file

ORA-01110: data file 4: '/<oracle_home path>/dbs/MISSING00004'


SQL> alter database rename file 'MISSING00004' to '/<path>/users01.dbf' ;

Database altered.


SQL> alter database rename file 'MISSING00007' to '/<path>/users02.dbf' ;

Database altered.

SQL> select tablespace_name, status from dba_tablespaces where tablespace_name in (select tablespace_name from dba_data_files where file_id in (4, 7)) ;


TABLESPACE_NAME                STATUS

------------------------------ ---------

USERS                          OFFLINE


SQL> ALTER TABLESPACE USERS ONLINE ;


Tablespace altered.


Before proceeding, let's query the current status of the files flagged in the alert.log:


SQL> select a.file#, substr(a.name, 1, 50) file_name, a.status file_status, a.error, substr(a.tablespace_name, 1, 10) tablespace_name, b.status tablespace_status from v$datafile_header a, dba_tablespaces b

where a.tablespace_name=b.tablespace_name /* and a.file# in (4, 5, 7, 8, 9) */ ;


FILE# FILE_NAME                                     FILE_STATUS ERROR           TABLESPA TABLESPACE_STATUS

----- --------------------------------------------- ----------- --------------- -------- ------------------

    1 /<path>/system01.dbf                           ONLINE                      SYSTEM   ONLINE

    2 /<path>/sysaux01.dbf                           ONLINE                      SYSAUX   ONLINE

    3 /<path>/undotbs01.dbf                          ONLINE                      UNDOTBS1 ONLINE

    4 /<path>/users01.dbf                            OFFLINE     OFFLINE NORMAL  USERS    OFFLINE <(== related to (2) in alert.log excerpt above

    5 /<path>/example01.dbf                          ONLINE                      EXAMPLE  OFFLINE <(== related to (3) in alert.log excerpt above

    6 /<path>/mydata01.dbf                           ONLINE                      MYDATA   ONLINE

    7 /<path>/users02.dbf                            OFFLINE     OFFLINE NORMAL  USERS    OFFLINE <(== related to (2) in alert.log excerpt above

    8 /<path>1/mydata02.dbf                          OFFLINE     WRONG RESETLOGS MYDATA   ONLINE <(=== related to (4) in alert.log excerpt above

    9 /<path>/example02.dbf                          ONLINE                      EXAMPLE  OFFLINE <(== related to (3) in alert.log excerpt above


9 rows selected.

So, we can attempt to correct the "ERROR" displayed in the above query, depending on the availability of files / archived logs and other possible factors.

Let's continue,


(3) It seems that the tablespace was brought offline inconsistently ( ALTER TABLESPACE EXAMPLE OFFLINE IMMEDIATE ). If the archived log generated at that time has been applied, the file can be brought back online :

 

SQL> alter tablespace example ONLINE ;

Tablespace altered.

 

(4) The tablespace MYDATA has 2 datafiles, File# 6 & 8. It appears that File# 8 was brought offline ( using ALTER DATABASE DATAFILE 8 OFFLINE ) and it was OFFLINE before OPEN RESETLOGS. If the archived log generated at that time was applied during recovery, or all the archived logs since that time are available for recovery, the file can be brought back online :


SQL> alter database datafile 8 online ;

alter database datafile 8 online

*

ERROR at line 1:

ORA-01190: control file or data file 8 is from before the last RESETLOGS

ORA-01110: data file 8: '/<path>/mydata02.dbf'


SQL> alter tablespace mydata online ;

alter tablespace mydata online

*

ERROR at line 1:

ORA-01190: control file or data file 8 is from before the last RESETLOGS

ORA-01110: data file 8: '/<path>/mydata02.dbf'


SQL> recover datafile 8 ;

Media recovery complete.

SQL> alter database datafile 8 online ;

Database altered.


SQL> alter tablespace mydata online ;


Tablespace altered.


Please note that it is not always possible to recover and bring online a file which is failing with the error "ORA-01190: control file or data file x is from before the last RESETLOGS".


(5) There can be a scenario where the tablespace was in READ ONLY mode before OPEN RESETLOGS. Please check the article below on that:


Friday 12 May 2023

How to resolve MRP stuck issues on a physical standby database



1. Please use one of the following methods to find the archive log file that MRP is stuck at:

a. Please query v$managed_standby from the standby database if MRP is started.


% ps -ef |grep -i mrp 

SQL>select process, thread#, sequence#, status from v$managed_standby where process='MRP0';

PROCESS  THREAD#   SEQUENCE#  STATUS

-------- --------- ---------- ------------


MRP0     1         548        WAIT_FOR_GAP

Note: Starting from 12.2, use the V$DATAGUARD_PROCESS view instead of v$managed_standby.
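
When the status shows WAIT_FOR_GAP, the missing sequence range can usually be seen on the standby with the same V$ARCHIVE_GAP query used in the standby diagnostic script above:

SQL>select thread#, low_sequence#, high_sequence# from v$archive_gap;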


b. Stop the managed recovery and start the manual recovery.


SQL>recover managed standby database cancel;

SQL>recover automatic standby database;

If you use Data Guard broker, you need to do this from either DGMGRL or Grid Control.


For 11g data guard broker,

DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET STATE='APPLY-OFF';

DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET STATE='APPLY-ON';


If your standby is a RAC database, then

DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET STATE='APPLY-ON' WITH APPLY INSTANCE = <instance-name>;

 

<instance-name> is the name of the instance you want to become the apply instance.

For 10g Data Guard broker,


DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET STATE='LOG-APPLY-OFF';

DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET STATE='ONLINE';


If your standby is a RAC database, then

DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET STATE='ONLINE' WITH APPLY INSTANCE = <instance-name>;

To change the state of the standby database from grid control for 10g and 11g standby databases, follow these steps:


1. From the Data Guard Overview page, select the standby database you want to change.

2. Click Edit to go to the Edit Properties page.

3. Select Log Apply Off (or Online).

4. Click Apply.

5. When the process completes, a message indicating success is returned.


For 9i Data Guard Broker,


DGMGRL> ALTER RESOURCE '<resource name>' ON SITE '<site name>' SET STATE='PHYSICAL-APPLY-READY';

DGMGRL> ALTER RESOURCE '<resource name>' ON SITE '<site name>' SET STATE='PHYSICAL-APPLY-ON';

To change the state of the standby database from grid control for 9i standby database, follow these steps:


1. In the navigator tree, select the standby database resource.

2. In the right-hand property sheet, click Set State.

3. Click Apply Off (or Online).

4. Click OK.


c. Use the lowest checkpoint_change# in the data file header to find the archive log that the recovery needs to recover from:


SQL>select min(fhscn),fhrba_Seq SEQUENCE from x$kcvfh group by fhrba_Seq;


  2. Check whether the archive log file you found in step 1 exists on the standby site in the location defined by the standby init parameter standby_archive_dest, or log_archive_dest_1 if standby_archive_dest is not defined on the standby.
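
To confirm which location applies on your standby, check both parameters; whichever is set is where the file should be:

SQL>show parameter standby_archive_dest

SQL>show parameter log_archive_dest_1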


If so, then please compare whether it has the same size on the standby as it does on the primary, using the cksum command. For example:


% cksum <archive log file full path/file name>

  

Note: if cksum is not available on your unix platform, please see the document below, which provides more commands, such as SHA-1 and MD5 checksum utilities.


Note 549617.1 How To Verify The Integrity Of A Patch/Software Download?

  

You can also verify whether the archive log file is corrupted on either the standby or the primary site by:


SQL>alter system dump logfile '<full path/archive log file name>' validate;

If you get the SQL prompt back without error, then the archive log file is not corrupted.


  3. If the archive log file from step 1 either doesn't exist on the standby or is corrupted, then you will need to get a new copy from the primary site. Please refer to the document below on how to resolve a gap manually.


 Note 1537316.1 Data Guard Gap Detection and Resolution Possibilities


  4. If the archive log file exists on the standby site and is not corrupted, then please check whether it is registered in the standby controlfile by querying v$archived_log. For example,


SQL>select name from v$archived_log where (thread#=1 and sequence#=192917) or (thread#=2 and sequence#=26903);


This example assumes the last applied sequence# for thread 1 is 192916 and for thread 2 is 26902, so we check for the next sequence in each thread; alternatively, use the thread# and sequence# found in step 1 at which MRP is stuck.

If the archive log file names come back from the query above, then they are registered to the standby controlfile. Otherwise, the standby controlfile doesn't aware of them.



2. Please use the queries below to identify the current sequence# on the primary for each thread, the last received sequence# on the standby, and the last applied sequence# on the standby for each thread, so that you will know whether the standby is in sync with the primary and whether there is a transport lag or an apply lag.


Primary: SQL> select thread#, max(sequence#) "Last Primary Seq Generated"

from v$archived_log val, v$database vdb

where val.resetlogs_change# = vdb.resetlogs_change#

group by thread# order by 1;

  

<standby>: SQL> select thread#, max(sequence#) "Last Standby Seq Received"

from v$archived_log val, v$database vdb

where val.resetlogs_change# = vdb.resetlogs_change#

group by thread# order by 1;

  

<standby>: SQL> select thread#, max(sequence#) "Last Standby Seq Applied"

from v$archived_log val, v$database vdb

where val.resetlogs_change# = vdb.resetlogs_change#

and applied='YES'

group by thread# order by 1;
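On 10.2 and later physical standbys, v$dataguard_stats offers a convenient summary of the same information (a quick cross-check, not a replacement for the queries above):


<standby>: SQL> select name, value, time_computed from v$dataguard_stats where name in ('transport lag','apply lag');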

 

=======================================================================


Please see the following causes and solutions for MRP stuck issues on a physical standby database. If the MRP stuck issue still can't be resolved, then you could use the RMAN incremental backup method to roll forward your physical standby database.


Note: You cannot use the RMAN incremental backup method to roll forward a logical standby database.


Note 290817.1 Rolling a Standby Forward using an RMAN Incremental Backup in 9i

Note 836986.1 Steps to perform for Rolling forward a standby database using RMAN incremental backup when primary and standby are in ASM filesystem


SOLUTION

Cause 1 : Log transport issues.

The MRP sticks on the standby because the primary has trouble shipping redo to the standby. You could check the status of the remote archive destination on the primary database. For example,


SQL> SELECT DESTINATION, STATUS, ERROR FROM V$ARCHIVE_DEST WHERE DEST_ID=2;

It could give you various errors and you would need to resolve the log transport errors.


I will only illustrate the very common errors related to password file issues here, such as ORA-1031, ORA-1017, ORA-16191, ORA-12154, etc.


Solution 1: Please use the cksum command to verify the size of the password file on the primary and standby sites, and make sure SEC_CASE_SENSITIVE_LOGON is set to false for 11g or above databases.

- Use the cksum command to verify the size of the password file on the primary and the standby sites:


   % cd $ORACLE_HOME/dbs


   % cksum <password file name>               /* cksum command is only available on the UNIX platform  */


 

Note: if cksum is not available on your UNIX platform, please see the document below, which provides more commands, such as the SHA-1 and MD5 checksum utilities.


Note 549617.1 How To Verify The Integrity Of A Patch/Software Download?



If the sizes are different, you could create a new password file on one primary node and then copy it to other primary nodes and standby nodes.


UNIX

orapwd file=$ORACLE_HOME/dbs/orapw<local ORACLE_SID> password=<sys password> entries=5


Windows

orapwd file=%ORACLE_HOME%\database\PWD<local ORACLE_SID>.ora password=<sys password> entries=5

 

Note: if the instance name on the standby database is different from the primary instance name, then their password file names are different.

 

You could check your local database instance name by


SQL>show parameter instance_name;

  

- Please make sure SEC_CASE_SENSITIVE_LOGON init parameter is set to false on both the primary and the standby databases for 11g or above databases.


If you have to set SEC_CASE_SENSITIVE_LOGON to true for business security reasons, then you must create your password file with the ignorecase=y attribute. For example,


orapwd file=$ORACLE_HOME/dbs/orapw<ORACLE_SID> password=<sys password> ignorecase=y entries=5

  

Please see Note 799353.1 - How to Resolve Error in Remote Archiving for more about log shipping issues.


Adding a new Standby fails with error ORA-12154: TNS:could not resolve the connect identifier specified (Doc ID 1240558.1)
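Once the underlying transport error has been fixed, you could re-enable the remote destination and confirm that redo shipping resumes (a suggested sequence, assuming log_archive_dest_2 is the remote destination):


SQL> alter system set log_archive_dest_state_2=enable sid='*';

SQL> alter system archive log current;

SQL> select dest_id, status, error from v$archive_dest where dest_id=2;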


Cause 2 : A firewall caused a partial archive log transfer.

The MRP trace file shows:


Media Recovery Log /<path>/<datafile_name>.dbf

ORA-00332: archived log is too small - may be incompletely archived

ORA-00334: archived log: '/<path>/<datafile_name>.dbf'

Background Media Recovery terminated with error 332

ORA-00332: archived log is too small - may be incompletely archived

ORA-00334: archived log: '/<path>/<datafile_name>.dbf'

----- Redo read statistics for thread 1 -----

 


Another common error is ORA-3135; we recommend that you check this cause and solution.


ORA-03135: connection lost contact when shipping redo log to standby database


ORA-03135: connection lost contact


ARC2: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (3135)


Solution 2: Please make sure the following firewall features are disabled.

- SQLNet fixup protocol

- Deep Packet Inspection (DPI)

- SQLNet packet inspection

- SQL Fixup

- SQL ALG (Juniper firewall)


The wording and feature names can vary by vendor, but all of the above can have an adverse effect on some (not all) of the SQL packets transported across the firewall.


You could then resolve the gap by following  Note 1537316.1 Data Guard Gap Detection and Resolution Possibilities.


Please see the following articles related to ORA-3135.


Note 404724.1 ORA-03135 when connecting to Database

Note 739522.1 ORA - 03135 connection lost contact while shipping from Primary Server to Standby server

Note 730066.1 Diagnosis of ORA-3135/ORA-3136 Connection Timeouts when the Fault is in the Database

Note 787354.1 Troubleshooting ORA - 3135 Connection Lost Contact


Cause 3 : The ARC process on the primary that is responsible for the heartbeat sticks on the network due to bug 6113783.

You could see different symptoms. For example, you get a FAL archive error because the primary ARCH process sticks and doesn't ship the missing archive log files to the standby. In the worst case, all ARCH processes stick, none performs local archiving, all online redo log files fill up, and the primary database hangs. Please see the following errors related to this cause.


FAL[client]: Failed to request gap sequence

GAP - thread 1 sequence 2262-2266

DBID 2591313681 branch 686760021

FAL[client]: All defined FAL servers have been attempted.


SQL> alter session set nls_date_format='YYYY-MON-DD HH24:MI:SS';

SQL> select message, timestamp 

from v$dataguard_status 

where severity in ('Error','Fatal') 

order by timestamp;


MESSAGE

-------------------------------------------------------------------------------

TIMESTAMP

--------------------------

Error 12154 received logging on to the standby

OCT-12-2010 15:55:45


PING[ARC1]: Heartbeat failed to connect to standby '<standby_name>'. Error is 12154.

OCT-12-2010 15:55:45


Solution 3: Bug 6113783 was fixed in 11.2.0.2. You could work around the issue by killing the ARCH processes on the primary database.

This won't harm your primary database at all, as the ARCH processes are respawned automatically and immediately by Oracle.


    % ps -ef |grep -i arc

    % kill -9 <ospid of arc process> <ospid of arc process> <ospid of arc process>...


NOTE: If you are on the Windows platform, you can use either the orakill command or any OS tool to kill the Windows processes/threads.


Cause 4. The standby recovery asked for an old sequence# that had already been applied to the standby.

    a. This is caused by bug 6029179 that was fixed in 10.2.0.4.1. 

        Bug 6029179 Abstract: MRP WAITING FOR LOGS THAT ARE ALREADY APPLIED (with RAC primary)


    b. This could also be caused by the ARC process responsible for the standby heartbeat sticking on the network, due to Bug 6113783.


Solution 4 : Apply the fix for bug 6029179, or kill the heartbeat ARCH process of each primary instance if the primary is a RAC database.

   a. Apply the fix for Bug 6029179 (included in 10.2.0.4.1).


    b. You could identify the heartbeat ARCH process from the primary alert log file. Look for the most recent startup information in the alert log file. For example,


Starting ORACLE instance (normal)

...

processes = 1800

.....

ARC4 started with pid=27, OS id=8805

ARC2: Archival started

ARC3: Archival started

ARC3: Becoming the 'no FAL' ARCH

ARC3: Becoming the 'no SRL' ARCH

ARC2: Becoming the heartbeat ARCH

...........

Completed: ALTER DATABASE OPEN


So ARC2 is the heartbeat ARCH process for this startup. If you check the previous startup information, it might have been a different ARCH process. You could then find the ospid of the ARC2 process and kill it.


% ps -ef |grep -i arc2

% kill -9 <ospid of arc2> 

NOTE: If you are on the Windows platform, you can use either the orakill command or any OS tool to kill the Windows processes/threads.

Please do this for each primary instance if the primary database is a RAC database.


    c. The worst case would be to use RMAN incremental backup method to roll forward your standby database. 

       If your primary database is small, you could simply take a full database backup from the primary and restore it to the standby server to refresh or recreate your physical standby database.


Cause 5 : Recovery from the wrong location.

    a. The init parameter standby_archive_dest is not defined on the standby database for 10g or lower versions. The default location is $ORACLE_HOME/dbs.


    b. You use RMAN or manual method to restore or transfer archive log files to a different location

than the local archive location log_archive_dest_1 or standby_archive_dest defined on the standby.


Solution 5 : Specify the same location for standby_archive_dest as for log_archive_dest_1 on the physical standby, or use the FROM '<location>' clause in manual recovery.

    a. Always specify standby_archive_dest as the same location as the local archive destination log_archive_dest_1 for a 10g or lower version physical standby database.

The parameter is deprecated in 11g.


    b. When the archive log files are located somewhere other than where the recovery expects them, you could use the FROM '<location>' clause in manual recovery. For example,


SQL>alter database recover managed standby database cancel;

SQL>alter database recover automatic from '/<different_location>/' standby database;

<different_location> is where your archive log files are located.


Cause 6 : All standby redo log files are active on the standby database, so the standby redo log files cannot receive new redo from the primary database.


SQL> select group#, thread#, bytes/1024/1024 as mbytes, used, archived, status
from v$standby_log;

GROUP#  THREAD#  MBYTES       USED  ARC  STATUS
------  -------  ------  ---------  ---  --------
    15        2     100   33292288  NO   ACTIVE
    16        2     100    3323904  NO   ACTIVE
    17        1     100   98733056  NO   ACTIVE
    18        2     100    3813376  NO   ACTIVE
    19        2     100    4763648  NO   ACTIVE
...........
    38        1     100   98072576  NO   ACTIVE
    39        2     100    7388672  NO   ACTIVE
    40        2     100    4899328  NO   ACTIVE


Standby alert log file showed:


ORA-16014: log 15 sequence# 13234 not archived, no available destinations
ORA-00312: online log 15 thread 2: '+<diskgroup1>/<db_name>/onlinelog/group_15.457.682446507'
ORA-00312: online log 15 thread 2: '+<diskgroup2>/<db_name>/onlinelog/group_15.456.682446509'
ORA-00312: online log 15 thread 2: '+<diskgroup3>/<db_name>/onlinelog/group_15.490.682446509'


Solution 6: Please make sure you have enough space in the archive location, that log_archive_dest_1 is defined with the proper valid_for values and db_unique_name, and that standby_archive_dest is specified properly.

    a. Please make sure you have enough space in the local archive destination specified by log_archive_dest_1 so that the standby redo log files can be archived successfully. Please also make sure you have enough space in the location specified by standby_archive_dest so that the primary ARCH process can ship archive log files to it successfully.

When log_archive_dest_1='LOCATION=USE_DB_RECOVERY_FILE_DEST...', you would need to check the location specified by the init parameter db_recovery_file_dest.

For example:


log_archive_dest_1='LOCATION=USE_DB_RECOVERY_FILE_DEST valid_for=(ALL_LOGFILES,ALL_ROLES) db_unique_name=<standby_name>'


db_recovery_file_dest=+<diskgroup1> 

db_recovery_file_dest_size=214748364800
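To check whether the recovery area is running out of space, a query such as the following may help (a suggested check; the space figures are in bytes):


SQL> select name, space_limit, space_used, space_reclaimable, number_of_files from v$recovery_file_dest;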


    b. Please make sure the local archive destination log_archive_dest_1 is defined

with VALID_FOR=(ALL_LOGFILES,ALL_ROLES) and DB_UNIQUE_NAME=<local db DB_UNIQUE_NAME>.

For example:


Oracle® Data Guard Concepts and Administration

3 Creating a Physical Standby Database


LOG_ARCHIVE_DEST_1='LOCATION=/<path>/<standby_name>/ VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=<local database DB_UNIQUE_NAME>'


    c. Please specify standby_archive_dest properly on the standby database. For example,


STANDBY_ARCHIVE_DEST=/<path>/<db_name>/

 


When LOG_ARCHIVE_DEST_1='LOCATION=USE_DB_RECOVERY_FILE_DEST VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=<local database DB_UNIQUE_NAME>', you specify STANDBY_ARCHIVE_DEST='LOCATION=USE_DB_RECOVERY_FILE_DEST'.

The init parameter standby_archive_dest is deprecated in 11g.



You could use the alter system command to modify or define LOG_ARCHIVE_DEST_1 or STANDBY_ARCHIVE_DEST. If your database is a RAC database, please use the attribute sid='*' so that the change is made to all RAC instances of the database. For example,


SQL> alter system set log_archive_dest_1='LOCATION=USE_DB_RECOVERY_FILE_DEST valid_for=(ALL_LOGFILES,ALL_ROLES) db_unique_name=<standby_name>' scope=both sid='*';

 


If you use the Data Guard broker, then you need to shut down the broker first before you issue the commands above from SQL*Plus. For example,


SQL> alter system set dg_broker_start=false sid='*' scope=both;

SQL> alter system set standby_archive_dest='LOCATION=USE_DB_RECOVERY_FILE_DEST' scope=both sid='*';


Then, after the change, you could start the broker by


SQL> alter system set dg_broker_start=true sid='*' scope=both;


When 'scope' is not specified, the default value is 'both'.


If you use broker, you could modify the StandbyArchiveLocation property as below:


DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET PROPERTY 'StandbyArchiveLocation'='LOCATION=USE_DB_RECOVERY_FILE_DEST';



You would need to issue the command on all standby instances if the standby database is a RAC database. 


    d. Please make sure the standby redo log files on the standby database are the same size as the primary online redo log files, and that you have one more standby redo log group per thread on the standby than the number of online redo log groups per thread on the primary. For example, if you have 2 threads on the primary database and three online redo log groups for each thread on the primary, then you would need 4 standby redo log groups for thread 1 and 4 standby redo log groups for thread 2 on the standby database, no matter how many instances you have on the standby database.
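For illustration, standby redo log groups could be added with commands like the following (the group numbers and the 500M size are placeholders; the size must match your online redo log size, and without Oracle Managed Files you would also supply a file specification):


SQL> alter database add standby logfile thread 1 group 24 size 500M;

SQL> alter database add standby logfile thread 2 group 25 size 500M;


You could then verify the per-thread counts by:


SQL> select thread#, count(*) from v$standby_log group by thread#;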


In case of a switchover, when the database roles change, the same principle applies. For example, if your current standby database has only one instance and three online redo log groups for it, then you would need only 4 standby redo log groups for thread 1 on the current primary database, no matter how many instances/threads it has.


Reference:


MAA - Creating a RAC Physical Standby for a RAC Primary (Doc ID 380449.1)


    e. Please clear all standby redo log groups by


SQL> alter database clear standby logfile group <#>;
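If you have many standby redo log groups, you could generate the clear commands in one pass (a convenience sketch):


SQL> select 'alter database clear standby logfile group '||group#||';' cmd from v$standby_log;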



Cause 7 : A partial archive log file was applied on the standby database.

The standby alert log file showed:


Fri Jan 9 20:11:18 2009

MRP0: Background Managed Standby Recovery process started (<standby_name>)

Managed Standby Recovery not using Real Time Apply

parallel recovery started with 7 processes

Media Recovery Log /<path>/<db_name>-656901558185964.arc

Fri Jan 9 20:11:24 2009

Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM

SESSION

Fri Jan 9 20:11:24 2009

Incomplete Recovery applied until change 14025537844

Fri Jan 9 20:11:24 2009

MRP0: Media Recovery Complete (<standby_name>)

Fri Jan 9 20:11:26 2009

MRP0: Background Media Recovery process shutdown (<standby_name>)


You could find which archive log file contains the change# 14025537844 by querying v$archived_log.


SQL> select thread#, sequence#, name, first_change#, next_change#, deleted, status from v$archived_log where 14025537844 between first_change# and next_change#;

 


Solution 7 : Use the RMAN incremental backup method to roll forward your physical standby database. If your primary database is small, you could simply take a full database backup from the primary and restore it to the standby server to refresh or recreate your physical standby database.


To prevent this from happening again, please perform the following before you cancel the managed recovery on the physical standby database or shut down the physical standby database.


a. Defer the remote log transport on the primary database. For example, if log_archive_dest_2 is the remote archive destination,


SQL>alter system set log_archive_dest_state_2=defer; (for single instance primary database)

SQL>alter system set log_archive_dest_state_2=defer sid='*'; (for RAC primary database)


b. Please make sure the standby recovers the last received archive log file on the standby. You can go to the location defined by log_archive_dest_1 or standby_archive_dest to see the most recently received archive log file. Then check the latest applied sequence on the standby with the query below:


SQL>select thread#, max(sequence#) from v$archived_log where applied='YES' group by thread#;
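As a convenience, the two checks can be combined into one query that shows the last received and last applied sequence# per thread side by side (a suggested sketch; run it on the standby):


SQL> select rcv.thread#, rcv.last_received, app.last_applied

from (select thread#, max(sequence#) last_received from v$archived_log group by thread#) rcv,

(select thread#, max(sequence#) last_applied from v$archived_log where applied='YES' group by thread#) app

where rcv.thread# = app.thread#;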


Cause 8 : The archive log files were transferred to the standby manually, not through the Data Guard log transport service, so they are not registered in the standby control file and the MRP (Managed Recovery Process) cannot recognize them.

For example, they were transferred from the primary by ftp or scp, or RMAN backed them up and restored them to the standby.




The standby alert log file showed:


Media Recovery Waiting for thread 1 sequence 13615 (in transit)


Solution 8 : Register those archive log files or use manual recovery.

    a. Register those manually transferred archive log files and then use managed recovery to recover them.


SQL> alter database recover managed standby database cancel;

SQL> alter database register logfile '<full path/archive log file name>';

Or you could use one RMAN command to catalog all the archive logs that were manually transferred and need to be added to the controlfile at the standby site.


RMAN> catalog start with 'PATH_TO_ARCHIVELOGS/';

SQL> alter database recover managed standby database disconnect;


    b. If you have lots of archive log files to register, then you could simply use manual recovery without registering them. Make sure the archive log files are located in the location specified by log_archive_dest_1 or standby_archive_dest. If not, then you would need to use the FROM '<location>' clause in manual recovery.


SQL> alter database recover managed standby database cancel;

SQL> alter database recover automatic standby database;

or

SQL> alter database recover automatic from '/<path>/' standby database;

 

        /<path>/ is where your archive log files are located, which is different from the local archive destination or the standby archive destination.


Cause 9 : The archive log files were deleted from the primary before they were shipped and applied to the standby.

For example, you use RMAN for backups and the archive log files are deleted after the backup, before they are shipped and applied to the standby.


Solution 9 : Restore the archive log files from backup, register them, and use managed recovery to recover them; or use manual recovery without registering them.

Please refer to the Solution 8 for details.


If you cannot find the archive log files anywhere, then you have to use an RMAN incremental backup to roll forward your physical standby database.


If your primary database is small, you could simply take a full database backup from the primary

and restore it to the standby server to refresh or recreate your physical standby database.


To prevent this from happening again, please reconsider your RMAN deletion policy.


If your archive log files are configured to be in a flash recovery area, then you could configure RMAN deletion policy to APPLIED ON STANDBY so that the archive log files are deleted after they are applied to the standby database.
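For illustration, the policy is configured from the RMAN prompt on the primary (see the documents below, such as Doc ID 728053.1, for the prerequisites):


RMAN> configure archivelog deletion policy to applied on standby;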


Please refer to the document below for details:


Configure RMAN to purge archivelogs after applied on standby (Doc ID 728053.1)


http://download.oracle.com/docs/cd/B19306_01/server.102/b14239/rman.htm#SBYDB00750


Oracle® Data Guard Concepts and Administration

10g Release 2 (10.2)

10.3.4 Deletion Policy for Archived Redo Log Files In Flash Recovery Areas


http://download.oracle.com/docs/cd/B28359_01/server.111/b28294/rman.htm#BAJHHAEB


Oracle® Data Guard Concepts and Administration

11g Release 1 (11.1)

11.3.2 RMAN Configurations at the Primary Database


11.3.3 RMAN Configurations at a Standby Database Where Backups are Performed


11.3.4 RMAN Configurations at a Standby Where Backups Are Not Performed


http://download.oracle.com/docs/cd/E11882_01/server.112/e17022/rman.htm#SBYDB4854


Oracle® Data Guard Concepts and Administration

11g Release 2 (11.2)

11.3.2 RMAN Configurations at the Primary Database


11.3.3 RMAN Configurations at a Standby Database Where Backups are Performed


11.3.4 RMAN Configurations at a Standby Where Backups Are Not Performed


Cause 10 : New datafiles were added to the primary, but they were not added to the standby automatically.

This can be caused by standby_file_management=manual on the standby database. Sometimes the new datafiles fail to be added to the standby database automatically even though standby_file_management was set to AUTO.
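You could verify the current setting on the standby by:


SQL> show parameter standby_file_management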


You could see various errors. Please refer to the documents below:


Note 743101.1 ORA-01110 ORA-1122 ORA-01251 When Restoring With a Backup from a Standby DB

Note 304488.1 Using standby_file_management with Raw Devices

Note 759181.1 Dataguard Physical Standby occurs ORA-600 [2130] after added new node to primary RAC database.


Note 549078.1 ORA-600 [25016] Errors Trying To Recover Managed Standby Database


Bug 3596262 Add DATAFILE on primary can cause standby to raise ORA-1274


ORA-1274 Encountered on Physical Standby After Adding Datafile to Primary (Doc ID 388659.1)


ORA-01110: data file 7: '+<diskgroup>/<path>/<datafile_name>.300.740842175'

ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 142995

ORA-00600: internal error code, arguments: [6101], [0], [40], [128], [], [], [], [], [], [], [], []

Slave exiting with ORA-10562 exception

ORA-10562: Error occurred while applying redo to data block (file# 7, block# 237743)

ORA-10564: tablespace <tablespace_name>

ORA-01110: data file 7: '+<diskgroup>/<path>/<datafile_name>.300.740842175


Please work with the Oracle corruption team on the root cause of the datafile corruptions. Please refer to the following documents:


Prevention, Detection and Repair of Database Corruption (Doc ID 76375.1)


Best Practices for Corruption Detection, Prevention, and Automatic Repair - in a Data Guard Configuration (Doc ID 1302539.1)


Solution 10 : Add the new datafiles to the standby database manually.

1) Please take a hot backup of new datafiles from the primary database.


2) Create a new standby controlfile from the primary database by


SQL>alter database create standby controlfile as '/tmp/controlf.ctl';



If the datafiles are on ASM, please follow the note below and you can ignore the rest of the steps:


Note 734862.1 Step By Step Guide On How To Recreate Standby Control File When Datafiles Are On ASM And Using Oracle Managed Files


Or you could fix a wrong datafile name in the standby controlfile with the alter database rename file command. For example,


SQL> ALTER DATABASE RENAME FILE '<path/UNNAMED00003>' to '<path/real datafile name>';



3) If the new datafile location on the primary is different from the standby, please make sure the db_file_name_convert init parameter is set on the standby database.


Note 47325.1 Init.ora Parameter "DB_FILE_NAME_CONVERT" Reference Note


If db_file_name_convert init parameter has already been set, then you could ignore this step.
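For illustration only (the paths are placeholders), the parameter could be set like this; db_file_name_convert is a static parameter, so scope=spfile is required and the standby must be restarted for it to take effect:


SQL> alter system set db_file_name_convert='/<primary path>/','/<standby path>/' scope=spfile sid='*';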



4) Cancel the managed recovery


 


SQL>alter database recover managed standby database cancel; 

5) Set standby_file_management=manual on the standby database and shut down the standby database.


SQL>alter system set standby_file_management=manual sid='*';

SQL>shutdown immediate; 


 

6) Copy the hot backup of the new datafiles and the new standby controlfile to the standby. Please make sure the controlfiles are located in the right location with the right names according to the init parameter control_files. Please make sure the copied datafiles are located in the right location as well, according to the name column of v$datafile.


7) Start the standby database in mount mode and set standby_file_management=auto.


SQL>startup mount;

SQL>alter system set standby_file_management=auto sid='*';

 

8) Start the managed recovery.


SQL>alter database recover managed standby database disconnect;

 


Cause 11 : The MRP can stall after control file contention at the standby.

The MRP is waiting for a log that has already been shipped but does not appear in v$archived_log, and there are ORA-16146 errors reported in the standby alert log.


Solution 11 : This is fixed in 10.2.0.4. The workaround is to restart the standby.

Reference: Note 6004936.8 Bug 6004936 Standby apply stops after ORA-16146





 


Cause 12 : Cancelling managed recovery hangs.

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;


DGMGRL>EDIT DATABASE '<standby db_unique_name>' SET STATE='APPLY-OFF';


The above commands hang.


 


Solution 12 : Shut down the standby with abort and start it up in mount mode, then perform a clean shutdown and check whether any server processes are left over for the standby instance.

 


- Shut down the standby database with abort and mount it again.


 SQL>shutdown abort;

 SQL>startup mount;

  

- Then have a clean shutdown of the standby database.


 SQL>shutdown immediate;

  

- Then check whether any Oracle processes are left over for the standby instance.


 % ps -ef | grep -i <standby instance_name>

- If so, then kill the leftover process(es).


 % kill -9 <ospid>

  

- If not, then start up the standby and run the recovery.


SQL>startup mount;

SQL>recover automatic standby database;

  

After recovering the available archive log files on the standby, you could enter CANCEL at the prompt, and then start managed recovery.


If you use the broker, then issue the commands from DGMGRL. For example,


% dgmgrl            /* from the standby server  */


DGMGRL> startup mount;


DGMGRL> EDIT DATABASE '<standby db_unique_name>' SET STATE='APPLY-ON';