Monday, 22 May 2023

How to quickly check that the database is consistent after incomplete recovery (point-in-time recovery) before OPEN RESETLOGS

 What is the LEAST amount of recovery that has to be done before a database restored from a backup can be opened?


After performing a RESTORE / RECOVER, we need to perform a quick validation to ensure the database is consistent and ready for OPEN RESETLOGS.

This proactive check helps prevent several issues that may appear during or after OPEN RESETLOGS.


Throughout this note we assume that you are restoring from a valid backup.

There can be more scenarios than discussed here. Please consult Oracle Support when in doubt.


SOLUTION

 

NOTE: In the images and/or the document content below, the user information and environment data used represents fictitious data from the Oracle sample schema(s), Public Documentation delivered with an Oracle database product or other training material. Any similarity to actual environments, actual persons, living or dead, is purely coincidental and not intended in any manner.

For cold/offline backups, no archivelogs/recovery is necessary. You can simply open the database with resetlogs.
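A minimal sketch of that case, assuming the restored database is already mounted:


SQL> ALTER DATABASE OPEN RESETLOGS ;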


However, for HOT/ONLINE backups, ALL archived logs from backup start to backup end must be applied before the database can be opened - this is the MINIMUM amount of recovery needed.

To determine which log was current at the time the backup completed, note the COMPLETION time of the database backup - take this from the backup log.

If this is an RMAN backup you can also query the RMAN metadata. Ensure that the environment variable NLS_DATE_FORMAT is set before invoking rman so that timestamps as well as dates are returned:


For unix:    

% export NLS_DATE_FORMAT='dd-mon-rr hh24:mi:ss'

% rman target /


For windows: 

> set nls_date_format=dd-mon-rr:hh24:mi:ss               

> rman target /


To find your backup:

RMAN> LIST BACKUP OF DATABASE COMPLETED AFTER '<date>';

 or

RMAN> LIST BACKUP OF DATABASE COMPLETED AFTER 'sysdate-n';

Set <date> to limit the output to the backups that you are interested in, and note the completion time - for a multi-piece backup, note the completion time of the LAST backup piece created.
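For example, with NLS_DATE_FORMAT set as described above, a hedged illustration (the date here is arbitrary):


RMAN> LIST BACKUP OF DATABASE COMPLETED AFTER '31-aug-11 20:00:00';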

Throughout this note, when running SQL queries you should set NLS_DATE_FORMAT at the session level as follows:


SQL> alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS' ;

Check 1: Checkpoint Time and Fuzziness

Objective: Verify that the datafiles are recovered to the intended point in time (PIT) and they are consistent (FUZZY=NO)


Query the current status and PIT (Point In Time up to which the datafiles have been recovered) of datafiles by reading datafile headers directly from the physical datafiles:


SQL> select fuzzy, status, error, recover, checkpoint_change#, checkpoint_time, count(*) from v$datafile_header group by fuzzy, status, error, recover, checkpoint_change#, checkpoint_time ;


FUZ STATUS  ERROR           REC CHECKPOINT_CHANGE# CHECKPOINT_TIME        COUNT(*)

--- ------- --------------- --- ------------------ -------------------- ----------

NO  ONLINE                                 5311260 31-AUG-2011 23:10:14          6

YES ONLINE                                 5311260 31-AUG-2011 23:10:14          1


a) Verify that the checkpoint_time / checkpoint_change# is in line with your intended UNTIL TIME/SCN. If not, recover the database further if you have more archived logs available.
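If further recovery is needed and more archived logs are available, a minimal RMAN sketch (the UNTIL TIME value is purely illustrative and assumes NLS_DATE_FORMAT is set in the environment as described above):


RMAN> RUN

{

 SET UNTIL TIME '31-aug-11 23:30:00';  # illustrative time at or just after your intended PIT

 RECOVER DATABASE;

}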


b) If FUZZY=YES for some datafiles, more recovery is required. If the required archived logs are lost, identify such datafiles and determine whether they can be taken offline. Warning: data in those datafiles will be lost if they are taken offline!


If the datafiles belong to the SYSTEM or UNDO tablespace, we MUST NOT take such datafiles offline without proper analysis. Please consult Oracle Support for further actions.


SQL> select file#, substr(name, 1, 50), substr(tablespace_name, 1, 15), undo_opt_current_change# from v$datafile_header where fuzzy='YES' ;


     FILE# SUBSTR(NAME,1,50)                                  SUBSTR(TABLESPA UNDO_OPT_CURRENT_CHANGE#

---------- -------------------------------------------------- --------------- ------------------------

         3 /<path>/undotbs01.dbf                                   UNDOTBS1                  5117431


Occasionally the tablespace name does not indicate that it is an UNDO tablespace; in that case, a non-zero value in the UNDO_OPT_CURRENT_CHANGE# column indicates that the datafile contains undo segments.
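As a quick way to apply that check, the same header view can be filtered on a non-zero UNDO_OPT_CURRENT_CHANGE# (a sketch using the columns already shown above):


SQL> select file#, substr(name, 1, 50), substr(tablespace_name, 1, 15) from v$datafile_header where fuzzy='YES' and undo_opt_current_change# != 0 ;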


To bring a datafile offline :


SQL> alter database datafile <file#> offline ;


Check 1 can be considered PASSED when :


a) All the datafiles are at the same checkpoint_time, and this is your intended point in time.

b) Fuzzy=NO for SYSTEM, UNDO and all intended datafiles. For datafiles with Fuzzy=YES, either recover them further or bring them OFFLINE if no further archived logs are available.


Check 2: Datafile Status

Objective: Verify that the files with status=RECOVER are not OFFLINE unintentionally


SQL> select status, enabled, count(*) from v$datafile group by status, enabled ;


STATUS  ENABLED      COUNT(*)

------- ---------- ----------

SYSTEM  DISABLED            1

ONLINE  READ WRITE          4

RECOVER DISABLED            2


If any files are in RECOVER status, verify whether they are OFFLINE:

SQL> select file#, substr(name, 1, 50), status, error, recover from v$datafile_header ;


If you want the data for these files to be accessible, then bring them ONLINE :


SQL> alter database datafile <file#> ONLINE ;


Check 2 can be considered PASSED when:


None of the intended datafiles are OFFLINE.


Check 3: Absolute Fuzzy

Objective: Additional Fuzzy check (Absolute Fuzzy check)


Occasionally it is possible to see FUZZY=NO and the same checkpoint_change# for all the intended datafiles, yet OPEN RESETLOGS still fails.


For example:


SQL> select fuzzy, status, error, recover, checkpoint_change#, checkpoint_time, count(*) from v$datafile_header group by fuzzy, status, error, recover, checkpoint_change#, checkpoint_time ;


FUZ STATUS  ERROR           REC CHECKPOINT_CHANGE#      CHECKPOINT_TIME   COUNT(*)

--- ------- --------------- --- ------------------ -------------------- ----------

NO  ONLINE                                 5311260 31-AUG-2011 23:10:14          7



SQL> ALTER DATABASE OPEN RESETLOGS ;


ORA-01194: file 4 needs more recovery to be consistent

ORA-01110: data file 3: '/<path>/undotbs02.dbf'


Hence, we should perform an additional fuzzy check, known as the Absolute Fuzzy check:

SQL> select hxfil file#, substr(hxfnm, 1, 50) name, fhscn checkpoint_change#, fhafs Absolute_Fuzzy_SCN, max(fhafs) over () Min_PIT_SCN from x$kcvfh where fhafs!=0 ;


FILE#      NAME                                               CHECKPOINT_CHANG ABSOLUTE_FUZZY_S     MIN_PIT_SCN

---------- -------------------------------------------------- ---------------- ---------------- ----------------

         4 /<path>/undotbs01.dbf                                  5311260          5311524          5311524

         6 /<path>/system01.dbf                                   5311260          5311379          5311524

 


Note: The Min_PIT_SCN column returns the same value for all rows because the analytic function MAX() OVER () has been applied to it.


The above query indicates that recovery must be performed at least until SCN 5311524 to make the datafiles consistent and ready to OPEN. Since the checkpoint_change# is smaller than Min_PIT_SCN, the datafiles will ask for more recovery.
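If more archived logs are available, recovery can be continued at least up to the reported Min_PIT_SCN. A minimal RMAN sketch using the sample SCN from the output above (UNTIL SCN recovers up to but not including the specified SCN, so specify Min_PIT_SCN + 1):


RMAN> RUN

{

 SET UNTIL SCN 5311525;  # Min_PIT_SCN 5311524 + 1

 RECOVER DATABASE;

}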


Check 3 can be considered PASSED when:


a) No rows are selected by the above query (i.e. Min_PIT_SCN is 0 (zero) for all the datafiles), or

b) The returned Min_PIT_SCN is less than Checkpoint_Change#


Check 4: Archive Logs Required


Query the controlfile to find the latest archived log required for recovery. Let's say the backup completed at 31-AUG-2011 23:20:14:


SQL> -- V$ARCHIVED_LOG

SQL> --

SQL> ALTER SESSION SET NLS_DATE_FORMAT='DD-MON-RR HH24:MI:SS';

SQL> SELECT THREAD#, SEQUENCE#, FIRST_TIME, NEXT_TIME FROM V$ARCHIVED_LOG      WHERE '31-AUG-11 23:20:14' BETWEEN FIRST_TIME AND NEXT_TIME;

If the above query does not return any rows, it may be that the information has aged out of the controlfile - run the following query against v$log_history.


SQL> -- V$LOG_HISTORY  view does not have a column NEXT_TIME

SQL> --

SQL> ALTER SESSION SET NLS_DATE_FORMAT='DD-MON-RR HH24:MI:SS';

SQL> select a.THREAD#, a.SEQUENCE#, a.FIRST_TIME

       from V$LOG_HISTORY a

      where FIRST_TIME =

         ( SELECT MAX(b.FIRST_TIME)

             FROM V$LOG_HISTORY b

            WHERE b.FIRST_TIME < to_date('31-AUG-11 23:20:14', 'DD-MON-RR HH24:MI:SS')

         ) ;

SQL>


The sequence# returned by the above query is the log sequence that was current at the time the backup ended - let's say sequence 530, thread 1.


For minimum recovery, use the returned sequence# + 1:


RMAN> RUN

{

 SET UNTIL SEQUENCE 531 THREAD 1;

 RECOVER DATABASE;

}

 

If this is a RAC implementation, then use this SQL instead to query the controlfile:


SQL> SELECT THREAD#, SEQUENCE#, FIRST_CHANGE#, NEXT_CHANGE# FROM V$ARCHIVED_LOG WHERE '31-AUG-11 23:20:14' BETWEEN FIRST_TIME AND NEXT_TIME;


For minimum recovery use the log sequence and thread that has the lowest NEXT_CHANGE# returned by the above query.
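A minimal sketch for this case, assuming (hypothetically) that the row with the lowest NEXT_CHANGE# is thread 2, sequence 210; as in the single-instance example above, add 1 to the sequence because recovery stops before the specified UNTIL sequence:


RMAN> RUN

{

 SET UNTIL SEQUENCE 211 THREAD 2;  # thread/sequence with the lowest NEXT_CHANGE#, plus 1

 RECOVER DATABASE;

}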


Check 4 can be considered PASSED when:


All archived logs from the start of the backup to the end of the backup are available for use during recovery.


Check 5: Post OPEN RESETLOGS Activities

During OPEN RESETLOGS, monitor the alert.log for additional errors/messages. You might see messages like the following during the dictionary check:


Dictionary check beginning

Tablespace 'TEMP' #3 found in data dictionary, <(============================== (1)

but not in the controlfile. Adding to controlfile.

Tablespace 'USERS' #4 found in data dictionary,

but not in the controlfile. Adding to controlfile.

File #4 found in data dictionary but not in controlfile.

Creating OFFLINE file 'MISSING00004' in the controlfile. <(==================== (2)

File #5 is online, but is part of an offline tablespace. <(==================== (3)

data file 5: '/<path>/example01.dbf'

File #7 found in data dictionary but not in controlfile. <(==================== (2)

Creating OFFLINE file 'MISSING00007' in the controlfile.

File #8 is offline, but is part of an online tablespace. <(==================== (4)

data file 8: '/<path>/mydata02.dbf'

File #9 is online, but is part of an offline tablespace. <(==================== (3)

data file 9: '/<path>/example02.dbf'

Dictionary check complete



The highlighted points are discussed below:


(1) Check if the temp files exist. If not, add them as per your preference: 


SQL> select file#, name from v$tempfile ;

no rows selected

SQL> select file#, name from dba_temp_files ;

no rows selected

SQL> select tablespace_name, status, contents from dba_tablespaces where contents='TEMPORARY' ;


TABLESPACE_NAME                STATUS    CONTENTS

------------------------------ --------- ---------

TEMP                           ONLINE    TEMPORARY



SQL> alter tablespace temp add tempfile '/<path>/temp01.dbf' size 10m ;

Tablespace altered.


SQL> select file#, substr(name, 1, 50), status, enabled from v$tempfile ;

FILE#    SUBSTR(NAME,1,50)                                  STATUS  ENABLED

-------- -------------------------------------------------- ------- ----------

       1 /<path>/temp01.dbf                                 ONLINE  READ WRITE


(2) It appears that the tablespace was taken offline using the "ALTER TABLESPACE USERS OFFLINE" command. So, verify whether the missing files really exist under their original names. You may need to consult your peer DBAs, or refer to the alert.log / RMAN backup log or any other information which may provide a clue about the actual file names.


If you find the files, rename them. If not, we can take the datafiles offline or drop the associated tablespace:


SQL> select file#, status, enabled, substr(name, 1, 50) from v$datafile where name like '%MISSING%' ;

FILE#    STATUS  ENABLED    SUBSTR(NAME,1,50)

-------- ------- ---------- --------------------------------------------------

       4 OFFLINE DISABLED   /<path>/MISSING000

       7 OFFLINE DISABLED   /<path>/MISSING000



SQL> alter database datafile 4 online ;

alter database datafile 4 online

*

ERROR at line 1:

ORA-01157: cannot identify/lock data file 4 - see DBWR trace file

ORA-01111: name for data file 4 is unknown - rename to correct file

ORA-01110: data file 4: '/<oracle_home path>/dbs/MISSING00004'


SQL> alter database rename file 'MISSING00004' to '/<path>/users01.dbf' ;

Database altered.


SQL> alter database rename file 'MISSING00007' to '/<path>/users02.dbf' ;

Database altered.

SQL> select tablespace_name, status from dba_tablespaces where tablespace_name in (select tablespace_name from dba_data_files where file_id in (4, 7)) ;


TABLESPACE_NAME                STATUS

------------------------------ ---------

USERS                          OFFLINE


SQL> ALTER TABLESPACE USERS ONLINE ;


Tablespace altered.


Before proceeding, let's query the status of the files mentioned in the alert.log excerpt above:


SQL> select a.file#, substr(a.name, 1, 50) file_name, a.status file_status, a.error, substr(a.tablespace_name, 1, 10) tablespace_name, b.status tablespace_status from v$datafile_header a, dba_tablespaces b

where a.tablespace_name=b.tablespace_name /* and a.file# in (4, 5, 7, 8, 9) */ ;


FILE# FILE_NAME                                     FILE_STATUS ERROR           TABLESPA TABLESPACE_STATUS

----- --------------------------------------------- ----------- --------------- -------- ------------------

    1 /<path>/system01.dbf                           ONLINE                      SYSTEM   ONLINE

    2 /<path>/sysaux01.dbf                           ONLINE                      SYSAUX   ONLINE

    3 /<path>/undotbs01.dbf                          ONLINE                      UNDOTBS1 ONLINE

    4 /<path>/users01.dbf                            OFFLINE     OFFLINE NORMAL  USERS    OFFLINE <(== related to (2) in alert.log excerpt above

    5 /<path>/example01.dbf                          ONLINE                      EXAMPLE  OFFLINE <(== related to (3) in alert.log excerpt above

    6 /<path>/mydata01.dbf                           ONLINE                      MYDATA   ONLINE

    7 /<path>/users02.dbf                            OFFLINE     OFFLINE NORMAL  USERS    OFFLINE <(== related to (2) in alert.log excerpt above

    8 /<path>/mydata02.dbf                           OFFLINE     WRONG RESETLOGS MYDATA   ONLINE <(=== related to (4) in alert.log excerpt above

    9 /<path>/example02.dbf                          ONLINE                      EXAMPLE  OFFLINE <(== related to (3) in alert.log excerpt above


9 rows selected.

So, we can attempt to correct the "ERROR" displayed in the above query, depending on the availability of the files / archived logs and other possible factors.

Let's continue,


(3) It seems that the tablespace was taken offline inconsistently (ALTER TABLESPACE EXAMPLE OFFLINE IMMEDIATE). If the archived log generated at that time has been applied, the file can be brought back online:

 

SQL> alter tablespace example ONLINE ;

Tablespace altered.

 

(4) The tablespace MYDATA has 2 datafiles, file# 6 and 8. It appears that file# 8 was taken offline (using ALTER DATABASE DATAFILE 8 OFFLINE) and was OFFLINE before OPEN RESETLOGS. If the archived log generated at that time was applied during recovery, or all the archived logs since that time are available for recovery, the file can be brought back online:


SQL> alter database datafile 8 online ;

alter database datafile 8 online

*

ERROR at line 1:

ORA-01190: control file or data file 8 is from before the last RESETLOGS

ORA-01110: data file 8: '/<path>/mydata02.dbf'


SQL> alter tablespace mydata online ;

alter tablespace mydata online

*

ERROR at line 1:

ORA-01190: control file or data file 8 is from before the last RESETLOGS

ORA-01110: data file 8: '/<path>/mydata02.dbf'


SQL> recover datafile 8 ;

Media recovery complete.

SQL> alter database datafile 8 online ;

Database altered.


SQL> alter tablespace mydata online ;


Tablespace altered.


Please note that it is not always possible to recover and bring online a file that fails with the error "ORA-01190: control file or data file x is from before the last RESETLOGS".


(5) There can also be a scenario where a tablespace was in READ ONLY mode before OPEN RESETLOGS; that case is covered in a separate article.

