Saturday 13 March 2021

Automatic Block Media Recovery in a DataGuard

 With Oracle 12.2, in a Data Guard environment corrupted data blocks can be automatically replaced with uncorrupted copies of those blocks.

There are just some requirements:
• The physical standby database must be operating in real-time query mode, which requires an Oracle Active Data Guard license.
• The physical standby database must be running real-time apply.
Automatic block media recovery works in two directions depending on whether the corrupted blocks are encountered on the primary or on the standby.
This recovery can happen if corrupted data are encountered in the primary database or in the secondary database.
In this blog we are going to see how this feature works.
Below the configuration we are using

DGMGRL> show configuration;
 
Configuration - ORCL_DR
 
  Protection Mode: MaxAvailability
  Members:
  ORCL_SITE  - Primary database
    ORCL_SITE1 - Physical standby database
    ORCL_SITE2 - Physical standby database
 
Fast-Start Failover: DISABLED
 
Configuration Status:
SUCCESS   (status updated 19 seconds ago)
 
DGMGRL>

We can verifiy the status of our databases

DGMGRL> show database  'ORCL_SITE' ;
 
Database - ORCL_SITE
 
  Role:               PRIMARY
  Intended State:     TRANSPORT-ON
  Instance(s):
    ORCL
 
Database Status:
SUCCESS


DGMGRL> show database  'ORCL_SITE1' ;
 
Database - ORCL_SITE1
 
  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 5.00 KByte/s
  Real Time Query:    ON
  Instance(s):
    ORCL
 
Database Status:
SUCCESS


DGMGRL> show database  'ORCL_SITE2' ;
 
Database - ORCL_SITE2
 
  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 4.00 KByte/s
  Real Time Query:    ON
  Instance(s):
    ORCL
 
Database Status:
SUCCESS
 
DGMGRL>

The feature works for any protection mode, but in our case the tests are done with a MaxAvailability mode

If corrupt data blocks are on the primary database, then the primary automatically searches for good copies of those blocks on a standby and, if they are found, has them shipped back to the primary.
This only requirement is that the primary requires a LOG_ARCHIVE_DEST_n to the standby. That already should be the case in a Data Guard environment.

If corrupted block is located on the standby, the standby will automatically request uncorrupted copies of those blocks to the primary. The condition for this mechanism to work is
• The LOG_ARCHIVE_CONFIG parameter is configured with a DG_CONFIG list and a LOG_ARCHIVE_DEST_n parameter is configured for the primary database.
or
• The FAL_SERVER parameter is configured and its value contains an Oracle Net service name for the primary database.

In this demonstration we will simulate data corruption in the primary database. But the scenario is the same for a corrupted blocks at standby side and will work same.
From the primary we can verify that LOG_ARCHIVE_DEST_n is set.

SQL> select dest_name,DESTINATION,status from v$archive_dest where destination is not null;
 
DEST_NAME            DESTINATION                    STATUS
-------------------- ------------------------------ ---------
LOG_ARCHIVE_DEST_1   USE_DB_RECOVERY_FILE_DEST      VALID
LOG_ARCHIVE_DEST_2   ORCL_SITE1                     VALID
LOG_ARCHIVE_DEST_3   ORCL_SITE2                     VALID

For the demonstration let’s consider a table of user SCOTT in a tablespace mytab

SQL> select table_name,tablespace_name from dba_tables where owner='SCOTT' and table_name='EMPBIS';
 
TABLE_NAME      TABLESPACE_NAME
--------------- ------------------------------
EMPBIS          MYTAB
 
SQL>

Now let’s identify blocks for the table on the primary and let’s corrupt them.

SQL> select * from (select distinct dbms_rowid.rowid_block_number(rowid)  from scott.empbis);
 
DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)
------------------------------------
                                 131
 
SQL>

And then let’s run following command to corrupt corresponding blocks.

[oracle@primaserver ORCL]$ dd of=/u01/app/oracle/oradata/ORCL/mytab01.dbf bs=8192 seek=131 conv=notrunc count=1 if=/dev/zero
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000315116 s, 26.0 MB/s
[oracle@primaserver ORCL]$

In a normal environment, accessing to corrupted data by a SELECT will return errors.
But in our case on the primary if we flush the buffer_cache, and and we do a select on the table, rows are returned without errors.

13:41:18 SQL> alter system flush buffer_cache;
 
System altered.
 
13:41:22 SQL> select * from scott.empbis;
 
 EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM
---------- ---------- --------- ---------- --------- ---------- ----------
    DEPTNO
----------
      7369 SMITH      CLERK           7902 17-DEC-80        800
        20
 
      7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300
        30
 
      7521 WARD       SALESMAN        7698 22-FEB-81       1250        500
        30
 
 
     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM
---------- ---------- --------- ---------- --------- ---------- ----------
    DEPTNO
----------
      7566 JONES      MANAGER         7839 02-APR-81       2975
        20
 
      7654 MARTIN     SALESMAN        7698 28-SEP-81       1250       1400
        30
 
      7698 BLAKE      MANAGER         7839 01-MAY-81       2850


Indeed Oracle automatically recovered corrupted blocks. And in the primary alert log at the same time we can see that a recovery was done.

ALTER SYSTEM: Flushing buffer cache inst=0 container=0 global
2018-01-26T13:41:26.540640+01:00
Hex dump of (file 2, block 131) in trace file /u01/app/oracle/diag/rdbms/orcl_site/ORCL/trace/ORCL_ora_3359.trc
 
Corrupt block relative dba: 0x00800083 (file 2, block 131)
Completely zero block found during multiblock buffer read
 
Reading datafile '/u01/app/oracle/oradata/ORCL/mytab01.dbf' for corruption at rdba: 0x00800083 (file 2, block 131)
Reread (file 2, block 131) found same corrupt data (no logical check)
Automatic block media recovery requested for (file# 2, block# 131)
2018-01-26T13:41:26.545798+01:00
Corrupt Block Found
         TIME STAMP (GMT) = 01/26/2018 13:41:25
         CONT = 0, TSN = 7, TSNAME = MYTAB
         RFN = 2, BLK = 131, RDBA = 8388739
         OBJN = 74352, OBJD = 74352, OBJECT = EMPBIS, SUBOBJECT =
         SEGMENT OWNER = SCOTT, SEGMENT TYPE = Table Segment
2018-01-26T13:41:27.002934+01:00
Automatic block media recovery successful for (file# 2, block# 131)
2018-01-26T13:41:27.005015+01:00
Automatic block media recovery successful for (file# 2, block# 131)





No comments:

Post a Comment