Wednesday, 14 February 2018

How to roll back an Oracle 12c Clusterware/GRID patch (PSU) step by step, with issue description

Rollback summary

1) Stop the OEM agent and OSWatcher on both nodes.
2) Stop all database instances on node1 and validate the Oracle inventory.
3) On the 1st node, analyze the rollback first. As root user, execute the following command:
# <GI_HOME>/OPatch/opatchauto rollback <UNZIPPED_PATCH_LOCATION>/26635815 -analyze
4) On the 1st node, roll back the patch from the GI Home using the opatchauto command.
  As root user, execute the following command:
# <GI_HOME>/OPatch/opatchauto rollback <UNZIPPED_PATCH_LOCATION>/26635815 -oh <GI_HOME>
5) Start all database instances on node1 and stop all instances on node2.
6) On the 2nd node, analyze the rollback. As root user, execute the following command:
# <GI_HOME>/OPatch/opatchauto rollback <UNZIPPED_PATCH_LOCATION>/26635815 -analyze
7) On the 2nd node, roll back the patch from the GI Home using the opatchauto command.
    As root user, execute the following command:
# <GI_HOME>/OPatch/opatchauto rollback <UNZIPPED_PATCH_LOCATION>/26635815 -oh <GI_HOME>
8) Verify the clusterware components, confirm that the patch was rolled back properly, and check the patch log.
    Check the following log files in $ORACLE_HOME/sqlpatch/26713565/ for errors:
    26713565_rollback_<database SID>_<CDB name>_<timestamp>.log
9) Start all database instances on node2.
10) Check for any corruption in the inventory.
11) Start the OEM agent and OSWatcher on both nodes.
12) Inform the application team to start the application.
13) Release all related jobs that were on hold.
14) Start DB monitoring on the database.
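
The analyze-then-rollback pair that is run on each node can be sketched as a small dry-run helper. This is only a sketch, not the script used in this incident: the function name rollback_cmds and its two arguments are hypothetical, and it only prints the commands so they can be reviewed before being run as root. The opatchauto options are limited to the ones already shown in the summary above (rollback, -analyze, -oh).

```shell
#!/bin/sh
# Print (do not execute) the two opatchauto commands for one node.
# $1 = GI home, $2 = unzipped patch location (both placeholders here).
rollback_cmds() {
    gi_home=$1
    patch_loc=$2
    # First command: analyze only, as in steps 3 and 6 of the summary.
    printf '%s/OPatch/opatchauto rollback %s/26635815 -analyze\n' "$gi_home" "$patch_loc"
    # Second command: the actual rollback against the GI home, as in steps 4 and 7.
    printf '%s/OPatch/opatchauto rollback %s/26635815 -oh %s\n' "$gi_home" "$patch_loc" "$gi_home"
}

# Example with placeholder values; substitute the real paths before use.
rollback_cmds '<GI_HOME>' '<UNZIPPED_PATCH_LOCATION>'
```

Printing the commands instead of running them keeps the helper safe to try on any machine; the output can be pasted into a root session once it has been checked.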

Issue Description

While applying the patch on the 12c clusterware, we encountered the error below because of wrong permissions on the file orarootagent.bin. One of the sub-patches had already been applied when the session terminated.

Error Details

Patch inventory verified successfully on home /u02/oracle/
Bringing down CRS service on home /u02/oracle/
Prepatch operation log file location: /u02/oracle/
CRS service brought down successfully on home /u02/oracle/
Start applying binary patch on home /u02/oracle/
Successfully executed command: /usr/sbin/slibclean
Failed while applying binary patches on home /u02/oracle/

Execution of [OPatchAutoBinaryAction] patch action failed, check log for more details. Failures:
Patch Target : node1->/u02/oracle/ Type[crs]
Details: [
---------------------------Patching Failed---------------------------------
Command execution failed during patching in home: /u02/oracle/, host: node1.
Command failed:  /u02/oracle/  apply /staging/12ccluster_patch/26635815 -oh /u02/oracle/ -target_type cluster -binary -invPtrLoc /u02/oracle/ -persistresult /u02/oracle/ -analyzedresult /u02/oracle/
Command failure output:
==Following patches FAILED in apply:

Patch: /staging/12ccluster_patch/26635815/26392192
Log: /u02/oracle/
Reason: Failed during Patching: oracle.opatch.opatchsdk.OPatchException:
Prerequisite check "CheckApplicable" failed.

After fixing the cause of failure Run opatchauto resume with session id "QUSX"

OPATCHAUTO-68061: The orchestration engine failed.
OPATCHAUTO-68061: The orchestration engine failed with return code 1
OPATCHAUTO-68061: Check the log for more details.
OPatchAuto failed.

OPatchauto session completed at Sun Feb 11 12:25:07 2018
Time taken to complete the session 18 minutes, 12 seconds

 opatchauto failed with error code 42

Ideally, after fixing the issue, we should run the command below. It should be run immediately after the fix, since the saved session already knows the configuration.

opatchauto resume

As per the detailed log:

[Feb 11, 2018 12:25:06 PM] [INFO]   Space Needed : 17407.515MB
[Feb 11, 2018 12:25:06 PM] [INFO]   Prereq checkPatchApplicableOnCurrentPlatform Passed for patch : 26392192
[Feb 11, 2018 12:25:06 PM] [INFO]   Patch 26392192:
                                    Copy Action: Destination File "/u02/oracle/" is not writeable.
                                    ',': Cannot copy file from 'orarootagent.bin' to '/u02/oracle/'
[Feb 11, 2018 12:25:06 PM] [INFO]   Prerequisite check "CheckApplicable" failed.
                                    The details are:

                                    Patch 26392192:
                                    Copy Action: Destination File "/u02/oracle/" is not writeable.
                                    ',': Cannot copy file from 'orarootagent.bin' to '/u02/oracle/'
[Feb 11, 2018 12:25:06 PM] [SEVERE] OUI-67073:UtilSession failed:
                                    Prerequisite check "CheckApplicable" failed.
[Feb 11, 2018 12:25:06 PM] [INFO]   Finishing UtilSession at Sun Feb 11 12:25:06 GMT 2018
[Feb 11, 2018 12:25:06 PM] [INFO]   Log file location: /u02/oracle/
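
To pull the relevant failures out of a long OPatch log like the one above, a small filter can help. The sketch below is an assumption about how one might do it, not something OPatch provides: the function name log_failures is hypothetical, and it simply keeps the [SEVERE] lines and the "is not writeable" copy failures, dropping repeats.

```shell
#!/bin/sh
# Print [SEVERE] lines and "is not writeable" copy failures from an OPatch
# log read on stdin, with leading blanks stripped and duplicates removed
# (first occurrence kept, original order preserved).
log_failures() {
    grep -E '\[SEVERE\]|is not writeable' |
        sed 's/^[[:space:]]*//' |
        awk '!seen[$0]++'
}

# Demo on a few lines copied from the log above.
log_failures <<'EOF'
[Feb 11, 2018 12:25:06 PM] [INFO]   Patch 26392192:
                                    Copy Action: Destination File "/u02/oracle/" is not writeable.
[Feb 11, 2018 12:25:06 PM] [INFO]   Prerequisite check "CheckApplicable" failed.
                                    Copy Action: Destination File "/u02/oracle/" is not writeable.
[Feb 11, 2018 12:25:06 PM] [SEVERE] OUI-67073:UtilSession failed:
EOF
```

Against a real log, the function would be fed the log file on stdin, for example log_failures < opatch.log, where opatch.log stands for whatever log file the session reported.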

Upon investigation, we found that orarootagent.bin was owned by oracrs; ideally it should be owned by root. We did not notice any issue while running the pre-check.
However, we are not sure whether the patch caused the permission issue.

[node1:root:/u02/oracle/] ls -ltr orarootagent.bin
-rwxr----x    1 oracrs   oinstall  482604228 Oct 06 2016  orarootagent.bin
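
A check like this could be made part of the pre-patch routine. The sketch below is only an illustration under stated assumptions: the function name owner_is is hypothetical, and it reads a single ls -l line on stdin and compares the owner column with the expected owner.

```shell
#!/bin/sh
# Read one `ls -l` line on stdin and compare its owner column (field 3)
# with the expected owner given as $1. Returns 0 only if they match.
owner_is() {
    expected=$1
    owner=$(awk 'NR == 1 { print $3 }')
    if [ "$owner" = "$expected" ]; then
        echo "OK: owner is $owner"
    else
        echo "WARNING: owner is $owner, expected $expected"
        return 1
    fi
}

# The listing from this incident: the file is owned by oracrs, not root.
echo '-rwxr----x    1 oracrs   oinstall  482604228 Oct 06 2016  orarootagent.bin' | owner_is root || true
```

On a live system the same function can be fed directly from the directory that holds the file, for example ls -l orarootagent.bin | owner_is root.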

We therefore tried to roll back the patch, but could not succeed because of a mismatch between the clusterware release patch level and the software patch level. CRS also would not start.

CRS-6706: Oracle Clusterware Release patch level ('1489215101') does not match Software patch level ('1505651481'). Oracle Clusterware cannot be started.
CRS-4000: Command Start failed, or completed with errors.
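
The two numbers in the CRS-6706 message are the values that have to agree before the clusterware will start. As a rough sketch (the function name crs6706_levels is hypothetical, and the parsing assumes the exact message wording shown above), they can be extracted and compared like this:

```shell
#!/bin/sh
# Read CRS-6706 messages on stdin and, for each one, print the release and
# software patch levels followed by "match" or "mismatch".
crs6706_levels() {
    sed -n "s/.*Release patch level ('\([0-9]*\)').*Software patch level ('\([0-9]*\)').*/\1 \2/p" |
    while read -r release software; do
        if [ "$release" = "$software" ]; then
            state=match
        else
            state=mismatch
        fi
        echo "release=$release software=$software $state"
    done
}

# The message from this incident.
msg="CRS-6706: Oracle Clusterware Release patch level ('1489215101') does not match Software patch level ('1505651481'). Oracle Clusterware cannot be started."
printf '%s\n' "$msg" | crs6706_levels
```

For the message above this reports release=1489215101, software=1505651481 and mismatch, which is the inconsistency the workaround below had to correct.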

To resolve the issue

Node1 was completely down. As a workaround, we performed the following steps to start the clusterware, then rolled back the patch and re-applied it, which was successful.

1) A quick check on the problematic node:

[node1:root:/u02/oracle/] ps -ef|grep init.ohasd|grep -v grep
root 11796600 1 0 Dec 28 - 0:00 /bin/sh /etc/init.ohasd run
[node1:root:/u02/oracle/] ps -ef|grep crsd|grep -v grep
[node1:root:/u02/oracle/] ps -ef|grep smon
root 66453538 63111324 0 15:57:00 pts/2 0:00 grep smon

2) Then I checked which patches were installed on both nodes, node1 and node2, with the command:

[node1:root:/u02/oracle/] /u02/oracle/ op=patches
[node2:root:/u02/oracle/] /u02/oracle/ op=patches
I found that 26392164 was not applied on node2 and that the patch level was inconsistent.

3) Then I ran the following command as the root user to complete the patching setup behind the scenes:

[node1:root:/u02/oracle/] ./clscfg -localpatch

4) Then I ran the following command as the root user to lock the GI home:

[node1:root:/u02/oracle/] ./ -lock

5) Then I killed the following process:

ps -ef|grep ora.gpnpd

kill -9 6619256

6) Finally, we ran the command below as the root user to start the GI:

[node1:root:/u02/oracle/] ./crsctl start crs

7) Execute the following command as root user on the problematic node to check if the patch level can be corrected.

/u02/oracle/ -patch

The clusterware on node1 was up at this point. We then rolled back the patch and re-applied it, which was successful.
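
The patch comparison between the two nodes in step 2 above can be sketched as a small diff of two patch lists. This is an illustration only: the function name patch_diff is hypothetical, it assumes the patch IDs reported on each node have already been saved one per line into a file, and it relies on mktemp being available. The two lists in the demo are made up to mirror the finding that 26392164 was missing on node2; they are not the real command output.

```shell
#!/bin/sh
# Print patch IDs that appear in only one of two files (one ID per line).
# $1 and $2 are the two list files; each output line names the file that
# contains the ID.
patch_diff() {
    tmp=$(mktemp -d) || return 1
    sort -u "$1" > "$tmp/a"
    sort -u "$2" > "$tmp/b"
    # comm -23: lines only in the first file; comm -13: only in the second.
    comm -23 "$tmp/a" "$tmp/b" | awk -v f="$1" '{ print "only in " f ": " $0 }'
    comm -13 "$tmp/a" "$tmp/b" | awk -v f="$2" '{ print "only in " f ": " $0 }'
    rm -rf "$tmp"
}

# Demo with illustrative lists (not the real output from the nodes).
demo=$(mktemp -d)
printf '%s\n' 21436941 26392164 26713565 > "$demo/node1.txt"
printf '%s\n' 21436941 26713565 > "$demo/node2.txt"
(cd "$demo" && patch_diff node1.txt node2.txt)
rm -rf "$demo"
```

An empty result means both nodes report the same set of patches, which is the state to aim for before starting the clusterware again.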

We still want to find the root cause of this incident and how to avoid it in future.

Rollback command

1) To verify the rollback, run opatchauto rollback with the -analyze parameter as the root user

./opatchauto rollback /staging/12ccluster_patch/26635815 -analyze -oh /u02/oracle/

2) Run the rollback command as the root user

# <GI_HOME>/OPatch/opatchauto rollback /staging/12ccluster_patch/26635815  -oh /u02/oracle/
[node1:root:/u02/oracle/] export PATH=$PATH:/u02/oracle/
[node1:root:/u02/oracle/] echo $PATH
[node1:root:/u02/oracle/] id
uid=0(root) gid=0(system) groups=208(tivlogs)
[node1:root:/u02/oracle/] which patch
[node1:root:/u02/oracle/] which opatch
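
The PATH checks above can also be done with the shell built-in command -v, which works the same way in a script. The following is a minimal sketch; the function name require_tool is hypothetical.

```shell
#!/bin/sh
# Print the resolved location of a tool if it is on PATH and return 0;
# otherwise print a message to stderr and return 1.
require_tool() {
    if path=$(command -v "$1"); then
        echo "$1 found at $path"
    else
        echo "$1 not found in PATH" >&2
        return 1
    fi
}

# Demo with a tool that exists on any system running this script.
require_tool sh
```

Before a rollback the same call would be made for opatch, for example require_tool opatch || exit 1, so that the script stops early if the PATH export was missed.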

Run opatchauto rollback as the root user

[node1:root:/u02/oracle/] ./opatchauto rollback /staging/12ccluster_patch/26635815 -oh /u02/oracle/

OPatchauto session is initiated at Sun Feb 11 17:05:08 2018

System initialization log file is /u02/oracle/

Session log file is /u02/oracle/
The id for this session is XNNN
Executing OPatch prereq operations to verify patch applicability on home /u02/oracle/
Patch applicablity verified successfully on home /u02/oracle/
Verifying patch inventory on home /u02/oracle/
Patch inventory verified successfully on home /u02/oracle/

Bringing down CRS service on home /u02/oracle/
Prepatch operation log file location: /u02/oracle/
CRS service brought down successfully on home /u02/oracle/
Start rolling back binary patch on home /u02/oracle/
Successfully executed command: /usr/sbin/slibclean
Binary patch rolled back successfully on home /u02/oracle/
Starting CRS service on home /u02/oracle/
Postpatch operation log file location: /u02/oracle/
CRS service started successfully on home /u02/oracle/
OPatchAuto successful.

Patching is completed successfully. Please find the summary as follows:

CRS Home:/u02/oracle/
==Following patches were SKIPPED:
Patch: /staging/12ccluster_patch/26635815/26392192
Reason: This Patch does not exist in the home, it cannot be rolled back.
==Following patches were SUCCESSFULLY rolled back:
Patch: /staging/12ccluster_patch/26635815/21436941
Log: /u02/oracle/

Patch: /staging/12ccluster_patch/26635815/26392164
Log: /u02/oracle/

Patch: /staging/12ccluster_patch/26635815/26713565
Log: /u02/oracle/

OPatchauto session completed at Sun Feb 11 17:19:00 2018
Time taken to complete the session 13 minutes, 54 seconds
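
When the same rollback is run on several nodes, it can help to reduce the session summary to one line per sub-patch. The sketch below is an assumption about how one might do that, not an opatchauto feature: the function name rollback_summary is hypothetical, and it depends on the summary headings being worded exactly as in the output above.

```shell
#!/bin/sh
# Read an opatchauto rollback summary on stdin and print one line per
# sub-patch: "skipped <id>" or "rolled_back <id>".
rollback_summary() {
    awk '
        /^==Following patches were SKIPPED/                  { state = "skipped"; next }
        /^==Following patches were SUCCESSFULLY rolled back/ { state = "rolled_back"; next }
        /^Patch: / && state != "" {
            # The patch ID is the last component of the patch path.
            n = split($2, parts, "/")
            print state, parts[n]
        }
    '
}

# Demo on lines copied from the summary above.
rollback_summary <<'EOF'
==Following patches were SKIPPED:
Patch: /staging/12ccluster_patch/26635815/26392192
Reason: This Patch does not exist in the home, it cannot be rolled back.
==Following patches were SUCCESSFULLY rolled back:
Patch: /staging/12ccluster_patch/26635815/21436941
Patch: /staging/12ccluster_patch/26635815/26392164
Patch: /staging/12ccluster_patch/26635815/26713565
EOF
```

For this session the result is one skipped sub-patch (26392192) and three rolled-back sub-patches, matching the summary printed by opatchauto.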

