Monday, 5 December 2022

Exadata patching on Storage Server,InfiniBand Switch And Compute Nodes 18c/19c



CELL NODES/Storage Server Patching Plan


Exadata Master Note (Doc ID 888828.1) - start with Oracle Document ID 888828.1; it is required reading before any Exadata Database Machine patching activity.


As part of storage server patching, the following components are updated:

A) Oracle Linux Operating System

B) Firmware (Flash,Disk ,RAID Controller ,ILOM)

C) Exadata Software

There are two types of patching:

  a) ROLLING - does not require downtime; only one cell is patched at a time, so at most one cell is affected in case of an issue. It takes up to 2 hours per cell.

  b) NON-ROLLING - all cells are patched in parallel and all databases must remain shut down. Much faster than rolling.

1) Download and stage the patch on a compute node

2) Create a file (cell_group) listing all the cell server hostnames

3) Unpack the patch and check SSH connectivity

4) Run exachk and fix any issues

5) Configure a blackout in Grid Control

6) Perform a backup of all compute nodes
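Steps 2 and 3 above can be sketched as follows. The hostnames are this article's example cells and the paths are assumptions; adjust them for your environment.

```shell
#!/bin/bash
# Sketch: build the cell_group file (one storage server hostname per line)
# and verify root SSH equivalence before any patchmgr run.
CELL_GROUP="$HOME/cell_group"

cat > "$CELL_GROUP" <<'EOF'
easyxdadc002cl01
easyxdadc002cl02
easyxdadc002cl03
EOF

CELL_COUNT=$(wc -l < "$CELL_GROUP")
echo "cells listed: $CELL_COUNT"

# dcli exits non-zero if any cell is unreachable over SSH; only attempt it
# where the tool exists (i.e. on the compute node itself).
if command -v dcli >/dev/null 2>&1; then
  dcli -g "$CELL_GROUP" -l root hostname || echo "SSH equivalence broken -- fix before patching"
fi
```

The same cell_group file is then reused by every patchmgr and dcli command that follows.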

Summary

./patchmgr -cells ~/cell_group -reset_force

./patchmgr -cells ~/cell_group -cleanup


Non-Rolling update

1) dcli -g dbs_group -l root "Grid_home/bin/crsctl stop crs"

2) dcli -g cell_group -l root "cellcli -e alter cell shutdown services all"

3) ./patchmgr -cells ~/cell_group -patch_check_prereq

4) ./patchmgr -cells ~/cell_group -patch


Rolling update

./patchmgr -cells ~/cell_group -patch_check_prereq -rolling

./patchmgr -cells ~/cell_group -patch -rolling


e.g. Real-Time Example

=========================================================================

Important locations

=========================================================================

 

Patch home: /u01/patch_2022/cell/patch_20.1.19.0.0.220216

Cell_group: /u01/patch_2022/GI/cell_group

 

----------------------------------------------------------------------------------------------------------------------------

Cell nodes:      easyxdadc002cl01, easyxdadc002cl02, easyxdadc002cl03, easyxdadc002cl04, easyxdadc002cl05, easyxdadc002cl06, easyxdadc002cl07

 

10.33.131.229   easyxdadc002cl01.bbc.local      easyxdadc002cl01

10.33.131.230   easyxdadc002cl02.bbc.local      easyxdadc002cl02

10.33.131.231   easyxdadc002cl03.bbc.local      easyxdadc002cl03

10.33.131.232   easyxdadc002cl04.bbc.local      easyxdadc002cl04

10.33.131.233   easyxdadc002cl05.bbc.local      easyxdadc002cl05

10.33.131.234   easyxdadc002cl06.bbc.local      easyxdadc002cl06

10.33.131.235   easyxdadc002cl07.bbc.local      easyxdadc002cl07

 

Cell Nodes Current image version => Pre-Patching

=========================================================================

[root@easyxdadc002db01 cell]# dcli -g /u01/patch_2022/GI/cell_group -l root "imageinfo | grep 'image version'"

easyxdadc002cl01: Active image version: 18.1.31.0.0.201013

easyxdadc002cl01: Inactive image version: 12.2.1.1.8.180818

easyxdadc002cl02: Active image version: 18.1.31.0.0.201013

easyxdadc002cl02: Inactive image version: 12.2.1.1.8.180818

easyxdadc002cl03: Active image version: 18.1.31.0.0.201013

easyxdadc002cl03: Inactive image version: 12.2.1.1.8.180818

easyxdadc002cl04: Active image version: 18.1.31.0.0.201013

easyxdadc002cl04: Inactive image version: 12.2.1.1.8.180818

easyxdadc002cl05: Active image version: 18.1.31.0.0.201013

easyxdadc002cl05: Inactive image version: 12.2.1.1.8.180818

easyxdadc002cl06: Active image version: 18.1.31.0.0.201013

easyxdadc002cl06: Inactive image version: 12.2.1.1.8.180818

easyxdadc002cl07: Active image version: 18.1.31.0.0.201013

easyxdadc002cl07: Inactive image version: 12.2.1.1.8.180818

[root@easyxdadc002db01 cell]#

 

CELL NODE PATCHING STEPS

=========================================================================

=> CELL NODE PATCHING PRECHECKS

=========================================================================

=> ALL STEPS HAVE TO BE PERFORMED FROM DBNODE 1

=========================================================================

-----------------

PRECHECK 1

-----------------

Step 1: [root@easyxdadc002db01 ~]# cd /u01/patch_2022/cell/patch_20.1.19.0.0.220216

 

Step 2: [root@easyxdadc002db01 patch_20.1.19.0.0.220216]# dcli -g /u01/patch_2022/GI/cell_group -l root 'df -h /'

easyxdadc002cl01: Filesystem      Size  Used Avail Use% Mounted on

easyxdadc002cl01: /dev/md5        9.8G  4.9G  4.4G  53% /

easyxdadc002cl02: Filesystem      Size  Used Avail Use% Mounted on

easyxdadc002cl02: /dev/md5        9.8G  6.0G  3.3G  65% /

easyxdadc002cl03: Filesystem      Size  Used Avail Use% Mounted on

easyxdadc002cl03: /dev/md5        9.8G  4.8G  4.5G  52% /

easyxdadc002cl04: Filesystem      Size  Used Avail Use% Mounted on

easyxdadc002cl04: /dev/md5        9.8G  4.6G  4.7G  50% /

easyxdadc002cl05: Filesystem      Size  Used Avail Use% Mounted on

easyxdadc002cl05: /dev/md5        9.8G  4.6G  4.7G  50% /

easyxdadc002cl06: Filesystem      Size  Used Avail Use% Mounted on

easyxdadc002cl06: /dev/md5        9.8G  4.6G  4.7G  50% /

easyxdadc002cl07: Filesystem      Size  Used Avail Use% Mounted on

easyxdadc002cl07: /dev/md5        9.8G  4.6G  4.7G  50% /

[root@easyxdadc002db01 patch_20.1.19.0.0.220216]#

Imp: Make sure all cell nodes have more than 3 GB of space available on /; if not, run cleanup and check again.

 

./patchmgr -cells ~/cell_group -reset_force - Oracle suggests this the first time the storage servers are updated.

 

./patchmgr -cells ~/cell_group -cleanup - cleans up patch files and temporary content on the cell servers; before cleaning up it collects diagnostics for any problems found.

 

Step 3:  Disk Check

 

Imp: Execute the commands below; in case of any issues, check with Oracle Support and proceed only once they are fixed.

 

[root@easyxdadc002db01 patch_20.1.19.0.0.220216]# dcli -g /u01/patch_2022/GI/cell_group -l root 'cellcli -e list physicaldisk where status!=normal'

[root@easyxdadc002db01 patch_20.1.19.0.0.220216]# date

Wed Oct 12 14:19:58 BST 2022

[root@easyxdadc002db01 patch_20.1.19.0.0.220216]#

 

[root@easyxdadc002db01 patch_20.1.19.0.0.220216]# dcli -l root -g /u01/patch_2022/GI/cell_group "cellcli -e list physicaldisk where diskType=FlashDisk and status not = normal"

[root@easyxdadc002db01 patch_20.1.19.0.0.220216]# date

Wed Oct 12 14:20:44 BST 2022

[root@easyxdadc002db01 patch_20.1.19.0.0.220216]#

 


[root@easyxdadc002db01 patch_20.1.19.0.0.220216]# dcli -g ~/cell_group -l root 'ipmitool sunoem cli "show -d properties -level all /SYS fault_state==Faulted"'

easyxdadc002cl01: Connected. Use ^D to exit.

easyxdadc002cl01: -> show -d properties -level all /SYS fault_state==Faulted

easyxdadc002cl01: show: Query found no matches.

easyxdadc002cl01:

easyxdadc002cl01:

easyxdadc002cl01: -> Session closed

easyxdadc002cl01: Disconnected

easyxdadc002cl02: Connected. Use ^D to exit.

easyxdadc002cl02: -> show -d properties -level all /SYS fault_state==Faulted

easyxdadc002cl02: show: Query found no matches.

easyxdadc002cl02:

easyxdadc002cl02:

easyxdadc002cl02: -> Session closed

easyxdadc002cl02: Disconnected

easyxdadc002cl03: Connected. Use ^D to exit.

easyxdadc002cl03: -> show -d properties -level all /SYS fault_state==Faulted

easyxdadc002cl03: show: Query found no matches.

easyxdadc002cl03:

easyxdadc002cl03:

easyxdadc002cl03: -> Session closed

easyxdadc002cl03: Disconnected

easyxdadc002cl04: Connected. Use ^D to exit.

easyxdadc002cl04: -> show -d properties -level all /SYS fault_state==Faulted

easyxdadc002cl04: show: Query found no matches.

easyxdadc002cl04:

easyxdadc002cl04:

easyxdadc002cl04: -> Session closed

easyxdadc002cl04: Disconnected

easyxdadc002cl05: Connected. Use ^D to exit.

easyxdadc002cl05: -> show -d properties -level all /SYS fault_state==Faulted

easyxdadc002cl05: show: Query found no matches.

easyxdadc002cl05:

easyxdadc002cl05:

easyxdadc002cl05: -> Session closed

easyxdadc002cl05: Disconnected

easyxdadc002cl06: Connected. Use ^D to exit.

easyxdadc002cl06: -> show -d properties -level all /SYS fault_state==Faulted

easyxdadc002cl06: show: Query found no matches.

easyxdadc002cl06:

easyxdadc002cl06:

easyxdadc002cl06: -> Session closed

easyxdadc002cl06: Disconnected

easyxdadc002cl07: Connected. Use ^D to exit.

easyxdadc002cl07: -> show -d properties -level all /SYS fault_state==Faulted

easyxdadc002cl07: show: Query found no matches.

easyxdadc002cl07:

easyxdadc002cl07:

easyxdadc002cl07: -> Session closed

easyxdadc002cl07: Disconnected

[root@easyxdadc002db01 patch_20.1.19.0.0.220216]#

 

dcli -l root -g ~/cell_group "cellcli -e drop alerthistory all"

 

[root@easyxdadc002db01 patch_20.1.19.0.0.220216]# dcli -l root -g  /u01/patch_2022/GI/cell_group 'cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome' | grep -vi yes

[root@easyxdadc002db01 patch_20.1.19.0.0.220216]# date

Wed Oct 12 14:23:56 BST 2022

[root@easyxdadc002db01 patch_20.1.19.0.0.220216]#

Step 4: These configurations are usually already in place, so run the SELECT queries first and alter values only where the checks below require it.


 

---------------------------------------------------

ASM Check

---------------------------------------------------

From DBNODE1, set the environment to the +ASM1 instance (.switchDB is a site-specific alias) and connect:

.switchDB +ASM1

sqlplus / as sysasm


Verify that there is no rebalance running       

 

select * from gv$asm_operation; 

---------------------------------------------------

Rolling patch checks

---------------------------------------------------

 

Check ASM_POWER_LIMIT and adjust if needed

           

SQL> show parameter ASM_POWER_LIMIT

NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

asm_power_limit                      integer     1

 

Set the ASM_POWER_LIMIT parameter to at least 4.

alter system set ASM_POWER_LIMIT = 4 scope=both;

 

show parameter ASM_POWER_LIMIT 


------------------------------------------------------------------------------------------------------

Check 'disk_repair_time' for all mounted disk groups in the Oracle ASM instance and adjust if needed

 

SQL> column dg format a15

column attribute format a30

column value format a15

select dg.name dg,a.name attribute,a.value from v$asm_diskgroup dg, v$asm_attribute a where dg.group_number=a.group_number and a.name='disk_repair_time' order by dg;


DG              ATTRIBUTE                      VALUE

--------------- ------------------------------ ---------------

DATA            disk_repair_time               8.5H

DBFS_DG         disk_repair_time               8.5H

RECO            disk_repair_time               8.5H

 

SQL>

 

Increase disk_repair_time to 48 hours for the patching window:

 

alter diskgroup DATA set attribute 'disk_repair_time'='48h';
alter diskgroup DBFS_DG set attribute 'disk_repair_time'='48h';
alter diskgroup DEVDATA set attribute 'disk_repair_time'='48h';
alter diskgroup DEVRECO set attribute 'disk_repair_time'='48h';
alter diskgroup RECO set attribute 'disk_repair_time'='48h';
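The per-diskgroup ALTER statements above can also be generated in a loop rather than typed by hand. A sketch; the diskgroup list is this article's example set, and the output file path is an assumption.

```shell
#!/bin/bash
# Sketch: emit one disk_repair_time ALTER per diskgroup into a script file,
# then run that file with "sqlplus / as sysasm" on the first compute node.
DISKGROUPS="DATA DBFS_DG DEVDATA DEVRECO RECO"
SQLFILE="${TMPDIR:-/tmp}/disk_repair_time.sql"

: > "$SQLFILE"   # truncate any previous run
for dg in $DISKGROUPS; do
  echo "alter diskgroup ${dg} set attribute 'disk_repair_time'='48h';" >> "$SQLFILE"
done

cat "$SQLFILE"
# Then: sqlplus / as sysasm @"$SQLFILE"
```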

 

--------------------------------------------------------------------

Check 'compatible.advm' and adjust if needed

--------------------------------------------------------------------

 

column dg format a15

column attribute format a30

column value format a15

select dg.name dg,a.name attribute,a.value from v$asm_diskgroup dg, v$asm_attribute a where dg.group_number=a.group_number and a.name='compatible.advm' order by dg;

 

alter diskgroup <diskgroup_name> set attribute 'compatible.advm'='<clusterware-active-version>';

  

Step 5:

--------------------------------------------------------------------

ILOM Checks

--------------------------------------------------------------------

 

Check ILOM access to all cell ILOM(s)

 

Cell ILOM(s)

 

ssh easyxdrtw001cl01-ilom

ssh easyxdrtw001cl02-ilom

ssh easyxdrtw001cl03-ilom

ssh easyxdrtw001cl04-ilom

ssh easyxdrtw001cl05-ilom

ssh easyxdrtw001cl06-ilom

ssh easyxdrtw001cl07-ilom

  

DB ILOM(s)

 

ssh easyxdrtw001da01-ilom

ssh easyxdrtw001da02-ilom

ssh easyxdrtw001da03-ilom

ssh easyxdrtw001da04-ilom


Step 6:

-------------------------------------------------------------------------------------

Stop agents if running (Rolling)        

-------------------------------------------------------------------------------------

 

su -l  -c "/bin/emctl stop agent"

dcli -l root -g dbs_group 'su -l  -c "/bin/emctl status agent"' | grep 'Agent is'

 

-------------------------------------------------------------------------------------

Cell Node Uptime check

-------------------------------------------------------------------------------------

 

Check uptime and reboot in ROLLING FASHION if needed

           

dcli -l root -g /u01/patch_2022/GI/cell_group uptime
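A long-running kernel can hide problems that only surface at reboot, which is why the uptime check matters. The sketch below flags cells above a threshold; the 180-day limit is an illustrative assumption, and the `up` variable holds canned sample output (in practice, feed it the dcli uptime command above).

```shell
#!/bin/bash
# Sketch: flag cells whose uptime exceeds a threshold (180 days here is an
# assumption, not an Oracle-mandated value). Canned sample dcli output:
up='easyxdadc002cl01:  14:22:01 up 412 days,  3:10,  0 users
easyxdadc002cl02:  14:22:01 up 12 days,  1:02,  0 users'

# Find the number immediately before the word "days" and compare it.
FLAGGED=$(echo "$up" | awk -v max=180 '
  /days/ { for (i = 1; i <= NF; i++) if ($(i+1) ~ /^days/) d = $i
           if (d + 0 > max) print $1, "up", d, "days -- reboot in rolling fashion first" }')
printf '%s\n' "$FLAGGED"
```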

  

Step 7: Execute the command below to run the prechecks and make sure no errors are reported. In case of any errors, check with Oracle Support and rerun the prechecks once they are fixed.

 

./patchmgr -cells ~/cell_group -patch_check_prereq -rolling -smtp_from "Cellnodes_Precheck" -smtp_to umesh.roy@easyreliable.com

 

=========================================================================

ACTUAL CELL NODE PATCHING STEPS NOW

=========================================================================

 

----------------------------------------------------------------------------------------------------------------------------

Stop services on local cell nodes

----------------------------------------------------------------------------------------------------------------------------

cd /u01/patch_2022/cell/patch_20.1.19.0.0.220216

 

Log in to the cell node:

 

[root@easyxdrtw001cl01 ~]#  cellcli -e list cell attributes rsStatus, msStatus, cellsrvStatus detail

         rsStatus:               running

         msStatus:               running

         cellsrvStatus:          running

 

[root@easyxdrtw001cl01 ~]#  cellcli -e alter cell shutdown services all

 

----------------------------------------------------------------------------------------------------------------------------

Cleanup space from any previous runs          

----------------------------------------------------------------------------------------------------------------------------

 

./patchmgr -cells cell_group -reset_force

 

./patchmgr -cells cell_group -cleanup

 

Apply patch in rolling fashion

 

Patch Precheck

nohup ./patchmgr -cells /home/oracle/prechk/cell01_file -patch_check_prereq -rolling -smtp_from "Patching_Update_Cell01" -smtp_to support@easyreliable.com &

 

Patch Command

nohup ./patchmgr -cells /home/oracle/prechk/cell01_file -patch -rolling -smtp_from "Patching_Update_Cell01" -smtp_to support@easyreliable.com &

 

Note : Repeat the same steps for all cell nodes

 

----------------------------------------------------------------------------------------------------------------------------

Check patching complete with imageinfo and imagehistory.

----------------------------------------------------------------------------------------------------------------------------

 

dcli -l root -g /home/oracle/prechk/cell01_file imageinfo | egrep 'Active image version|Cell boot usb version'

 

OR

 

dcli -l root -g ~/cell_group imageinfo | egrep 'Active image version|Cell boot usb version'
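After patching, every cell should report the same active image version. A small parse sketch: `OUT` holds canned output here; in practice capture it with `OUT=$(dcli -l root -g ~/cell_group imageinfo | grep 'Active image version')`.

```shell
#!/bin/bash
# Sketch: verify all cells report one and the same active image version.
# OUT below is canned sample output from the dcli command in the lead-in.
OUT='easyxdadc002cl01: Active image version: 20.1.19.0.0.220216
easyxdadc002cl02: Active image version: 20.1.19.0.0.220216'

# The version is the last field on each line; de-duplicate the set.
VERSIONS=$(printf '%s\n' "$OUT" | awk '{print $NF}' | sort -u)
if [ "$(printf '%s\n' "$VERSIONS" | wc -l)" -eq 1 ]; then
  RESULT="all cells on $VERSIONS"
else
  RESULT="version mismatch: $VERSIONS"
fi
echo "$RESULT"
```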

 

----------------------------------------------------------------------------------------------------------------------------

Check cells are online

----------------------------------------------------------------------------------------------------------------------------

 

dcli -g /home/oracle/prechk/cell01_file -l root "cellcli -e list cell"

dcli -g /home/oracle/prechk/cell02_file -l root "cellcli -e list cell"

dcli -g /home/oracle/prechk/cell03_file -l root "cellcli -e list cell"

dcli -g /home/oracle/prechk/cell04_file -l root "cellcli -e list cell"

dcli -g /home/oracle/prechk/cell05_file -l root "cellcli -e list cell"

dcli -g /home/oracle/prechk/cell06_file -l root "cellcli -e list cell"

dcli -g /home/oracle/prechk/cell07_file -l root "cellcli -e list cell"

 

---------------------------------------------------------------------------------------------------------------

Verify Status POST PATCHING of CELL NODES

--------------------------------------------------------------------------------------------------------------

 

dcli -l root -g /home/oracle/prechk/cell01_file service celld status

dcli -l root -g /home/oracle/prechk/cell01_file "cellcli -e list griddisk attributes name,status"

dcli -l root -g /home/oracle/prechk/cell01_file "cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome"

dcli -l root -g /home/oracle/prechk/cell01_file "cellcli -e alter griddisk all active"

 

----------------------------------------------------------------------------------------------------------------------------

./patchmgr -cells /home/oracle/prechk/cell01_file -cleanup

./patchmgr -cells /home/oracle/prechk/cell02_file -cleanup

./patchmgr -cells /home/oracle/prechk/cell03_file -cleanup

./patchmgr -cells /home/oracle/prechk/cell04_file -cleanup

./patchmgr -cells /home/oracle/prechk/cell05_file -cleanup

./patchmgr -cells /home/oracle/prechk/cell06_file -cleanup

./patchmgr -cells /home/oracle/prechk/cell07_file -cleanup

 

----------------------------------------------------------------------------------------------------------------------------

Reboot the cell nodes once to ensure ILOM patches applied. 

----------------------------------------------------------------------------------------------------------------------------

 

dcli -g /home/oracle/prechk/cell01_file -l root "shutdown -r now"

dcli -g /home/oracle/prechk/cell02_file -l root "shutdown -r now"

dcli -g /home/oracle/prechk/cell03_file -l root "shutdown -r now"

dcli -g /home/oracle/prechk/cell04_file -l root "shutdown -r now"

dcli -g /home/oracle/prechk/cell05_file -l root "shutdown -r now"

dcli -g /home/oracle/prechk/cell06_file -l root "shutdown -r now"

dcli -g /home/oracle/prechk/cell07_file -l root "shutdown -r now"

 

Again wait for cells to come back online.  Typically 10 minutes.
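Before moving on, every grid disk on the rebooted cell must return to asmmodestatus=ONLINE. A polling sketch; `all_online` is a hypothetical helper demonstrated on canned cellcli output, and the commented loop shows how it would be fed from a real cell (hostname is this article's example).

```shell
#!/bin/bash
# Sketch: wait until every grid disk reports asmmodestatus=ONLINE.
# all_online (hypothetical helper) succeeds only when no disk in the given
# "name status" listing is in a non-ONLINE state.
all_online() {
  ! printf '%s\n' "$1" | awk 'NF {print $2}' | grep -qv '^ONLINE$'
}

# Canned sample of `cellcli -e list griddisk attributes name,asmmodestatus`:
SAMPLE='DATA_CD_00_easyxdadc002cl01  ONLINE
RECO_CD_00_easyxdadc002cl01  SYNCING'

if all_online "$SAMPLE"; then STATE=ready; else STATE=syncing; fi
echo "$STATE"

# Real polling loop, run from the compute node:
# until all_online "$(ssh root@easyxdadc002cl01 cellcli -e list griddisk attributes name,asmmodestatus)"; do
#   sleep 60
# done
```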

 

[root@easyxdadc002cl01 ~]# imageinfo

 

Kernel version: 4.1.12-124.42.4.el6uek.x86_64 #2 SMP Thu Sep 3 16:03:23 PDT 2020 x86_64

Cell version: OSS_18.1.31.0.0_LINUX.X64_201013

Cell rpm version: cell-18.1.31.0.0_LINUX.X64_201013-1.x86_64

 

Active image version: 18.1.31.0.0.201013

Active image kernel version: 4.1.12-124.42.4.el6uek

Active image activated: 2021-01-30 02:40:24 +0000

Active image status: success

Active system partition on device: /dev/md5

Active software partition on device: /dev/md7

 

Cell boot usb partition: /dev/sdm1

Cell boot usb version: 18.1.31.0.0.201013

 

Inactive image version: 12.2.1.1.8.180818

Inactive image activated: 2019-05-18 09:32:18 +0100

Inactive image status: success

Inactive system partition on device: /dev/md6

Inactive software partition on device: /dev/md8

 

Inactive marker for the rollback: /boot/I_am_hd_boot.inactive

Inactive grub config for the rollback: /boot/grub/grub.conf.inactive

Inactive usb grub config for the rollback: /boot/grub/grub.conf.usb.inactive

Inactive kernel version for the rollback: 4.1.12-94.8.4.el6uek.x86_64

Rollback to the inactive partitions: Possible

[root@easyxdadc002cl01 ~]#

 

[root@easyxdrtw001cl01 ~]# imageinfo

 

Kernel version: 4.14.35-1902.306.2.14.el7uek.x86_64 #2 SMP Fri Jan 28 09:46:24 PST 2022 x86_64

Cell version: OSS_20.1.19.0.0_LINUX.X64_220216

Cell rpm version: cell-20.1.19.0.0_LINUX.X64_220216-1.x86_64

 

Active image version: 20.1.19.0.0.220216

Active image kernel version: 4.14.35-1902.306.2.14.el7uek

Active image activated: 2022-08-12 22:34:41 +0100

Active image status: success

Active node type: STORAGE

Active system partition on device: /dev/md5

Active software partition on device: /dev/md7

 

Cell boot usb partition: /dev/sdm1

Cell boot usb version: 20.1.19.0.0.220216

 

Inactive image version: 18.1.31.0.0.201013

Inactive image activated: 2020-11-28 21:07:26 +0000

Inactive image status: success

Inactive node type: STORAGE

Inactive system partition on device: /dev/md6

Inactive software partition on device: /dev/md8

 

Inactive marker for the rollback: /boot/I_am_hd_boot.inactive

Inactive grub config for the rollback: /boot/grub2/grub.cfg.inactive

Inactive usb grub config for the rollback: /boot/grub2/grub.cfg.usb.inactive

Inactive kernel version for the rollback: 4.1.12-124.42.4.el6uek.x86_64

Rollback to the inactive partitions: Possible

[root@easyxdrtw001cl01 ~]#

 

[root@easyxdrtw001cl01 ~]# imagehistory

Version                              : 12.2.1.1.0.170126.2

Image activation date                : 2017-03-22 23:21:56 +0000

Imaging mode                         : fresh

Imaging status                       : success

 

Version                              : 12.2.1.1.3.171017

Image activation date                : 2018-10-27 08:46:11 +0100

Imaging mode                         : out of partition upgrade

Imaging status                       : success

 

Version                              : 12.2.1.1.8.180818

Image activation date                : 2018-12-07 23:35:24 +0000

Imaging mode                         : out of partition upgrade

Imaging status                       : success

 

Version                              : 18.1.31.0.0.201013

Image activation date                : 2020-11-28 21:07:26 +0000

Imaging mode                         : out of partition upgrade

Imaging status                       : success

 

Version                              : 20.1.19.0.0.220216

Image activation date                : 2022-08-12 22:34:41 +0100

Imaging mode                         : out of partition upgrade

Imaging status                       : success

 

[root@easyxdrtw001cl01 ~]# 


InfiniBand Switch Patching

There are two types of switches: one spine switch and two leaf switches. Switch firmware is upgraded in a rolling manner; if a spine switch is present in the rack, it is upgraded first.

Summary 

1) run pre-check

./patchmgr -ibswitches ibswitches.lst -upgrade -ibswitch_precheck

2) run upgrade

./patchmgr -ibswitches ibswitches.lst -upgrade

3) verify 


Real-Time Example

1] All actions need to be performed as the root user.

2] The InfiniBand switch upgrade is a 100% online activity.

3] Apply the InfiniBand switch patch from compute node 01.

Log in to Exadata compute node 1 (easyexdadc002da01.bbc.local) as the root user and navigate to the Exadata storage software staging area.


Patch Location == /u01/patch_2022/IB/patch_switch_20.1.19.0.0.220311

Check the current version of IBSwitch

dcli -g /u01/patch_2022/GI/ibswitch_group -l root version | grep "version"


Steps to apply the Patch

Login to easyexdrtw001db01.bbc.local as root user

1. Execute the command below to precheck the IB switches. Cross-check that no errors are reported; proceed to Step 2 only if none are found.

#./patchmgr -ibswitches /u01/patch_2022/GI/ibswitch_group -upgrade -ibswitch_precheck -smtp_from "IBSwitch_Precheck" -smtp_to support@easyreliable.com

2. Execute the command below to upgrade the IB switches.

# nohup ./patchmgr -ibswitches /u01/conf_files/ibswitch_group -upgrade -smtp_from "IBSwitch_Patch_Update" -smtp_to support@easyreliable.com &

3. Tail nohup.out and monitor progress.

4. Check the version of each IB switch after patching:

dcli -g /u01/conf_files/ibswitch_group -l root version | grep "version"


Rollback Steps:

- Manually download the InfiniBand switch firmware package to patch directory

- Set export variable "EXADATA_IMAGE_IBSWITCH_ROLLBACK_VERSION" to the appropriate version

- Run patchmgr command to initiate rollback.


DB Nodes/Compute Nodes Patch Steps

As part of compute node patching, the following components are updated:

A) Oracle Linux Operating System

B) Firmware (Flash,Disk ,RAID Controller ,ILOM)

C) Exadata Software

There are two methods for patching.

There is no need to unzip the patch software; just provide the path to the patch zip file.

Old Method

./dbnodeupdate.sh -u -l p20746761_121211_linux-x86-64_new.zip   (it needs to be run on each compute node)

New Method (12.2.1.1.0 onwards)

./patchmgr -dbnodes /home/oracle/dbs_group -upgrade -iso_repo p25463013_12211-_Linux-x86-64.zip -target_version 12.2.1.1.0.170126.2   (it can patch nodes in parallel, which was not possible with the old method)


A Powerful Utility: dbnodeupdate.sh

1) Validates the provided media (zip, http)

2) Validates user input

3) Creates a log file to track script execution and changes

4) Creates a diag file capturing the 'before patching' state

5) Creates and runs the backup utility

6) Checks space requirements of the /boot filesystem

7) Includes a 'check-only' option

8) Relinks all Database and Grid Infrastructure (GI) homes

9) Enables/disables GI stop/start

10) Provides a rollback option


We can patch compute nodes in rolling and non-rolling fashion.

Behavior of NON-ROLLING upgrades


If a node fails at the pre-check stage, the whole process fails.

If a node fails at the patch or reboot stage, patchmgr skips the remaining steps for that node and the upgrade continues on the other nodes. The pre-check stage is done serially; the patch/reboot and completion stages are done in parallel.

Summary for Patching Compute Node

1) Run the pre-check

2) Fix all errors reported

3) Run the upgrade using the patchmgr utility

4) Check the image version

5) Install RPMs required by third-party products



REAL TIME EXAMPLE

dcli -l root -g /root/dbs_group imageinfo -version

dcli -l root -g /root/dbs_group imageinfo -status

dcli -l root -g /root/dbs_group uname -r

Ensure backups are completed

Comment out cron jobs

Stop the OEM agents


ACTION ON NODE BEING PATCHED


Bring down dbs gracefully

------------------------

script to bring down dbs


On each DB Node:

----------------

$GI_HOME/bin/crsctl status crs


************* UNMOUNT the NFS mounts, otherwise the reboot takes a long time ******************************

Comment out the fstab entries for the NFS mount points


df -h

umount -a -t nfs4,smbfs,nfs,cifs -f -l 

df -h

uptime

reboot

dbmcli -e list alerthistory


$GI_HOME/bin/crsctl disable crs

$GI_HOME/bin/crsctl stop crs

/u01/app/12.2.0.1/grid/bin


$GI_HOME/bin/crsctl check crs | grep online | wc -l | while read retval; do if [[ $retval -eq 0 ]]; then echo CRS Stopped; elif [[ $retval -eq 4 ]]; then echo CRS Running; else echo CRS Not Ready; fi; done;


uptime

Reset ILOM (SP):

-----------------

as root from node 1


#ipmitool bmc reset cold

Sent cold reset command to MC  <<<<< output

#

Precheck:

---------

cd /u01/patch_2022/dbnodeupdate

./dbnodeupdate.sh -u -l /u01/patch_2022/dbpatch/dbserver_patch_220723/p33757259_201000_Linux-x86-64.zip -t 20.1.19.0.0.220216 -a -v


say Y and proceed


It did not report any home issues.


Precheck - Skip GI & DB Homes Validation:

-----------------------------------------

cd /u01/patch_2022/dbnodeupdate

./dbnodeupdate.sh -u -l /u01/patch_2022/dbpatch/dbserver_patch_220723/p33757259_201000_Linux-x86-64.zip -t 20.1.19.0.0.220216 -S -a -v  


Check inodes to make sure the backup will succeed:

----------------------------------------------

inodes=$(df -i -P / | awk 'END{print $3}'); if [ $inodes -gt 500000 ] && [ $inodes  -le 1000000 ]; then echo -e "\nWARN... $inodes files\n"; elif [ $inodes -gt 1000000 ]; then echo -e "\nFAIL... $inodes files\n"; else echo -e "\nPASS... $inodes files\n"; fi


Backup Active LVM Sys1 to Inactive LVM Sys2:

--------------------------------------------

cd /u01/patch_2022/dbnodeupdate

./dbnodeupdate.sh -b -s -a


Remove custom RPMs (identified by the precheck above):

------------------

UNMOUNTING THE NFS


df -h

umount -a -t nfs4,smbfs,nfs,cifs -f -l 

df -h

Comment out the entries in fstab and mtab


Precheck again after removing Custom RPMs:

------------------------------------------

cd /u01/patch_2022/dbnodeupdate

./dbnodeupdate.sh -u -l /u01/patch_2022/dbpatch/dbserver_patch_220723/p33757259_201000_Linux-x86-64.zip -t 20.1.19.0.0.220216 -S -a -v


Perform Upgrade: 

----------------

cd /u01/patch_2022/dbnodeupdate

nohup ./dbnodeupdate.sh -u -l /u01/patch_2022/dbpatch/dbserver_patch_220723/p33757259_201000_Linux-x86-64.zip -t 20.1.19.0.0.220216 -S -a -n -q &


cd /u01/patch_2022/dbnodeupdate

nohup ./dbnodeupdate.sh -u -l /u01/patch_2022/dbpatch/dbserver_patch_220723/p33757259_201000_Linux-x86-64.zip -t 20.1.19.0.0.220216 -S -a -n -q -w &


Checks After node coming up:

-----------------------------

#/opt/oracle.cellos/CheckHWnFWProfile 

[SUCCESS] The hardware and ****


***check with 

imageinfo


### Added ### dcli -l root -g /root/dbs_group imageinfo -version


df -h

umount -a -t nfs4,smbfs,nfs,cifs -f -l 


*** check whether it should be mounted

cd /u01/patch_2022/dbnodeupdate

./dbnodeupdate.sh -t 20.1.19.0.0.220216 -a -c -q

yum list installed | grep fuse


$GI_HOME/bin/crsctl check crs | grep online | wc -l | while read retval; do if [[ $retval -eq 0 ]]; then echo CRS Stopped; elif [[ $retval -eq 4 ]]; then echo CRS Running; else echo CRS Not Ready; fi; done;


$GI_HOME/bin/crsctl enable crs   (dbnodeupdate.sh should enable and start CRS)

start dbs

start applications


************

Exadata Database server patching (GI+DB)



Bundle Patches

1) Installed on top of the base release + patchset
2) Installed using the opatch/opatchauto utilities
3) Cumulative: each bundle includes the previous ones
4) Includes the most recent CPU/PSU
5) Comprises the GI PSU and DB PSU

Summary

1) Unzip the Grid Infrastructure software to a new ORACLE_HOME
2) Run pre-checks such as cluvfy
3) Run gridSetup.sh
4) Check the Clusterware version



========

Note- patch details

Earlier, Oracle used to release patches as below:

1) BP

2) PSU

3) SPU 

RU & RUR

RU: Release Update. It is similar to a BP.

1) RUs are released quarterly: January, April, July, and October

2) RUs contain optimizer fixes + functional fixes + regression fixes + security fixes


RUR: Release Update Revision. It is similar to a PSU.

1) RURs are released quarterly: January, April, July, and October

2) RURs contain regression fixes + security fixes



 

 

