Saturday, 8 October 2022

Exadata Patching Overview and Step by Step Exadata Storage Server Patching 18c


 Exadata Patching Overview


Patches are provided as complete software images that contain updates for the Linux operating system, the cell server software (storage), the InfiniBand software and other component firmware.

  • In Exadata, patching can be done in a rolling or non-rolling fashion.
  • Oracle releases patches quarterly.

Exadata has three main components/layers that require software patching

Storage Server – It contains up to 14 storage servers. The patchmgr utility is used to patch the storage cells in a rolling or non-rolling fashion; patchmgr uses dcli to push the patch software to the storage cells. Each storage server can take up to approximately two hours. Storage server patches apply operating system, firmware, and driver updates (Exadata software, OS, InfiniBand, ILOM, firmware and new features), and the same patch applies to all hardware generations. The current version can be checked with imageinfo, for example:
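A quick version check from the first database node might look like this (a sketch only; it assumes root SSH equivalence to the cells and a cell_group file listing the cell hostnames, as used later in this post):

dcli -g ~/cell_group -l root imageinfo -ver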

InfiniBand Switch - Every Exadata machine contains two IB switches. patchmgr is used to patch them. A switch can be accessed via the command line or ILOM, and it should only be updated via the Exadata-branded patch.

  # version 

SUN DCS 36p version: 1.3.3-2

Database Server – It contains up to 8 database servers. Database server patching has two phases:

       a) Firmware/OS – patchmgr is used to patch the firmware/OS on all of the Exadata database servers. The current version can be checked with imageinfo (see the combined example after item b).

      b) GRID and ORACLE HOME – the OPatch utility is used to patch them.

             The current version can be checked as below:

              opatch lspatches
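As an illustration of both checks (a sketch; it assumes the dbs_group file and the grid home path used later in this post, so adjust them to your environment):

# Firmware/OS image version on all database servers
dcli -g ~/dbs_group -l root imageinfo -ver

# Patches applied to the Grid Infrastructure home (run as the grid owner)
/u01/app/12.1.0.2/grid/OPatch/opatch lspatches

# Patches applied to a database home (run as the oracle owner)
$ORACLE_HOME/OPatch/opatch lspatches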

Infiniband switches: their software ships with the Exadata cell storage patch software, so you do not need to download separate software for the IB switches.

Additional components: the Ethernet (Cisco) switch, PDUs (Power Distribution Units) and KVM. You generally do not need to upgrade these components unless you have an issue with them.


The patching procedure

1) Patching the cells ( Storage servers)

2) Patching the IB switches

3) Patching the Database servers (aka Compute Nodes)

4) Patching the Grid Infrastructure

5) Patching the databases ORACLE_HOMEs

The Rollback procedure

1) Cell Rollback

2) DB nodes Rollback

3) IB Switches Rollback

 Troubleshooting

1) Cell patching issue

2) CRS does not restart issue

3) A procedure to add instances to a database

4)  OPatch resume



To Apply patch

1)  MOS Notes

Read the following MOS notes carefully.

Exadata Database Machine and Exadata Storage Server Supported Versions (Doc ID 888828.1)

Exadata 18.1.12.0.0 release and patch (29194095) (Doc ID 2492012.1)

Oracle Exadata Database Machine exachk or HealthCheck (Doc ID 1070954.1)

dbnodeupdate.sh and dbserver.patch.zip: Updating Exadata Database Server Software using the DBNodeUpdate Utility and patchmgr (Doc ID 1553103.1)

2) We need to download the bundle patch as per the setup requirement; in our case it is QFSDP 19625719.

Patch 29181093 – Database server bare metal / domU ULN exadata_dbserver_18.1.12.0.0_x86_64_base OL6 channel ISO image (18.1.12.0.0.190111)

Download dbserver.patch.zip as p21634633_12*_Linux-x86-64.zip, which contains dbnodeupdate.zip and patchmgr for dbnodeupdate orchestration via patch 21634633

QFSDP releases contain the latest software for the following components:

1) Infrastructure

  •  Exadata Storage Server
  • InfiniBand Switch
  • Power Distribution Unit

2) Database

  •  Oracle Database and Grid Infrastructure PSU
  • Oracle JavaVM PSU (as of Oct 2014)
  • OPatch
  • OPlan

3) Systems Management

  •  EM Agent
  • EM OMS
  • EM Plug-ins

4) Move the bundle patch to one of the compute nodes and extract it there.

       unzip <filename>

5) Now extract all the tar files using the tar utility

Command: # cat *.tar.* | tar -xvf -

This creates a single directory named after the patch number, 19625719.

It contained patches for following stacks

  • Database/Clusterware
  • Database Server
  • Storage Server 
  • Infiniband 
  • PDUs

Current Environment

Exadata X6-2 Half Rack (4 Compute nodes, 7 Storage Cells and 2 IB Switches) running ESS version 12.2.1.1.6

Cell Patching Pre-Check and Activity

 To be taken care of before the activity

1) Run a file system backup and a full database backup (SA + DBA)

2) Take a backup of the ISO file /opt/oracle.SupportTools/diagnostics.iso (SA)
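A minimal sketch of that backup (the destination file name here is only an illustration, not a required convention):

cp -p /opt/oracle.SupportTools/diagnostics.iso /opt/oracle.SupportTools/diagnostics.iso.pre_patch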

3) Verify hardware failures. Make sure there are no hardware failures before patching:

dcli -g ~/dbs_group -l root 'dbmcli -e list physicaldisk where status!=normal'
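The storage cells can be checked the same way through cellcli (a sketch, using the cell_group file referenced throughout this post):

dcli -g ~/cell_group -l root 'cellcli -e list physicaldisk where status!=normal'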

4) Release space from the /root and /boot filesystems (SA)

5) Verify disk space on the compute nodes:
 dcli -g ~/dbs_group -l root 'df -h /'

6) Run exachk and review it for any actions to be taken. Also run sundiag and an ILOM snapshot (SA)

7) Create a proactive SR for the patching and upload the exachk and sundiag reports (DBA)

8) Install and configure VNC Server on Exadata compute node 1. It is recommended to use VNC or the screen utility for patching, to avoid disconnections due to network issues.

9) Clear or acknowledge alerts on the DB and cell nodes:

[root@dm01db01 ~]# dcli -l root -g ~/dbs_group "dbmcli -e drop alerthistory all"
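The equivalent cleanup on the storage cells would be done with cellcli (a sketch, again assuming the usual cell_group file):

[root@dm01db01 ~]# dcli -l root -g ~/cell_group "cellcli -e drop alerthistory all"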


On the day of activity

Non-Rolling Patching

1) Set a blackout from OEM (DBA)

2) Comment out all cronjob entries across all nodes of the environment, for both the oracle and root users (DBA)

3) Shut down all databases and Grid Infrastructure services on all nodes of the cluster (DBA)
************************************************************************
Stop the DB services in the below order.

A) Stop the databases and CRS on all the database nodes:

srvctl stop database -d <dbname>
srvctl status service -d <dbname>

Stop and disable the DB services, and make sure no services are running from the database $ORACLE_HOME:

srvctl stop service -d <dbname>
srvctl disable service -d <dbname> -s <service_name>
srvctl status service -d <dbname>

Stop CRS on all nodes:

crsctl check crs
crsctl stat res -t
crsctl stop crs -f
crsctl check crs

4) Disable cluster startup at reboot (DBA). On each DB node as the root user, set the ASM environment and run:

crsctl disable crs

5) Perform an SP reset on all DB and cell servers and wait for the ILOM SP to start up (DBA). From the first DB node as the root user:

dcli -l root -g all_group 'ipmitool sunoem cli "reset /SP" Y'
dcli -l root -g all_group 'ipmitool sunoem cli "show faulty"'

6) Unmount the NFS/ZFS mount points and comment them out in /etc/fstab (SA)

7) Reboot all DB and all cell nodes (DBA). On all nodes as the root user:

shutdown -r -y now   (or reboot)

8) Check for faulted components, if any, on all nodes (DBA). From the first DB node as the root user:

dcli -l root -g all_group 'ipmitool sunoem cli "show faulty"'

9) Validate the configuration on all cells (DBA). From the first DB node as the root user:

dcli -l root -g cell_group 'cellcli -e alter cell validate configuration'
dcli -l root -g ~/cell_group 'cellcli -e list physicaldisk | grep normal | wc -l'
dcli -l root -g ~/cell_group 'cellcli -e list griddisk attributes asmmodestatus | grep ONLINE | wc -l'
dcli -l root -g ~/cell_group 'cellcli -e list griddisk attributes asmdeactivationoutcome | grep Yes | wc -l'
dcli -l root -g ~/cell_group 'cellcli -e list griddisk attributes name,asmmodestatus,size'


Cell Patching (Implementation)

 Please note: All commands are fired as root

1) Verify the network configuration

Run below in VNC/Reflection Session

On First DB node : <hostname>

dcli -l root -g cell_group '/opt/oracle.cellos/ipconf -verify'

2) Verify SSH access to the cells

On first DB node - <hostname>

dcli -g cell_group -l root 'hostname -i'

3) Shut down the services on the cells

On first DB node : <hostname>

dcli -g cell_group -l root "cellcli -e alter cell shutdown services all"

4) Unzip the patch file

From the first DB node:

cd <patch path>

unzip p29624194_181000_Linux-x86-64.zip   (already completed in our case; however, validate once before proceeding)

After unzipping, list the files: ls -ltrh

5) Perform a reset force and cleanup using patchmgr

cd <patch path>

./patchmgr -cells /root/cell_group -reset_force

./patchmgr -cells /root/cell_group -cleanup 

6) Check Prerequisites

From first DB node : <hostname>

cd <patch path>

./patchmgr -cells /root/cell_group -patch_check_prereq  

./patchmgr -cells /root/cell_group -cleanup

Also clear the cell and DB alert history.

7) Patching the cells

From first DB node, go to the patch path as below

cd <patch path>

For NON -Rolling patch

./patchmgr -cells /root/cell_group -patch 

For Rolling patch

./patchmgr -cells /root/cell_group -patch -rolling

or

./patchmgr -cells <cell_group> -patch_check_prereq  

./patchmgr -cells <cell_group> -patch -rolling







Post Patching step

Note

"cd  /dbamaint/QFSDP_2019_29626115/29626115/Infrastructure/18.1.15.0.0/ExadataStorageServer_InfiniBandSwitch"

9) Perform a cleanup using patchmgr once the patch has completed successfully

./patchmgr -cells /root/cell_group -cleanup

Check whether all cell services are running.

10) From the first database node (<hostname>) as the root user:

dcli -l root -g /root/cell_group "service celld status"

11) Verify imageinfo on all cells (the image status should show success), and check the firmware and OS versions:

dcli -l root -g ~/cell_group "/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aAll | grep 'FW Package Build'"

dcli -l root -g ~/cell_group "cat /etc/enterprise-release"

dcli -l root -g ~/cell_group "cat /etc/redhat-release"

dcli -l root -g ~/cell_group "/opt/oracle.cellos/CheckHWnFWProfile -c strict"

12) dcli -l root -g ~/cell_group 'rpm -q ofa-`uname -r`'


Reference: Check the most recent set of validation logs for any failures. Check the /var/log/cellos/validations.log and /var/log/cellos/vldrun*.log files for any failures. The image status is marked as failure when a failure is reported in one or more places in these validations. Use the following command to check for failures:

grep -i 'fail' /var/log/cellos/validations.log

If a specific validation failed, then the log will indicate where to look for the logs for that validation. Examine the specific log /var/log/cellos/validations/validation_name.log to determine the problem, and check against known issues. If the issue is not listed as a known issue in the My Oracle Support note for the patch, then contact Oracle Support Services for assistance.
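The per-run validation logs mentioned above can be scanned the same way (a simple sketch):

grep -i 'fail' /var/log/cellos/vldrun*.log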


IB Switch Upgrade

Pre-Check

1) Identify the number of switches in clusters.
[root@dm01dbadm01 ~]# ibswitches

2) Identify the current IB switch software version on all the Switches
[root@dm1dbadm01 patch_12.1.2.2.0.150917]# ssh dm01sw-ib1 version

3) Log in to Exadata compute node 1 as the root user and navigate to the Exadata Storage Server software staging area

[root@dm01dbadm01 ESS_121220]# cd /u01/app/oracle/software/ESS_121220/patch_12.1.2.2.0.150917/
[root@dm01dbadm01 patch_12.1.2.2.0.150917]# pwd
/u01/app/oracle/software/ESS_121220/patch_12.1.2.2.0.150917

4) Create a file named ibswitches.lst and enter IB switch names one per line as follows:
[root@dm01dbadm01 patch_12.1.2.2.0.150917]# vi ibswitches.lst
        
        dm01sw-ib1
        dm01sw-ib2
        dm01sw-ib3

    [root@dm01dbadm01 patch_12.1.2.2.0.150917]# cat ibswitches.lst

        dm01sw-ib1
        dm01sw-ib2
        dm01sw-ib3


Implementation

Please note: All commands are fired as root and should be run in VNC/Reflection

1) Ensure the switch firmware is at least release 1.3.3-2

2) By default, the patchmgr utility upgrades all the switches. To patch only a subset of the switches, create a file that lists one switch per line, for example:

/root/ibswitches.lst

This step is not required here, as the current version is already the same as in the July QFSDP:

[root@<ibswitch> ~]# version

SUN DCS 36p version: 2.2.7-1

3) Change to the patchmgr directory
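For example, if the patch was staged under the directory used in the pre-check section above (adjust the path to your own staging area):

cd /u01/app/oracle/software/ESS_121220/patch_12.1.2.2.0.150917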

4) Run the pre requisites check for the upgrade

./patchmgr -ibswitches /root/ibswitches.lst -upgrade -ibswitch_precheck  -force

If the output shows OVERALL SUCCESS, then proceed with the upgrade

5) Upgrade the IB switches

./patchmgr -ibswitches /root/ibswitches.lst -upgrade

6) Verify the output of the switches and check the version.

[root@dm01dbadm01 ~]# ssh dm01sw-ib1 version


Patching on Database Machine

Summary

  • The patchmgr or dbnodeupdate.sh utility can be used for upgrading, rolling back and backing up the Exadata compute nodes. patchmgr can upgrade the compute nodes in a rolling or non-rolling fashion. Compute node patches apply operating system, firmware, and driver updates.
  • Launch patchmgr from compute node 1, which has user equivalence set up to all the compute nodes. Patch all the compute nodes except node 1 first, and later patch node 1 alone.
  • dbnodeupdate.sh is the shell script that patches each database server individually.
  • patchmgr is the orchestration tool that starts dbnodeupdate.sh in parallel across many database servers (those in the dbs_group configuration file).
  • Before patchmgr, dbnodeupdate.sh was run manually.
  • dbnodeupdate.sh has a -M option to remove RPMs in order to resolve dependency issues.
  • patchmgr has the -modify_at_prereq option to remove RPMs in order to resolve dependency issues. We always use patchmgr, but you will still see some messages related to the -M option of dbnodeupdate.sh, as it is the script that actually patches the servers.
  • To sum up, when you start a ./patchmgr -dbnodes ~/dbs_group command, patchmgr starts the dbnodeupdate.sh script, with the proper options, on each node listed in the ~/dbs_group file.
  • Likewise, a ./patchmgr -dbnodes ~/dbs_group -modify_at_prereq command launches dbnodeupdate.sh -M on each server specified in the ~/dbs_group file (see the example after this list).
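A fuller form of that command, reusing the ISO name and target version from this environment, might look like the following (a sketch only; -modify_at_prereq is normally combined with a precheck run, so verify the exact syntax against the patchmgr usage text shipped in your dbserver.patch.zip):

./patchmgr -dbnodes ~/dbs_group -precheck -modify_at_prereq -iso_repo /u01/app/oracle/software/exa_patches/dbnode/p29181093_181000_Linux-x86-64.zip -target_version 18.1.12.0.0.190111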

Exadata Database Machine Patching Pre-Check and Activity

  1. Run exachk for a health check on the Oracle Exadata Database Machine. For more information, see HealthCheck (Doc ID 1070954.1).
  2. Verify disk space on the Database Machine (see the quick checks after this list).
  3. Take a backup of the Grid Home, the Oracle Homes and the databases.
  4. Check SSH connectivity among the Database Machine nodes.
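Quick checks for items 2 and 4 (a sketch; it assumes the dbs_group file and root SSH equivalence used throughout this post):

dcli -g ~/dbs_group -l root 'df -h / /u01'    # free space on every database server
dcli -g ~/dbs_group -l root hostname          # confirms passwordless SSH to every node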

Patching DB servers


Please Note: Perform below steps on each DB server
Please Note: Perform a full backup of filesystem before proceeding with this.
Please Note: All commands are fired as root

1)  Run below in VNC Session or reflection
    uname -r

2) Verify crs is down and disabled on all nodes

dcli -g dbs_group -l root "ps -ef | grep grid" 

dcli -g dbs_group -l root "/u01/app/12.1.0.2/grid/bin/crsctl config crs"

Check the Image Versions Before the Patch

dcli -g ~/dbs_group -l root imageinfo -ver


3) Run Precheck once again

cd /root
./dbnodeupdate.sh -v -u -n -l <patch path>



Take backup before patching db nodes:
./dbnodeupdate.sh  -b

Run the update script as below

./dbnodeupdate.sh  -u -n -l  <patch path>

or

Rolling Manner

The rolling manner allows you to patch the nodes one by one.
Only one node is unavailable at a time; all the other nodes remain up and running. This method of patching is almost online, and can be 100% online with good service rebalancing.

[root@myclustercel01 ~]# cd /tmp/dbserver_patch_5.161014
[root@myclustercel01 dbserver_patch_5.161110]# ./patchmgr -dbnodes ~/dbs_group -precheck -iso_repo /tmp/p24669306_121233_Linux-x86-64.zip -target_version 12.1.2.3.3.161013
[root@myclustercel01 dbserver_patch_5.161110]# nohup ./patchmgr -dbnodes ~/dbs_group -upgrade -iso_repo /tmp/p24669306_121233_Linux-x86-64.zip -target_version 12.1.2.3.3.161013 -rolling &


or
Perform the precheck on all nodes except node 1 (here dbs_group-1 is a file listing all the DB nodes except node 1)
./patchmgr -dbnodes dbs_group-1 -precheck -iso_repo /u01/app/oracle/software/exa_patches/dbnode/p29181093_181000_Linux-x86-64.zip -target_version 18.1.12.0.0.190111

Perform compute node backup
 ./patchmgr -dbnodes dbs_group-1 -backup -iso_repo /u01/app/oracle/software/exa_patches/dbnode/p29181093_181000_Linux-x86-64.zip -target_version 18.1.12.0.0.190111

Execute compute node upgrade
./patchmgr -dbnodes dbs_group-1 -upgrade -iso_repo /u01/app/oracle/software/exa_patches/dbnode/p29181093_181000_Linux-x86-64.zip -target_version 18.1.12.0.0.190111
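Once the other nodes are done, node 1 itself is patched in the same way, launched from a node other than node 1 (a sketch; dbs_group_node1 is a hypothetical file containing only the node 1 hostname, and root SSH equivalence to node 1 is assumed):

./patchmgr -dbnodes dbs_group_node1 -upgrade -iso_repo /u01/app/oracle/software/exa_patches/dbnode/p29181093_181000_Linux-x86-64.zip -target_version 18.1.12.0.0.190111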


Non-Rolling Manner

In a non-rolling manner, patchmgr patches all the nodes at the same time, in parallel. It is quicker, but a complete downtime is required.

[root@myclustercel01 ~]# cd /tmp/dbserver_patch_5.161014
[root@myclustercel01 dbserver_patch_5.161110]# ./patchmgr -dbnodes ~/dbs_group -precheck -iso_repo /tmp/p24669306_121233_Linux-x86-64.zip -target_version 12.1.2.3.3.161013
[root@myclustercel01 dbserver_patch_5.161110]# nohup ./patchmgr -dbnodes ~/dbs_group -upgrade -iso_repo /tmp/p24669306_121233_Linux-x86-64.zip -target_version 12.1.2.3.3.161013 &

4) Verify the /var/log/cellos/dbnodeupdate.log file for errors and also check how many reboots were performed (see the quick check below). This upgrade generally reboots the node twice.
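A quick way to scan that log and to confirm the reboots (a sketch; the exact log wording varies between releases):

grep -iE 'error|fail' /var/log/cellos/dbnodeupdate.log
last reboot | head     # shows how many times the node rebooted during the update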

If there are no errors then proceed with final step of patching

5) Perform the post-installation steps as below. This step relinks the binaries and locks the CRS home.

[root@dbhost ]# ./dbnodeupdate.sh -c


6) If CRS has not yet started, start it, monitor the resources, and re-enable automatic startup:
    /u01/app/12.1.0.2/grid/bin/crsctl start crs
    /u01/app/12.1.0.2/grid/bin/crsctl stat res -init -t  (monitor for all init services to come online)
    /u01/app/12.1.0.2/grid/bin/crsctl enable crs

7) Check imageinfo/ipmi version for success

imageinfo
ipmitool sunoem version

8) Verify imageinfo/ipmi versions

dcli -l root -g dbs_group "imageinfo"
dcli -l root -g dbs_group "ipmitool sunoem version"

9) Check for number of drives in RAID configuration 

dcli -l root -g dbs_group "/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -Aall | grep Drives"

10) dcli -l root -g ~/dbs_group "/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aAll | grep 'FW Package Build'"

dcli -l root -g ~/dbs_group "cat /etc/enterprise-release"

dcli -l root -g ~/dbs_group "cat /etc/redhat-release"

dcli -l root -g ~/dbs_group "/opt/oracle.cellos/CheckHWnFWProfile -c strict"

dcli -l root -g ~/dbs_group 'rpm -q ofa-`uname -r`'

dcli -l root -g dbs_group "ifconfig"  --  check the MTU size for the IB interfaces

11) Check whether all cell services are running.

From first database node as root user,

dcli -l root -g /root/cell_group "service celld status"

12) Check the MTU value

ifconfig -a | grep -i mtu   (compare against the values recorded before patching)












