Tuesday, 18 August 2020

NODE EVICTION OVERVIEW and Troubleshooting steps

NODE EVICTION OVERVIEW and Troubleshooting steps


The Oracle Clusterware is designed to perform a node eviction by removing one or more nodes from the cluster if some critical problem is detected.  A critical problem could be a node not responding via a network heartbeat, a node not responding via a disk heartbeat, a hung or severely degraded machine, or a hung ocssd.bin process.  The purpose of this node eviction is to maintain the overall health of the cluster by removing bad members. 


Starting in 11.2.0.2 RAC or above (or if you are on Exadata), a node eviction may not actually reboot the machine.  This is called a rebootless restart.  In this case we restart most of the clusterware stack to see if that fixes the unhealthy node. 


Oracle Clusterware evicts the node when following condition occur:

- Node is not pinging via the network hearbeat

- Node is not pinging the Voting Disk

- Node is hung or busy and is unable to perform the above two tasks


What is the use of CSS Heartbeat Mechanism in Oracle RAC 


The CSS of the Oracle Clusterware maintains two heartbeat mechanisms

1. The disk heartbeat to the voting device and

2. The network heartbeat across the interconnect (This establish and confirm valid node membership in the cluster).

Both of these heartbeat mechanisms have an associated timeout value. The disk heartbeat has an internal i/o timeout interval (DTO Disk TimeOut), in seconds, where an i/o to the voting disk must complete. The misscount parameter (MC), as stated above, is the maximum time, in seconds, that a network heartbeat can be missed. The disk heartbeat i/o timeout interval is directly related to the misscount parameter setting. The Disk TimeOut(DTO) = Miscount(MC) - 15 secconds (some versions are different).

1.0 - PROCESS ROLES FOR REBOOTS


OCSSD (aka CSS daemon) - This process is spawned by the cssdagent process. It runs in both vendor clusterware and non-vendor clusterware environments.  OCSSD's primary job is internode health monitoring and RDBMS instance endpoint discovery. The health monitoring includes a network heartbeat and a disk heartbeat (to the voting files).  OCSSD can also evict a node after escalation of a member kill from a client (such as a database LMON process). This is a multi-threaded process that runs at an elevated priority and runs as the Oracle user.


Startup sequence: INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdagent --> ocssd --> ocssd.bin


CSSDAGENT - This process is spawned by OHASD and is responsible for spawning the OCSSD process, monitoring for node hangs (via oprocd functionality), and monitoring to the OCSSD process for hangs (via oclsomon functionality), and monitoring vendor clusterware (via vmon functionality).  This is a multi-threaded process that runs at an elevated priority and runs as the root user.


Startup sequence: INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdagent


CSSDMONITOR - This proccess also monitors for node hangs (via oprocd functionality), monitors the OCSSD process for hangs (via oclsomon functionality), and monitors vendor clusterware (via vmon functionality). This is a multi-threaded process that runs at an elevated priority and runs as the root user.


Startup sequence: INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdmonitor

2.0 - DETERMINING WHICH PROCESS IS RESPONSIBLE FOR A REBOOT

Important files to review:

Clusterware alert log in

The cssdagent log(s)

The cssdmonitor log(s)

The ocssd log(s)

The lastgasp log(s) in /etc/oracle/lastgasp or /var/opt/oracle/lastgasp

CHM or OS Watcher data

'opatch lsinventory -detail' output for the GRID home

*Messages files:

* Messages file locations:

Linux: /var/log/messages

Sun: /var/adm/messages

HP-UX: /var/adm/syslog/syslog.log

IBM: /bin/errpt -a > messages.out


      Document 1513912.1 - TFA Collector - Tool for Enhanced Diagnostic Gathering


11.2 Clusterware evictions should, in most cases, have some kind of meaningful error in the clusterware alert log.  This can be used to determine which process is responsible for the reboot.  Example message from a clusterware alert log:

[ohasd(11243)]CRS-8011:reboot advisory message from host: sta00129, component: cssagent, with timestamp: L-2009-05-05-10:03:25.340

[ohasd(11243)]CRS-8013:reboot advisory message text: Rebooting after limit 28500 exceeded; disk timeout 27630, network timeout 28500, last heartbeat from CSSD at epoch seconds 1241543005.340, 4294967295 milliseconds ago based on invariant clock value of 93235653


This particular eviction happened when we had hit the network timeout.  CSSD exited and the cssdagent took action to evict. The cssdagent knows the information in the error message from local heartbeats made from CSSD. 


If no message is in the evicted node's clusterware alert log, check the lastgasp logs on the local node and/or the clusterware alert logs of other nodes. 

3.0 - TROUBLESHOOTING OCSSD EVICTIONS


If you have encountered an OCSSD eviction review common causes in section 3.1 below. 

3.1 - COMMON CAUSES OF OCSSD EVICTIONS

Network failure or latency between nodes. It would take 30 consecutive missed checkins (by default - determined by the CSS misscount) to cause a node eviction. 

Problems writing to or reading from the CSS voting disk.  If the node cannot perform a disk heartbeat to the majority of its voting files, then the node will be evicted.

A member kill escalation.  For example, database LMON process may request CSS to remove an instance from the cluster via the instance eviction mechanism.  If this times out it could escalate to a node kill. 

An unexpected failure or hang of the OCSSD process, this can be caused by any of the above issues or something else.

An Oracle bug.

3.2 - FILES TO REVIEW AND GATHER FOR OCSSD EVICTIONS


All files from section 2.0 from all cluster nodes.  More data may be required.


Example of an eviction due to loss of voting disk:


CSS log:

2012-03-27 22:05:48.693: [ CSSD][1100548416](:CSSNM00018:)clssnmvDiskCheck: Aborting, 0 of 3 configured voting disks available, need 2

2012-03-27 22:05:48.693: [ CSSD][1100548416]###################################

2012-03-27 22:05:48.693: [ CSSD][1100548416]clssscExit: CSSD aborting from thread clssnmvDiskPingMonitorThread



OS messages:

Mar 27 22:03:58 choldbr132p kernel: Error:Mpx:All paths to Symm 000190104720 vol 0c71 are dead.

Mar 27 22:03:58 choldbr132p kernel: Error:Mpx:Symm 000190104720 vol 0c71 is dead.

Mar 27 22:03:58 choldbr132p kernel: Buffer I/O error on device sdbig, logical block 0

...

 

4.0 - TROUBLESHOOTING CSSDAGENT OR CSSDMONITOR EVICTIONS


If you have encountered a CSSDAGENT or CSSDMONITOR eviction review common causes in section 4.1 below.

4.1 - COMMON CAUSES OF CSSDAGENT OR CSSDMONITOR EVICTIONS

An OS scheduler problem.  For example, if the OS is getting locked up in a driver or hardware or there is excessive amounts of load on the machine (at or near 100% cpu utilization), thus preventing the scheduler from behaving reasonably.

A thread(s) within the CSS daemon hung.

An Oracle bug.

4.2 - FILES TO REVIEW AND GATHER FOR CSSDAGENT OR CSSDMONITOR EVICTIONS


All files from section 2.0 from all cluster nodes. More data may be required.



Cluster health check 


CSS Miss-count 

The Cluster Synchronization Service (CSS)  Miscount is the maximum time, in seconds, that a network heartbeat can be missed before a cluster reconfiguration to evict the node

How to get the CSS Misscount value 


$ crsctl get css misscount

CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.


oracrs@node1~]$

[oracrs@node1~]$ crsctl get css misscount

CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.

[oracrs@node1~]$


How to set the CSS Misscount value 

Shut down CRS on all nodes but one as root run crsctl on that remaining node


# crsctl stop crs


#crsctl set css  misscount 60

Configuration parameter misscount is now set to 60

#


2. CSS disktimeout  (Default 200)



The maximum amount of time (in seconds)allowed for a voting file I/O to complete, if this time is exceeded the voting disk will be marked as offline.Note that this is also the amount of time that will be required for initial cluster formation, i.e. when no nodes have previously been up and in a cluster.



How to get the CSS disktimeout  value 


$ crsctl get css disktimeout 

CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services.


[oracrs@node1l ~]$ crsctl get css disktimeout

CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services.

[oracrs@node1l ~]$


How to set the CSS disktimeout   value 

Shut down CRS on all nodes but one as root run crsctl on that remaining node

# crsctl stop crs


#crsctl set css  disktimeout  300

Configuration parameter disktimeout is now set to300

#



3. CSS reboottime(Default 3 seconds)


The amount of time allowed for a node to complete a reboot after the CSS daemon has been evicted

How to get the CSS reboottime value 


$crsctl get css reboottime

CRS-4678: Successful get reboottime 3 for Cluster Synchronization Services.


-4678: Successful get disktimeout 200 for Cluster Synchronization Services.

[oracrs@node1 ~]$ crsctl get css reboottime

CRS-4678: Successful get reboottime 3 for Cluster Synchronization Services.

[oracrs@node1 ~]$


How to set the CSS reboottime value 

Shut down CRS on all nodes but one as root run crsctl on that remaining node

# crsctl stop crs

#crsctl set css  reboottime 10



Configuration parameter reboottime is now set to 10

#

======================================




  • Oracle Clusterware uses voting disk files to determine which nodes are members of a cluster.
  • You can configure voting disks on Oracle ASM, or you can configure voting disks on shared storage.
  • If you do not configure voting disks on Oracle ASM, then for high availability, Oracle recommends that you have a minimum of three voting disks on physically separate storage.This avoids having a single point of failure. If you configure a single voting disk, then you must use external mirroring to provide redundancy.
  • No. of voting disks depend on the type of redundancy. From 11.2.0.x onwards OCR and voting files are placed in the ASM diskgroup.

External redundancy = 1 Voting disk
Normal redundancy = 3 Voting disks
High redundancy =      5 Voting disks

You can have up to 32 voting disks in your cluster

Oracle recommends that you configure multiple voting disks during Oracle Clusterware installation to improve availability. If you choose to put the voting disks into an Oracle ASM disk group, then Oracle ASM ensures the configuration of multiple voting disks if you use a normal or high redundancy disk group.

To identify the voting disk location :-

[oracle@rac1 ~]$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE b4a7f383bb414f7ebf6aaae7c3873401 (/dev/oracleasm/disks/ASMDISK1) [DATA]
Located 1 voting disk(s).

To backup the voting disk (Before 11gR2) :-

dd if=voting_disk_name of=backup_file_name

The following can be used to restore the voting disk from the backup file
created.

dd if=backup_file_name of=voting_disk_name

In previous versions of Oracle Clusterware you needed to backup the voting disks with the dd command. Starting with Oracle Clusterware 11g Release 2 you no longer need to backup the voting disks. The voting disks are automatically backed up as a part of the OCR. In fact, Oracle explicitly
indicates that you should not use a backup tool like dd to backup or restore voting disks. Doing so can lead to the loss of the voting disk.

What Information is stored in VOTING DISK/FILE?

It contains 2 types of data.

Static data: Information about the nodes in cluster

Dynamic data: Disk heartbeat logging

It contains the important details of the cluster nodes membership like

  • Which node is part of the cluster?
  • Which node is leaving the cluster?
  • Which node is joining the cluster?

Although the Voting disk contents are not changed frequently, you will need to back up the Voting disk file every time
– you add or remove a node from the cluster or
– immediately after you configure or upgrade a cluster.

To move voting disk  create another diskgroup with external redundancy named as ‘DATA1’

  • From 11gR2,voting files are stored on ASM diskgroup.
  • “add” or “delete” command is not available , only “replace” command is available when voting files are stored on ASM diskgroup.
  • Note: You cannot create more than 1 voting disk in the same or on another/different Disk group disk when using External Redundancy in 11.2.

To identify the status and voting disk location :-

[oracle@rac1 ~]$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE b4a7f383bb414f7ebf6aaae7c3873401 (/dev/oracleasm/disks/ASMDISK1) [DATA]
Located 1 voting disk(s).

Replace a voting disk :-

[oracle@rac1 ~]$ crsctl replace votedisk +DATA1
Successful addition of voting disk 9789b4bf42214f8bbf14fda587ba331a.
Successful deletion of voting disk b4a7f383bb414f7ebf6aaae7c3873401.
Successfully replaced voting disk group with +DATA1.
CRS-4266: Voting file(s) successfully replaced

Check the status and verify voting disk location :-

[oracle@rac1 ~]$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 9789b4bf42214f8bbf14fda587ba331a (/dev/oracleasm/disks/ASMDISK2) [DATA1]
Located 1 voting disk(s).

Why should we have ODD number of voting disk?

A node must be able to access more than half of the voting disks at any time.

Scenario:

Let us consider 2 node clusters with even number of voting disks say 2.

  • Let node 1 is able to access voting disk 1.
  • Node 2 is able to access voting disk 2.
  • From the above steps, we see that we don’t have any common file where clusterware can check the heartbeat of both the nodes.
  • If we have 3 voting disks and both the nodes are able to access more than half ie., 2 voting disks, there will be atleast one disk which will be accessed by both the nodes. The clusterware can use this disk to check the heartbeat of the nodes.
  • A node not able to do so will be evicted from the cluster by another node that has more than half the voting disks to maintain the integrity of the cluster.

Recover the corrupted voting disk :-

ASMCMD> lsdsk -G DATA1

Path
/dev/oracleasm/disks/ASMDISK2

As a root user,

#dd if=/dev/zero of=/dev/oracleasm/disks/ASMDISK2 bs=4096 count=1000000

The above session will get hang,

Check the clusterware status on  another session,

**************************************************************
rac1:
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
**************************************************************
rac2:
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
**************************************************************

After reboot both the nodes,check the clusterware status :-

[oracle@rac1 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager

 Since voting disk can’t be restored back to DATA1 diskgroup as disk in DATA1 has been corrupted

Stop the CRS forcefully in both the nodes and check the clusterware status,

[root@rac1 bin]# ./crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'rac1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'rac1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

Start the CRS in exclusive mode in any nodes,

[root@rac1 bin]# ./crsctl start crs -excl
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.evmd' on 'rac1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2676: Start of 'ora.evmd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'rac1'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2676: Start of 'ora.crf' on 'rac1' succeeded
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2679: Attempting to clean 'ora.asm' on 'rac1'
CRS-2681: Clean of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'rac1'
CRS-2676: Start of 'ora.storage' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
CRS-2676: Start of 'ora.crsd' on 'rac1' succeeded

After CRS exclusive startup,check the clusterware status

[root@rac1 bin]# ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4692: Cluster Ready Services is online in exclusive mode
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Recreate the ASM diskgroups using ASMCA where voting disk is placed before named as ‘DATA1’

ASMCMD> lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED EXTERN N 512 4096 1048576 30718 20165 0 20165 0 N DATA/
MOUNTED EXTERN N 512 4096 1048576 10236 10183 0 10183 0 N DATA1/

Check the voting disk location :-

[oracle@rac1 ~]$ crsctl query css votedisk
Located 0 voting disk(s).

Replace the voting disk   :-

[oracle@rac1 ~]$ crsctl replace votedisk +DATA1
Successful addition of voting disk 5a1ef50fe3354f35bfa7f86a6ccb8990.
Successfully replaced voting disk group with +DATA1.
CRS-4266: Voting file(s) successfully replaced

[oracle@rac1 ~]$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 5a1ef50fe3354f35bfa7f86a6ccb8990 (/dev/oracleasm/disks/ASMDISK2) [DATA1]
Located 1 voting disk(s).

Stop the CRS running in exclusive mode,

# crsctl stop crs

Start the CRS(clusterware)  in all nodes,

# crsctl start crs

Check the clusterware status of both nodes,

[root@rac1 bin]# ./crsctl check cluster -all
**************************************************************
rac1:
CRS-4535: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
rac2:
CRS-4535: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************

Oracle RAC – Applying and Removing PSUs in Oracle RAC Home with Conflict Detection (12.1.0.2.0)

 Patch conflict and rollback and solution in RAC/Grid Environment

Oracle RAC – Applying and Removing PSUs in Oracle RAC Home with Conflict Detection (12.1.0.2.0)

While running conflict command to check conflict of Patch 30805421 We observed that sub patch -20243268

is conflicting ,So We need to deinstall and start patching through opatchauto


April 2020 patch Details

--------------------

Patch 30805421 - Grid Infrastructure Patch Set Update 12.1.0.2.200414

There are 3 main steps to applying the PSU:

• Detect and rollback conflicts patch.

• Apply Patch on Oracle Home.

• conflict-free version of Patch: 20243268 to proceed with the patching.


Issue

--------

There is a conflict between the DBPSU ( 30700212 -- Database Patch Set Update 12.1.0.2.200414 )

and Patch 20243268: EM QUERY WITH SQL_ID 4RQ83FNXTF39U PERFORMS POORLY ON ORACLE 12C RELATIVE TO 11G

and we might have to find a conflict-free version of Patch: 20243268 to proceed with the patching

The easiest and quickest way to determine if you have a patch already applied that will conflict with another that will still be applied is through the One-off Patch ConflictDetectionandResolution method.

If no conflict is found you can apply the patch, otherwise you need to resolve the conflicts.

Execute:

#


-MGMTDB:/u02/oracle/12.1.0.2/grid:N # line added by Agent


+ASM2:/u02/oracle/12.1.0.2/grid:N # line added by Agent

TestRAC:/u01/oracle/TestRAC/product/12.1.0.2:N # line added by Agent

[root@server1 ~]# cd /u01/oracle/TestRAC/product/12.1.0.2/OPatch

[root@server1 OPatch]#

[root@server1 OPatch]# export PATH=$PATH:/u01/oracle/TestRAC/product/12.1.0.2/OPatch

[root@server1 OPatch]# ./opatchauto apply /staging/april202_patch/30805421 -analyze -oh /u01/oracle/TestRAC/product/12.1.0.2

OPatchauto session is initiated at Mon Aug 17 14:21:27 2020

System initialization log file is /u01/oracle/TestRAC/product/12.1.0.2/cfgtoollogs/opatchautodb/systemconfig2020-08-17_02-21-37PM.log.

Session log file is /u01/oracle/TestRAC/product/12.1.0.2/cfgtoollogs/opatchauto/opatchauto2020-08-17_02-21-59PM.log

The id for this session is 1XC1

Executing OPatch prereq operations to verify patch applicability on home /u01/oracle/TestRAC/product/12.1.0.2


Patch applicability verified successfully on home /u01/oracle/TestRAC/product/12.1.0.2

Verifying SQL patch applicability on home /u01/oracle/TestRAC/product/12.1.0.2

SQL patch applicability verified successfully on home /u01/oracle/TestRAC/product/12.1.0.2

OPatchAuto successful.

--------------------------------Summary--------------------------------

Analysis for applying patches has failed:

Host:server1

RAC Home:/u01/oracle/TestRAC/product/12.1.0.2

Version:12.1.0.2.0

Analysis for patches has failed.

==Following patches FAILED in analysis for apply:

Patch: /staging/april202_patch/30805421/30700212

Log: /u01/oracle/TestRAC/product/12.1.0.2/cfgtoollogs/opatchauto/core/opatch/opatch2020-08-17_14-22-30PM_1.log


Reason: Failed during Analysis: CheckConflictAgainstOracleHome Failed, [ Prerequisite Status: FAILED, Prerequisite output:

Summary of Conflict Analysis:

-----------------------

Patches that can be applied now without any conflicts are :

30703508

Following patches have conflicts. Please contact Oracle Support and get the merged patch of the patches :

20243268, 30700212

Whole composite patch Conflicts/Supersets are:

Composite Patch : 30700212

Conflict with 20243268

Detail Conflicts/Supersets for each patch are:


Sub-Pa ...


==Following patches were SKIPPED:

Patch: /staging/april202_patch/30805421/26983807

Reason: This patch is not applicable to this specified target type - "rac_database"

Patch: /staging/april202_patch/30805421/30703511

Reason: This patch is not applicable to this specified target type - "rac_database"

==Following patches were SUCCESSFULLY analyzed to be applied:

Patch: /staging/april202_patch/30805421/30703508

Log: /u01/oracle/TestRAC/product/12.1.0.2/cfgtoollogs/opatchauto/core/opatch/opatch2020-08-17_14-22-30PM_1.log

OPATCHAUTO-72053: Analysis for the patches failed.

OPATCHAUTO-72053: Command execution failed.

OPATCHAUTO-72053: Please check the summary for more details.

OPatchauto session completed at Mon Aug 17 14:23:51 2020

Time taken to complete the session 2 minutes, 24 seconds

[root@server1 OPatch]#


Action Plan


1) Deinstall the currently installed patch :

Patch 20243268: EM QUERY WITH SQL_ID 4RQ83FNXTF39U PERFORMS POORLY ON ORACLE 12C RELATIVE TO 11G

2) Install the Database Patch Set Update 12.1.0.2.200414 - 30700212

3) Once the DBPSU is installed , download and install the conflict free version of Patch 20243268

Please note you need to download the "12.1.0.2.180417" version of this patch as this was the conflict PSU.


Patch 20243268: EM QUERY WITH SQL_ID 4RQ83FNXTF39U PERFORMS POORLY ON ORACLE 12C RELATIVE TO 11G

Last Updated 01-May-2018 14:23 (2+ years ago) >>>>>>>>>>>>>>>>>>>>>>>>>

Product Oracle Database - Enterprise Edition

Release Oracle 12.1.0.2.180417 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Platform Linux x86-64

Size 802.3 KB

Download Access Software

Detail Plans

----------

1) Deinstall the currently installed patch :

Patch 20243268: EM QUERY WITH SQL_ID 4RQ83FNXTF39U PERFORMS POORLY ON ORACLE 12C RELATIVE TO 11G


As the database home owner execute:


[TestRAC@server1OPatch]$

[TestRAC@server1OPatch]$ pwd

/u01/oracle/TestRAC/product/12.1.0.2/OPatch

[TestRAC@server1OPatch]$ id

uid=500(TestRAC) gid=200(dba) groups=200(dba),212(asmadmin),213(asmdba)

[TestRAC@server1OPatch]$


To deinstall the patch, follow these steps:

1. Stop database instance

srvctl stop instance -d <database instance> -i <database instance> -o immediate

2. Deinstall the patch by running the following command:

$ opatch rollback -id 20243268


2) Install the Database Patch Set Update 12.1.0.2.200414 - 30700212

From root user analyze conflict patch and apply patch

Start database instance

export PATH=$PATH:/u01/oracle/TestRAC/product/12.1.0.2/OPatch

./opatchauto apply /staging/april202_patch/30805421 -analyze -oh /u01/oracle/TestRAC/product/12.1.0.2


./opatchauto apply /staging/april202_patch/30805421 -oh /u01/oracle/TestRAC/product/12.1.0.2


3) Once the DBPSU is installed , download and install the conflict free version of Patch 20243268


Please note you need to download the "12.1.0.2.180417" version of this patch as this was the conflict PSU.

database instance should be down before applying patch

srvctl stop instance -d TestRAC -i TestRAC2 -o immediate

export PATH=$PATH:/u01/oracle/TestRAC/product/12.1.0.2/OPatch

1. Set your current directory to the directory where the patch is located and then run the OPatch utility by entering the following commands:

$ cd <PATCH_TOP_DIR>/20243268


$ opatch apply


2. Verify whether the patch has been successfully installed by running the following command:


$ opatch lsinventory

oracrs@server1OPatch]$

[oracrs@server1OPatch]$ ./opatch lspatches


30703511;ACFS PATCH SET UPDATE 12.1.0.2.200414 (30703511)

30703508;OCW PATCH SET UPDATE 12.1.0.2.200414 (30703508)

30700212;Database Patch Set Update : 12.1.0.2.200414 (30700212)

26983807;WLM Patch Set Update: 12.1.0.2.180116 (26983807)

OPatch succeeded.


[oracrs@server1OPatch]$

[testDB@server1OPatch]$

[testDB@server1OPatch]$ ./opatch lspatches


20243268;

30703508;OCW PATCH SET UPDATE 12.1.0.2.200414 (30703508)

30700212;Database Patch Set Update : 12.1.0.2.200414 (30700212)

OPatch succeeded.


[testDB@server1OPatch]$

3. Start the services from the Oracle home.


Other issue during patching

---------------------

Applied patch successfully on clusterware as well as database on first node on Friday ,

but on weekend both node rebooted abruptly, clusterware and database are started successfully on first node whereas on other node , clusterware is not coming up automatically and even manually is not getting started on port1002ovl

It look like due to mismatch of patch level , it is not getting started ,


Solution


1) We took down Node1 and started Node2

2) followed by patching Node2 with the same PSU as Node1

Apr2020 RU has been applied on node2 and all is now running normally

DB instance RACDB2 came up automatically once patch had been applied

3) both the Nodes are back up and operational.


Eg



[TestRAC@server1 OPatch]$ srvctl stop instance -d TestRAC -i TestRAC2 -o immediate

[TestRAC@server1 OPatch]$

[TestRAC@server1 OPatch]$

[TestRAC@server1 OPatch]$ id


uid=500(TestRAC) gid=200(dba) groups=200(dba),212(asmadmin),213(asmdba)


[TestRAC@server1 OPatch]$ ps -ef|grep pmon

oracrs 5342 1 0 13:48 ? 00:00:00 asm_pmon_+ASM2

TestRAC 27413 25732 0 14:29 pts/2 00:00:00 grep pmon

[TestRAC@server1 OPatch]$ ./opatch version

OPatch Version: 12.2.0.1.21


OPatch succeeded

[TestRAC@server1 OPatch]$ id


uid=500(TestRAC) gid=200(dba) groups=200(dba),212(asmadmin),213(asmdba)


[TestRAC@server1 OPatch]$ ./opatch rollback -id 20243268


Oracle Interim Patch Installer version 12.2.0.1.21

Copyright (c) 2020, Oracle Corporation. All rights reserved.

Oracle Home : /u01/oracle/TestRAC/product/12.1.0.2

Central Inventory : /u02/oracle/oraInventory

from : /u01/oracle/TestRAC/product/12.1.0.2/oraInst.loc

OPatch version : 12.2.0.1.21

OUI version : 12.1.0.2.0

Log file location : /u01/oracle/TestRAC/product/12.1.0.2/cfgtoollogs/opatch/opatch2020-08-17_14-30-20PM_1.log


Patches will be rolled back in the following order:


20243268


The following patch(es) will be rolled back: 20243268

Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.


(Oracle Home = '/u01/oracle/TestRAC/product/12.1.0.2')


Is the local system ready for patching? [y|n]


y

User Responded with: Y

Rolling back patch 20243268...


RollbackSession rolling back interim patch '20243268' from OH '/u01/oracle/TestRAC/product/12.1.0.2'

Patching component oracle.rdbms, 12.1.0.2.0...

RollbackSession removing interim patch '20243268' from inventory

Log file location: /u01/oracle/TestRAC/product/12.1.0.2/cfgtoollogs/opatch/opatch2020-08-17_14-30-20PM_1.log

OPatch succeeded.


[TestRAC@server1 OPatch]$

[root@server1 OPatch]# export PATH=$PATH:/u01/oracle/TestRAC/product/12.1.0.2/OPatch

[root@server1 OPatch]# ./opatchauto apply /staging/april202_patch/30805421 -analyze -oh /u01/oracle/TestRAC/product/12.1.0.2


OPatchauto session is initiated at Mon Aug 17 14:32:23 2020


System initialization log file is /u01/oracle/TestRAC/product/12.1.0.2/cfgtoollogs/opatchautodb/systemconfig2020-08-17_02-32-31PM.log.


Session log file is /u01/oracle/TestRAC/product/12.1.0.2/cfgtoollogs/opatchauto/opatchauto2020-08-17_02-32-51PM.log

The id for this session is 3KBP

Executing OPatch prereq operations to verify patch applicability on home /u01/oracle/TestRAC/product/12.1.0.2

Patch applicability verified successfully on home /u01/oracle/TestRAC/product/12.1.0.2

Verifying SQL patch applicability on home /u01/oracle/TestRAC/product/12.1.0.2

SQL patch applicability verified successfully on home /u01/oracle/TestRAC/product/12.1.0.2

OPatchAuto successful.


--------------------------------Summary--------------------------------


Analysis for applying patches has completed successfully:

Host:server1


RAC Home:/u01/oracle/TestRAC/product/12.1.0.2

Version:12.1.0.2.0


==Following patches were SKIPPED:


Patch: /staging/april202_patch/30805421/26983807


Reason: This patch is not applicable to this specified target type - "rac_database"


Patch: /staging/april202_patch/30805421/30703511

Reason: This patch is not applicable to this specified target type - "rac_database"


==Following patches were SUCCESSFULLY analyzed to be applied:

Patch: /staging/april202_patch/30805421/30700212

Log: /u01/oracle/TestRAC/product/12.1.0.2/cfgtoollogs/opatchauto/core/opatch/opatch2020-08-17_14-33-16PM_1.log

Patch: /staging/april202_patch/30805421/30703508


Log: /u01/oracle/TestRAC/product/12.1.0.2/cfgtoollogs/opatchauto/core/opatch/opatch2020-08-17_14-33-16PM_1.log


OPatchauto session completed at Mon Aug 17 14:33:59 2020

Time taken to complete the session 1 minute, 37 seconds

[root@server1 OPatch]#

srvctl start instance -d TestRAC -i TestRAC2

[root@server1 OPatch]# ./opatchauto apply /staging/april202_patch/30805421 -oh /u01/oracle/TestRAC/product/12.1.0.2

OPatchauto session is initiated at Mon Aug 17 14:53:18 2020


System initialization log file is /u01/oracle/TestRAC/product/12.1.0.2/cfgtoollogs/opatchautodb/systemconfig2020-08-17_02-53-26PM.log.


Session log file is /u01/oracle/TestRAC/product/12.1.0.2/cfgtoollogs/opatchauto/opatchauto2020-08-17_02-53-47PM.log

The id for this session is 4TCY


Executing OPatch prereq operations to verify patch applicability on home /u01/oracle/TestRAC/product/12.1.0.2

Patch applicability verified successfully on home /u01/oracle/TestRAC/product/12.1.0.2

Verifying SQL patch applicability on home /u01/oracle/TestRAC/product/12.1.0.2

SQL patch applicability verified successfully on home /u01/oracle/TestRAC/product/12.1.0.2

Preparing to bring down database service on home /u01/oracle/TestRAC/product/12.1.0.2

Successfully prepared home /u01/oracle/TestRAC/product/12.1.0.2 to bring down database service

Bringing down database service on home /u01/oracle/TestRAC/product/12.1.0.2

Following database(s) and/or service(s) are stopped and will be restarted later during the session: TestRAC


Database service successfully brought down on home /u01/oracle/TestRAC/product/12.1.0.2

Performing prepatch operation on home /u01/oracle/TestRAC/product/12.1.0.2

Perpatch operation completed successfully on home /u01/oracle/TestRAC/product/12.1.0.2

Start applying binary patch on home /u01/oracle/TestRAC/product/12.1.0.2


Binary patch applied successfully on home /u01/oracle/TestRAC/product/12.1.0.2


Performing postpatch operation on home /u01/oracle/TestRAC/product/12.1.0.2

Postpatch operation completed successfully on home /u01/oracle/TestRAC/product/12.1.0.2

Starting database service on home /u01/oracle/TestRAC/product/12.1.0.2

Database service successfully started on home /u01/oracle/TestRAC/product/12.1.0.2

Preparing home /u01/oracle/TestRAC/product/12.1.0.2 after database service restarted


No step execution required.........


Trying to apply SQL patch on home /u01/oracle/TestRAC/product/12.1.0.2


SQL patch applied successfully on home /u01/oracle/TestRAC/product/12.1.0.2


OPatchAuto successful.


--------------------------------Summary--------------------------------


Patching is completed successfully. Please find the summary as follows:


Host:server1


RAC Home:/u01/oracle/TestRAC/product/12.1.0.2


Version:12.1.0.2.0


Summary:


==Following patches were SKIPPED:


Patch: /staging/april202_patch/30805421/26983807


Reason: This patch is not applicable to this specified target type - "rac_database"


Patch: /staging/april202_patch/30805421/30703511


Reason: This patch is not applicable to this specified target type - "rac_database"

==Following patches were SUCCESSFULLY applied:


Patch: /staging/april202_patch/30805421/30700212


Log: /u01/oracle/TestRAC/product/12.1.0.2/cfgtoollogs/opatchauto/core/opatch/opatch2020-08-17_15-07-02PM_1.log

Patch: /staging/april202_patch/30805421/30703508

Log: /u01/oracle/TestRAC/product/12.1.0.2/cfgtoollogs/opatchauto/core/opatch/opatch2020-08-17_15-07-02PM_1.log


OPatchauto session completed at Mon Aug 17 15:22:30 2020

Time taken to complete the session 29 minutes, 13 seconds

[root@server1 OPatch]#

[root@server1 OPatch]#


[TestRAC@server1 april202_patch]$ id


uid=500(TestRAC) gid=200(dba) groups=200(dba),212(asmadmin),213(asmdba)


[TestRAC@server1 april202_patch]$ pwd


/staging/april202_patch


[TestRAC@server1 april202_patch]$ . oraenv


ORACLE_SID = [TestRAC2] ? TestRAC


The Oracle base remains unchanged with value /u01/oracle/TestRAC


[TestRAC@server1 april202_patch]$ export ORACLE_SID=TestRAC2


[TestRAC@server1 april202_patch]$


[TestRAC@server1 april202_patch]$ export PATH=$PATH:/u01/oracle/TestRAC/product/12.1.0.2/OPatch


[TestRAC@server1 april202_patch]$ id


uid=500(TestRAC) gid=200(dba) groups=200(dba),212(asmadmin),213(asmdba)


[TestRAC@server1 april202_patch]$ pwd


/staging/april202_patch


[TestRAC@server1 april202_patch]$ ls -ltr


total 3481548


drwxrwxr-x 4 TestRAC dba 4096 May 1 2018 20243268


drwxr-xr-x 7 oracrs dba 4096 Mar 31 11:29 30805421


-rwxr-xr-x 1 oracrs dba 3445855633 Aug 13 10:38 p30805421_121020_Linux-x86-64.zip


-rwxr-xr-x 1 oracrs dba 118408624 Aug 13 10:38 p6880880_122010_Linux-x86-64.zip


-rwxr-xr-x 1 TestRAC dba 821509 Aug 17 15:32 p20243268_12102180417_Linux-x86-64.zip


[TestRAC@server1 april202_patch]$ cd 20243268


[TestRAC@server1 20243268]$ ./opatch version


-bash: ./opatch: No such file or directory


[TestRAC@server1 20243268]$ opatch version


OPatch Version: 12.2.0.1.21

OPatch succeeded.


[TestRAC@server1 20243268]$ opatch apply


Oracle Interim Patch Installer version 12.2.0.1.21


Copyright (c) 2020, Oracle Corporation. All rights reserved.



Oracle Home : /u01/oracle/TestRAC/product/12.1.0.2


Central Inventory : /u02/oracle/oraInventory


from : /u01/oracle/TestRAC/product/12.1.0.2/oraInst.loc


OPatch version : 12.2.0.1.21


OUI version : 12.1.0.2.0


Log file location : /u01/oracle/TestRAC/product/12.1.0.2/cfgtoollogs/opatch/opatch2020-08-17_15-40-52PM_1.log


Verifying environment and performing prerequisite checks...


OPatch continues with these patches: 20243268


Do you want to proceed? [y|n]


y


User Responded with: Y


All checks passed.


Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.


(Oracle Home = '/u01/oracle/TestRAC/product/12.1.0.2')


Is the local system ready for patching? [y|n]


y


User Responded with: Y


Backing up files...


Applying interim patch '20243268' to OH '/u01/oracle/TestRAC/product/12.1.0.2'




Patching component oracle.rdbms, 12.1.0.2.0...


Patch 20243268 successfully applied.


Log file location: /u01/oracle/TestRAC/product/12.1.0.2/cfgtoollogs/opatch/opatch2020-08-17_15-40-52PM_1.log


OPatch succeeded.


[TestRAC@server1 20243268]$


verify applied patch


oracrs@server1OPatch]$


[oracrs@server1OPatch]$ ./opatch lspatches


30703511;ACFS PATCH SET UPDATE 12.1.0.2.200414 (30703511)


30703508;OCW PATCH SET UPDATE 12.1.0.2.200414 (30703508)


30700212;Database Patch Set Update : 12.1.0.2.200414 (30700212)


26983807;WLM Patch Set Update: 12.1.0.2.180116 (26983807)


OPatch succeeded.


[oracrs@server1OPatch]$

[testDB@server1OPatch]$

[testDB@server1OPatch]$ ./opatch lspatches


20243268;


30703508;OCW PATCH SET UPDATE 12.1.0.2.200414 (30703508)


30700212;Database Patch Set Update : 12.1.0.2.200414 (30700212)

OPatch succeeded.


[testDB@server1OPatch]$

Tuesday, 11 August 2020

Undo Tablespace and its Monitoring and Troubleshooting

 


What is Undo Tablespace and its importance


Behavior of undo tablespace is different from Normal Tablspace, It store undo record mean old image of data, if we update record from 35 to 40 , it will hold 35 to maintain data consistency

in case same record accessed by other sessions or users, it will fetch old record until unless it is committed .


Therefore usage of undo tablespace cannot be consided based on usages , There is different type of segment inside undo tablespace ,expired ,unexpired ,active ,expired segment will be recycled and will be available for new transaction .


Oracle maintains information to nullify changes made to the database. Such information consists of records of the actions of transactions, collectively known as undo. Oracle uses the undo to do the following: 


-  Rollback an active transaction 


-  Recover a terminated transaction 


-  Provide read consistency 


-  Recovery from logical corruptions


UNDO_MANAGEMENT Initialization Parameter


UNDO_MANAGEMENT specifies which undo space management mode the system should use. When set to AUTO, the instance starts in automatic undo management mode. In manual undo management mode, undo space is allocated externally as rollback segments.


By default, this parameter is set to MANUAL. Set this parameter to AUTO to enable automatic undo management mode.


This is a static parameter and cannot be modified dynamically using alter system command.

So if you wish to switch between Rollback Segments and AUM, then you need to restart the instance.



UNDO_RETENTION Initialization Parameter


This parameter specifies (in seconds) the low threshold value of undo retention.


The UNDO_RETENTION parameter can only be honored if the current undo tablespace has enough space. If an active transaction requires undo space and the undo tablespace does not have available space, then the system starts reusing unexpired undo space. This action can potentially cause some queries to fail with a "snapshot too old" message.


The amount of time for which undo is retained for the Oracle Database for the current undo tablespace can be obtained by querying the TUNED_UNDORETENTION column of the V$UNDOSTAT dynamic performance view.



To check undo Tablespace usages

-------------------------------------


COL TABLESPACE_NAME FORMAT A40

col "Tablespace" for a50

col "USED_IN_MB" for 9999999

col "FREE_IN_MB" for 99999



select

a.TABLESPACE_NAME,

a.TOTAL_IN_MB,

to_char(nvl((b.used),0.00),'999990.99') USED_IN_MB,

to_char(nvl((c.free),0.00),'999990.99') FREE_IN_MB,

to_char(nvl(((b.used/a.TOTAL_IN_MB)*100),0.00),'99990.99') PCT_USED

from

(select TABLESPACE_NAME,

to_char(sum(bytes)/(1024*1024),'9999990.99') TOTAL_IN_MB

from sys.dba_data_files

group by TABLESPACE_NAME) a,

(select TABLESPACE_NAME,bytes/(1024*1024) used

from sys.SM$TS_USED) b,

(select TABLESPACE_NAME,bytes/(1024*1024) free

from sys.SM$TS_free) c

where a.TABLESPACE_NAME=b.TABLESPACE_NAME(+) and

a.TABLESPACE_NAME=c.tablespace_name(+) and c.tablespace_name='&TSPACE' ;



set num 5

COL file_name FORMAT A60

select file_name,bytes/1024/1024 mb ,AUTOEXTENSIBLE from dba_data_files where tablespace_name='&TSPAC' ;


select file_name,bytes/1024/1024 mb from dba_data_files where tablespace_name='&TSPAC' ;

select file_name from dba_data_files where tablespace_name='&TSPAC' ;



Transaction Status of the undo in the extent can be any of the following:



SQL> select distinct status ,sum(bytes/1024/1024),count(*) from dba_undo_extents group by status;


STATUS SUM(BYTES/1024/1024) COUNT(*)

--------- -------------------- ----------

EXPIRED 11192.3125 1016

UNEXPIRED 20382.25 1591

ACTIVE 192 3



ACTIVE - Undo Extent is Active, Used by a transaction.


EXPIRED - Undo Extent is expired (Exceeded the Undo Retention).


UNEXPIRED - Undo Extent will be required to honor UNDO_RETENTION.



To check undo datafile


select file_name,bytes/1024/1024 mb,AUTOEXTENSIBLE from dba_data_files where tablespace_name='&TSPAC'


To extend datafile to add additional space


alter database datafile '/ora/gnsisprd/001/oradata/gnsisprd/undo_05.dbf' resize 7000m





V$TRANSACTION lists the active transactions in the system.


(a)  The following columns together points to a transaction.  (ie) The combination of the following should give unique transaction id for that database.


XIDUSN - Undo segment number

XIDSLOT - NUMBER Slot number

XIDSQN - NUMBER Sequence number


(b)  The following columns explains the number of undo blocks / undo records used per transaction.

USED_UBLK - Number of undo blocks used

USED_UREC - Number of undo records used


In the case of transaction rollback, the above columns will give estimation about the number of undo blocks that needs to be rolled back.


The number of undo records and undo blocks (USED_UREC and USED_UBLK) decrease while the transaction is rolling back. When they reach 0, the transaction disappears from v$transaction.


The following query can be used to monitor the transaction rollback.



SQL> select a.sid,a.username,b.xidusn,b.used_urec,b.used_ublk from v$session a, v$transaction b where a.saddr=b.ses_addr;


SID USERNAME XIDUSN USED_UREC USED_UBLK

---------- ------------------------------ ---------- ---------- ----------

87 OPS$ETRD359 12 1 1

2354 OPS$LMER046 258 1 1

1137 OPS$TPRD001 11 1 1

1807 G2_PROD_SUPPORT 14 1 1




SQL> select xidusn,ses_addr,status from v$transaction;


XIDUSN SES_ADDR STATUS

---------- ---------------- ----------------

12 070001024A88A4F0 ACTIVE

14 0700010252B7F8A0 ACTIVE




To understand state of the extents, space-used in the current undo tablespace.



select distinct status,sum(bytes/1024/1024),count(*) from dba_undo_extents group by status;


You need to find out what user and which SQL statement is eating up all the UNDO space.



SQL> select s.sql_text from v$sql s, v$undostat u where u.maxqueryid=s.sql_id;


You can also use following SQL to find out most undo used by a session for a currently executing transaction.


SQL> select s.sid,s.username,t.used_urec,t.used_ublk

from v$session s, v$transaction t

where s.saddr = t.ses_addr

order by t.used_ublk desc;


To find out which session is currently using the most UNDO,


SQL>select s.sid, t.name, s.value

from v$sesstat s, v$statname t

where s.statistic#=t.statistic#

and t.name='undo change vector size'

order by s.value desc;



SQL>select sql.sql_text, t.used_urec records, t.used_ublk blocks,

(t.used_ublk*8192/1024) kb from v$transaction t,

v$session s, v$sql sql

where t.addr=s.taddr

and s.sql_id = sql.sql_id

and s.username ='&USERNAME';




select

s.sid,

s.serial#,

s.username,

u.segment_name,

count(u.extent_id) "Extent Count",

t.used_ublk,

t.used_urec,

s.program

from

v$session s,

v$transaction t,

dba_undo_extents u

where

s.taddr = t.addr

and

u.segment_name like '_SYSSMU'||t.xidusn||'_%$'

and

u.status = 'ACTIVE'

group by

s.sid,

s.serial#,

s.username,

u.segment_name,

t.used_ublk,

t.used_urec,

s.program

order by

t.used_ublk desc,

t.used_urec desc,

s.sid,

s.serial#,

s.username,

s.program;


Here is another version of this query:


select

s.username,

s.sid,

s.serial#,

s.logon_time,

t.xidusn, t.ubafil,

t.ubablk,

t.used_ublk,

t.start_date, t.status

from

v$session s,

v$transaction t

where

s.saddr = t.ses_addr;


This query will also display UNDO for a user session:


select

s.sid,

s.username,

r.name "RBS name",

t.start_time,

t.used_ublk "Undo blocks",

t.used_urec "Undo recs"

from

v$session s,

v$transaction t,

v$rollname r

where

t.addr = s.taddr

and

r.usn = t.xidusn;


Here is a script to display UNDO changes by a session:


select

b.sid,

b.value,

a.name

from

v$statname a,

v$sesstat b

where

a.statistic# = b.statistic#

and

lower(a.name) = lower ('undo change vector size')

and

b.sid = nnn;


Here is yet another script to display UNDO by session:


set lines 200;


select

TO_CHAR(s.sid)||','||TO_CHAR(s.serial#) sid_serial,

NVL(s.username, 'None') orauser,

s.program,

r.name undoseg,

t.used_ublk * TO_NUMBER(x.value)/1024||'K' "Undo"

from

sys.v_$rollname r,

sys.v_$session s,

sys.v_$transaction t,

sys.v_$parameter x

where

s.taddr = t.addr

AND

r.usn = t.xidusn(+)

AND

x.name = 'db_block_size';





SQL> select sid,serial#,machine,module,username,SQL_ID,to_char(logon_time,'dd-mon:hh24:mi') from v$session where sid= 255;


SID SERIAL# MACHINE MODULE USERNAME SQL_ID TO_CHAR(LOGO

----- ------- ---------------------------------------------------------------- ---------------------------------------------------------------- --------------- ------------- ------------

255 28547 PC73183DD SQL Developer G2_PROD_SUPPORT 87ju09164fhvk 10-aug:09:53


SQL>


SELECT a.sql_text FROM v$sqlarea a WHERE a.sql_id = '87ju09164fhvk' ;

Thursday, 6 August 2020

DataPump EXPDP and IMPDP Monitoring and Troubleshooting

To run Expdp and impdp command in background of operating system.


1)

create directory DCMCNVP6_SCHEMA_EXPORT as 'H:\Migration_Export\';


grant read , write on DCMCNVP6_SCHEMA_EXPORT to public;


2) expdp "'/ as sysdba'" DIRECTORY=DCMCNVP6_SCHEMA_EXPORT dumpfile=DCMCNVP6_schema_06_Mar_2012.dmp logfile=DCMCNVP6_schema_06_Mar_2012_EXP.log SCHEMAS=MCN_EDM_RPT,mcneil_edm_prd


or


1) Import the Test1 schema1 into Test1schema2 schema


note : dumpfile parameter will be changed accordingly once expdp is completed.


dumpfile = expdp_P1DB0059_Test1schema1_AUD_%U.dmp

logfile = impdp_S1DB0059_P1DB0059_Test1schema1_AUD.log

directory = dpump_dir

version=11.1.0.7

parallel = 4

remap_schema=Test1schema1_AUD:Test1schema1_A



1. Extract the expdp.zip file into the /u022/backuploc/exp on TestServer server

2. set ORACLE_SID to backuploc, ORACLE_HOME to /u01/app/oracle/product/10.2.0.4_2

3. $cd /u022/backuploc/exp

4. $expdp system/<pwd> parfile=expdp_1.par

5. $expdp system/<pwd> parfile=expdp_2.par

6. $expdp system/<pwd> parfile=expdp_3.par

7. $expdp system/<pwd> parfile=expdp_4.par

8. Copy the 4 dump files to a location in TestServer2 server ( /u014/backuploc/exp )

set ORACLE_SID to backuploc(on server TestServer2), ORACLE_HOME, PATH = $PATH:$ORACLE_HOME/bin appropriately

9. Extract the impdp.zip file into a location ( /u014/backuploc/exp ) in TestServer2 server

10. cd /u014/backuploc/exp

11. Login as sys



12. $impdp system/<pwd> parfile=impdp_1.par


13. $impdp system/<pwd> parfile=impdp_2.par

14. $impdp system/<pwd> parfile=impdp_3.par

15. $impdp system/<pwd> parfile=impdp_4.par


nohup impdp system/<pwd> parfile=impdp_2.par &


or


[oracle@TestServer ~]$ nohup impdp \"/ as sysdba\" directory=PUMP dumpfile=FULL_EXPORT.dmp logfile=FULL_IMPORT.log &

[1] 9310

[oracle@TestServer ~]$ nohup: ignoring input and appending output to `nohup.out'


or


imp pain FROMUSER=WMC TOUSER=WMC FILE=td46562_dev_exp.dmp log=dev_td46562_imp.log feedback=10000 ignore=Y


create or replace directory dpump_refresh as '/oraback/backuploc/export/db_loc;

grant read,write on directory dpump_refresh to public;



nohup expdp \'/ as sysdba \' directory=DUMP_SN_NEW schemas=testschema dumpfile=STDB0001_25OCT2012.dmp logfile=STDB0001_25OCT2012.log &


expdp system directory=EXP_DP2 dumpfile=test_23feb_1.dmp,test_23feb_2.dmp logfile =test_23feb.log schemas =eeevs01 job_name =test_23feb_12_new3



Import has started with nohup and we check import process from database.


To Monitor Datapump Jobs


SET LINESIZE 150


COLUMN owner_name FORMAT A20

COLUMN job_name FORMAT A30

COLUMN operation FORMAT A10

COLUMN job_mode FORMAT A10

COLUMN state FORMAT A12


SELECT owner_name,

job_name,

TRIM(operation) AS operation,

TRIM(job_mode) AS job_mode,

state,

degree,

attached_sessions,

datapump_sessions

FROM dba_datapump_jobs

ORDER BY 1, 2;



OWNER_NAME JOB_NAME OPERATION JOB_MODE STATE DEGREE ATTACHED_SESSIONS DATAPUMP_SESSIONS

-------------------- ------------------------------ ---------- ---------- ------------ ---------- ----------------- -----------------

SYSTEM SYS_IMPORT_01 IMPORT TABLE EXECUTING 1 0 2

SYSTEM SYS_IMPORT_02 IMPORT TABLE EXECUTING 1 0 2

SYSTEM SYS_IMPORT_03 IMPORT TABLE EXECUTING 1 0 2

SYSTEM SYS_IMPORT_04 IMPORT TABLE NOT RUNNING 0 0 0

SYSTEM SYS_IMPORT_05 IMPORT TABLE NOT RUNNING 0 0 0




SQL> SELECT * FROM DBA_DATAPUMP_SESSIONS;


OWNER_NAME JOB_NAME INST_ID SADDR SESSION_TYPE

-------------------- ------------------------------ ---------- ---------------- --------------

SYSTEM SYS_IMPORT_02 1 07000100B4B7AC88 MASTER

SYSTEM SYS_IMPORT_06 1 07000100B4A6C6B8 MASTER

SYSTEM SYS_IMPORT_03 1 07000100B4EF2AE0 MASTER

SYSTEM SYS_IMPORT_07 1 07000100B54A6068 DBMS_DATAPUMP

SYSTEM SYS_IMPORT_03 1 07000100ACDA58F0 WORKER

SYSTEM SYS_IMPORT_02 1 07000100AC807B80 WORKER

SYSTEM SYS_IMPORT_07 1 07000100ACFF05E8 MASTER

SYSTEM SYS_IMPORT_01 1 07000100B4B84000 MASTER

SYSTEM SYS_IMPORT_01 1 07000100B4C9EA70 WORKER

SYSTEM SYS_IMPORT_06 1 07000100B4DD4F48 WORKER

SYSTEM SYS_IMPORT_07 1 07000100B54A9190 WORKER



To datapump job


There are several method to kill the running datapump job in oracle.


1. When the datapump (expdp or impdp) job is still running :-


execute control+c , it will show you export or import prompt.


Export>


then execute kill_job on the prompt it will ask for the conformation to kill the job or not.

example :-


Export >kill_job

Are you sure you wish to stop this job ([yes]/no): yes


2. If by mistake you are out of import or export prompt then


a) Get the name of the job using


SET LINESIZE 150


COLUMN owner_name FORMAT A20

COLUMN job_name FORMAT A30

COLUMN operation FORMAT A10

COLUMN job_mode FORMAT A10

COLUMN state FORMAT A12


SELECT owner_name,

job_name,

TRIM(operation) AS operation,

TRIM(job_mode) AS job_mode,

state,

degree,

attached_sessions,

datapump_sessions

FROM dba_datapump_jobs

ORDER BY 1, 2;




b) Open a new command prompt window. If you want to kill your import job type


impdp username/password@database attach=name_of_the_job (Get the name_of_the_job using the above query)


eg:


TestServer:TestDB:/home/oracle $impdp SYSTEM/abcTestDB@TestDB attach=SYS_IMPORT_TABLE_01


Import: Release 19.0.0.0.0 - Production on Thu Aug 6 13:25:23 2020

Version 19.7.0.0.0


Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.


Connected to: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production


Job: SYS_IMPORT_TABLE_01

Owner: SYSTEM

Operation: IMPORT

Creator Privs: TRUE

GUID: AC3482F9D4C503C4E0530AA02037A11B

Start Time: Thursday, 06 August, 2020 12:08:44

Mode: TABLE

Remote link: TargetDB.world

Instance: TestDB

Max Parallelism: 1

Timezone: +00:00

Export timezone: +01:00

Timezone version: 32

Export timezone version: 14

Endianness: BIG

NLS character set: WE8ISO8859P1

NLS NCHAR character set: AL16UTF16

EXPORT Job Parameters:

IMPORT Job Parameters:

Parameter Name Parameter Value:

CLIENT_COMMAND system/********@TestDB parfile=/exports/248.par

INCLUDE_METADATA 0

State: EXECUTING

Bytes Processed: 0

Current Parallelism: 1

Job Error Count: 0

Job heartbeat: 78


Worker 1 Status:

Instance ID: 1

Instance name: TestDB

Host name: TestServer

Object start time: Thursday, 06 August, 2020 12:08:46

Object status at: Thursday, 06 August, 2020 12:08:47

Process Name: DW05

State: EXECUTING

Object Schema: TEST_DBA

Object Type: TABLE_EXPORT/TABLE/TABLE_DATA

Completed Objects: 1

Total Objects: 1

Worker Parallelism: 1


Import> exit




c) Once you are attached to job, Type Kill_Job


Ex: Import>kill_job

Are you sure you wish to stop this job (y/n): y

And your job is killed, it will no longer show in dba_datapump_jobs





To check Session Detail



SELECT OPNAME,

SID,

SERIAL#,

CONTEXT,

SOFAR,

TOTALWORK,

ROUND(SOFAR/TOTALWORK*100,2) "%_COMPLETE"

FROM

V$SESSION_LONGOPS

WHERE

OPNAME in (

select

d.job_name from v$session s,v$process p,

dba_datapump_sessions d where

p.addr=s.paddr

and

s.saddr=d.saddr)

AND OPNAME NOT LIKE '%aggregate%' AND TOTALWORK != 0 AND SOFAR <> TOTALWORK;




SQL> select sid, serial#, context, sofar, totalwork,round(sofar/totalwork*100,2) "%_complete" from v$session_longops where sofar <> totalwork;


SID SERIAL# CONTEXT SOFAR TOTALWORK %_complete

---------- ---------- ---------- ---------- ---------- ----------

573 49339 0 0 2031 0

1046 39171 1236752 116 349 33.24

1333 48709 0 0 319 0


SQL>




SQL> SET lines 150

COL owner_name FORMAT a10

COL job_name FORMAT a20

COL operation FORMAT a10


SELECT owner_name, job_name, operation

FROM dba_datapump_jobs where state='NOT RUNNING' and attached_sessions=0;SQL> SQL> SQL> SQL> SQL> 2


OWNER_NAME JOB_NAME OPERATION

---------- -------------------- ----------

SYSTEM SYS_IMPORT_TABLE_04 IMPORT





To Monitor impdp status

--------------------------



SELECT

SUBSTR(sql_text, INSTR(sql_text,'INTO "'),30) table_name

, rows_processed

, ROUND( (sysdate-TO_DATE(first_load_time,'yyyy-mm-dd hh24:mi:ss'))*24*60,1) minutes

, TRUNC(rows_processed/((sysdate-to_date(first_load_time,'yyyy-mm-dd hh24:mi:ss'))*24*60)) rows_per_minute

FROM sys.v_$sqlarea

WHERE

sql_text like 'INSERT %INTO "%'

AND command_type = 2

AND open_versions > 0;


select name, sql_text, error_msg from dba_resumable;


SET lines 140
COL owner_name FORMAT a10;
COL job_name FORMAT a20
COL state FORMAT a12
COL operation LIKE owner_name
COL job_mode LIKE owner_name
SELECT owner_name, job_name, operation, job_mode,
state, attached_sessions
FROM dba_datapump_jobs;



SELECT 
OPNAME, 
SID, 
SERIAL#, 
CONTEXT, 
SOFAR, 
TOTALWORK,
    ROUND(SOFAR/TOTALWORK*100,2) "%_COMPLETE"
FROM 
V$SESSION_LONGOPS
WHERE 
OPNAME in
(
select 
d.job_name
from 
v$session s, 
v$process p, 
dba_datapump_sessions d
where 
p.addr=s.paddr 
and 
s.saddr=d.saddr
)
AND 
OPNAME NOT LIKE '%aggregate%'
AND 
TOTALWORK != 0
AND 
SOFAR <> TOTALWORK;


Monitor undo tablespace during impdp


COL TABLESPACE_NAME FORMAT A40

col "Tablespace" for a50

col "USED_IN_MB" for 9999999

col "FREE_IN_MB" for 99999



select

a.TABLESPACE_NAME,

a.TOTAL_IN_MB,

to_char(nvl((b.used),0.00),'999990.99') USED_IN_MB,

to_char(nvl((c.free),0.00),'999990.99') FREE_IN_MB,

to_char(nvl(((b.used/a.TOTAL_IN_MB)*100),0.00),'99990.99') PCT_USED

from

(select TABLESPACE_NAME,

to_char(sum(bytes)/(1024*1024),'9999990.99') TOTAL_IN_MB

from sys.dba_data_files

group by TABLESPACE_NAME) a,

(select TABLESPACE_NAME,bytes/(1024*1024) used

from sys.SM$TS_USED) b,

(select TABLESPACE_NAME,bytes/(1024*1024) free

from sys.SM$TS_free) c

where a.TABLESPACE_NAME=b.TABLESPACE_NAME(+) and

a.TABLESPACE_NAME=c.tablespace_name(+)


alter database datafile '/ora/intl2prd/001/oradata/intl2prd/undo_01.dbf' resize 13000m;


How can I cleanup old (orphaned) datapump jobs in DBA_DATAPUMP_JOBS ?


Cause: In many cases you have stop Oracle data pump jobs, shutdown database during export/import or use undocumented parameter KEEP_MASTER=Y. In these cases the master table remains in the database and it’s better to delete them.


Error often raised by expdp:


ORA-31626: job does not exist

ORA-31633: unable to create master table « SYS.SYS_EXPORT_FULL_01 »

ORA-06512: at « SYS.DBMS_SYS_ERROR », line 95

ORA-06512: at « SYS.KUPV$FT », line 1163

ORA-00955: name is already used by an existing object

ORA-06512: at « SYS.KUPV$FT », line 1056

ORA-06512: at « SYS.KUPV$FT », line 1044



Below is a step-by-step instruction on how to resolve it.

Step 1. Determine in SQL*Plus if Data Pump jobs exist in the dictionary


Identify these jobs and ensure that the listed jobs in dba_datapump_jobs are not export/import Data Pump jobs that are active: status should be ‘NOT RUNNING’ and not attached to a session:


SET lines 150

COL owner_name FORMAT a10

COL job_name FORMAT a20

COL operation FORMAT a10


SELECT owner_name, job_name, operation

FROM dba_datapump_jobs where state='NOT RUNNING' and attached_sessions=0;


Step 2: Drop the master tables


set head off

SELECT 'drop table ' || owner_name || '.' || job_name || ';'

FROM dba_datapump_jobs WHERE state='NOT RUNNING' and attached_sessions=0;


Execute the generated script.

Step 3: Identify orphan DataPump external tables


Check and drop external tables created for datapump jobs with select object_name, created from dba_objects where object_name like ‘ET$%’

Step 4: Purge recycle bin


If using recycling bin:

SELECT ‘purge table ‘ || owner_name || ‘.’ || ‘ »‘ || job_name || ‘ »;’

FROM dba_datapump_jobs WHERE state=’NOT RUNNING’ and attached_sessions=0;

Step 5: Confirm that the job has been removed


Run sql statement from step 1.