Oracle 12c GoldenGate Troubleshooting and Tuning
As database professionals, one of the most difficult aspects of working with Oracle GoldenGate is how to troubleshoot and resolve errors that occur throughout the lifecycle of the implementation. Due to the nature of Oracle GoldenGate functionality and operations as enterprise software, it involves all layers of the technology stack from source to target database to network issues and more.
Summary for Troubleshooting
Common Log files and commands which need to be used/checked for troubleshooting
TO
CHECK PROCESS
GGSCI
(Server1) 2> info
all
Program Status
Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT ABENDED TEST_CAP 00:00:03 03:48:38
EXTRACT RUNNING TEST_PUMP 00:00:00 00:00:08
• The ggserr.log file (known as the error log)
1) A history of GGSCI commands
2) Processes that started and stopped
3) Errors that occurred
4) Informational messages
you can check it using a standard text editor or a shell command
and The GGSCI command View GGSEVT
$ more ggserr.log | grep ERROR
View GGSEvt: Contents of the Oracle GoldenGate error log
GGSCI>View GGSEvt
The logs name is ggserr.log and it is located in the root Oracle GoldenGate directory.
You can also locate the file by using the Info Extract <group>, Detail command.
The location of the ggserr.logfile is listed with the other Oracle GoldenGate working directories:
GGSCI> Info Extract oraext, Detail
EXTRACT ORAEXT Last Started 2014-03-28 10:45 Status STOPPED
Checkpoint Lag 00:00:00 (updated 161:55:17 ago)
Log Read Checkpoint File u01/app/oracle/oradata/orcl/REDO03.LOG
2014-03-29 17:55:57 Seqno 34, RBA 104843776
<some contents deliberately omitted>
Current directory /home/oracle/OGG_T/
Report file /home/oracle/OGG_T/dirrpt\ORAEXT.rpt
Parameter file /home/oracle/OGG_T/dirprm\ORAEXT.prm
Checkpoint file /home/oracle/OGG_T/dirchk\ORAEXT.cpe
Process file / home/oracle/OGG_T/dirpcs\ORAEXT.pce
Error log /home/oracle/OGG_T/ggserr.log
GGSCI> Info replicat rep_4a, Detail
• Process reports (dirrpt/*.rpt)
Each Extract, Replicat, and Manager has their own process file. It provides initial clues
The default location is the dirrpt directory in the Oracle GoldenGate home location.
We can view Process reports using View Report <group> in GGSCI
GGSCI (SOURCE) > View Report ext_4a
GGSCI (SOURCE) > View Report rep_4a
• The discard file (dirrpt/*.dsc)
- It is created if the DiscardFile <file_name> parameter is used in the Extract or Replicat parameter file
-i s created when Extract or Replicat has a problem with the record that it is processing
-Contains column-level details for operations that the process cannot handle
-is usually used for Replicat to log operations that cannot be reconstructed or applied
-Can help you resolve data-mapping issues
In order to equip you with the best possible approach to solving issues that arise with Oracle GoldenGate, you look first into the most commonly found problems by using a holistic approach to troubleshooting Oracle GoldenGate environments. Because Oracle GoldenGate touches multiple areas including database, network, storage, and operating systems, the chapter in turn addresses these areas as related to Oracle GoldenGate problem resolution.
Common Problems
1) Process failures
2) Trail-file issues
3) Synchronization problems
Startup problems on source and target systems
Database configuration and availability issues
Oracle GoldenGate Process Failures
The first type of problem you may encounter within Oracle GoldenGate environments are process failures. These may occur on the source, on the target, or on both environments for one or more of the Oracle GoldenGate processes.
We looks first into process failures that occur during normal Oracle GoldenGate operations as well as methods to resolve these types of process failures. The preliminary task to perform if you find an Oracle GoldenGate process failure is to investigate the details by using the GoldenGate Software Command Interface (GGSCI) command INFO ALL:
A. Oracle GoldenGate Manager Process Failures
Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.0.0 Build 078
Windows (optimized), Oracle 11 on Jul 28 2010 17:20:29
Copyright (C) 1995, 2010, Oracle and/or its affiliates. All rights reserve
GGSCI (oracledba) 1> info all
Program Status Group Lag Time Since Chkpt
MANAGER STOPPED
In many cases, the Manager process fails when network issues occur on either the source or target environment for Oracle GoldenGate. You should check to make sure the Manager process is up and running. If the Manager process isn't running, then you need to restart it by executing the following command from within GGSCI:
GGSCI> START MANAGER
To verify that Manager has started successfully, issue the Oracle GoldenGate GGSCI command again after you start it. The following example makes sure Manager has started correctly and without errors:
GGSCI (oracledba) 7> start manager
Manager started.
GGSCI (oracledba) 8> info all
Program Status Group Lag Time Since Chkpt
MANAGER RUNNING
As one of the key Oracle GoldenGate processes, the Manager process abends or fails if network ports are blocked by a firewall. You can check by using the ping command to verify whether there is a network problem with ports. You should work closely with your network or system administrator to ensure that the ports allocated by the Manager process are open and available so that Manager can communicate with both the source and target systems for Oracle GoldenGate.
B.Oracle GoldenGate Extract Process Failures
Oracle GoldenGate Extract process failures can occur either on the source systems or on the intermediate systems that use a data pump Extract process. Because the Extract process communicates with the source database, Manager process, and target Replicat processes, if there is a failure with these Oracle GoldenGate processes, then Extract either hangs or abends and fail. When you're dealing with an Extract failure, you should first run the Oracle GoldenGate GGSCI VIEW REPORT command to examine the current status of your Oracle GoldenGate environment.
If the Extract process fails on the source system, run the following command:
GGSCI> VIEW REPORT <extract name>
Examine the output from the VIEW REPORT command in GGSCI; the details provide clues to further investigation for resolving the Extract failure. After you've resolved the issue, you need to restart the Extract process by executing the GGSCI START EXTRACT <extract name> command.
Issue the command with the DETAIL option, as shown here:
GGSCI> START EXTRACT <extract name>, DETAIL
Now, let's look with an example of an Extract process failing on an Oracle source database server. To view a report from GoldenGate, execute the command GGSCI to enter the GoldenGate command-line interface, and then enter VIEW REPORT EATAA, which references the name of the sample Extract process group:
GGSCI (oracledba) 3> view report eataa
***********************************************************************
Oracle GoldenGate Capture for Oracle
Version 11.1.1.0.0 Build 078
Windows (optimized), Oracle 11 on Jul 28 2010 18:00:34
Copyright (C) 1995, 2010, Oracle and/or its affiliates. All rights reserved.
Starting at 2011-03-14 00:00:32
***********************************************************************
Operating System Version:
Microsoft Windows Vista Business Edition, on x86
Version 6.0 (Build 6002: Service Pack 2)
Process id: 2524
Description:
***********************************************************************
** Running with the following parameters **
***********************************************************************
--Example Extract Parameter File
SOURCEISTABLE
2011-03-14 00:00:33 INFO OGG-01017 Wildcard resolution set to IMMEDIATE bec
ause SOURCEISTABLE is used.
USERID ggs, PASSWORD *****
Source Context :
SourceModule : [ggdb.ora.sess]
SourceID : [../gglib/ggdbora/ocisess.c]
SourceFunction : [OCISESS_try]
SourceLine : [498]
2011-03-14 00:00:33 ERROR OGG-00664 OCI Error during OCIServerAttach (status
= 12560-ORA-12560: TNS:protocol adapter error).
2011-03-14 00:00:33 ERROR OGG-01668 PROCESS ABENDING.
In this case, Extract fails to run due to an Oracle protocol network error on the source database server. The network error is caused by a failed Oracle network listener (TNS) failing to start correctly. You troubleshoot the error by examining the listener.ora and tnsnames.ora network configuration and correcting any errors. Then, restart both the Oracle listener and GoldenGate processes. By correcting the source Oracle listener protocol error, you can successfully restart the Extract process.
Another useful way to investigate Extract failure problems is to run the GGSCI command INFO <extract name>, DETAIL to obtain a listing of the current Extract file locations for additional analysis:
GGSCI (oracledba) 26> info extora, detail
EXTRACT EXTORA Initialized 2011-03-13 23:45 Status STOPPED
Checkpoint Lag 00:00:00 (updated 00:22:24 ago)
Log Read Checkpoint Oracle Redo Logs
2011-03-13 23:45:30 Seqno 0, RBA 0
Target Extract Trails:
Remote Trail Name Seqno RBA Max MB
AA 0 0 10
Extract Source Begin End
Not Available * Initialized * 2011-03-13 23:45
Current directory C:\ggs_src
Report file C:\ggs_src\dirrpt\EXTORA.rpt
Parameter file C:\ggs_src\dirprm\EXTORA.prm
Checkpoint file C:\ggs_src\dirchk\EXTORA.cpe
Process file C:\ggs_src\dirpcs\EXTORA.pce
Error log C:\ggs_src\ggserr.log
Next, let's look at more difficult situations that occur when an Oracle GoldenGate process fails without providing you with a report.
Oracle GoldenGate Process Failures Without Report Diagnostics
If an Oracle GoldenGate process fails before first writing out a report file to standard screen output, you can run it from the operating system's command shell (not GGSCI) to send process information to the terminal. To investigate, you can issue the startup by using the syntax <process> paramfile <path name>.prm at the operating-system level where <process> indicates either the Extract or Replicat process for Oracle GoldenGate and paramfile is the fully qualified file name of the parameter file for the Extract or Replicat. Here's an example:
C:\ggs_src>extract paramfile C:\ggs_src\dirprm\extora.prm
Version 6.0 (Build 6002: Service Pack 2)
Process id: 6056
Description:
2011-03-14 00:19:46 ERROR OGG-00664 OCI Error beginning session (status = 10
17-ORA-01017: invalid username/password; logon denied).
2011-03-14 00:19:46 ERROR OGG-01668 PROCESS ABENDING.
On UNIX and Linux platforms, you need to look for a core dump file located under the Oracle GoldenGate home directory where you've installed the Oracle GoldenGate software.
Oracle GoldenGate Trail File Problems
Recall from earlier chapters that Oracle GoldenGate process groups write and read data to special flat files called trail files on the source and target systems. Failure in processing these trail files halts correct operations of the Oracle GoldenGate software. This section looks at the following problems that can occur with trail files:
Trail files that don't empty
Trail files that don't roll over
Trail file purging issues
Trail Files that Don't Empty
Sometimes you encounter a problem with trail files that don't empty and remain full of data. You should check to identify whether the trail file in question is a local or remote trail file. If the problem is with a local trail file that fails to empty, check for TCP/IP network latency issues between the source, intermediate, and target systems. If the remote trail file on the target system fails to empty on a regular basis, check to see if your Replicat process groups are available, running, and processing data from the remote trail files.
If the Replicat processes are running correctly without errors, the cause may be large transactions being processed by the Replicat on the target system. In this case, you can split up the tables and partition them to break out the large transactions into multiple trail files with additional Replicat process groups to process the transactions in parallel. To do so, use the Oracle GoldenGate Replicat parameter FILTER option with the @RANGE function in the MAP statement for the Replicat.
Trail Files that Don't Roll Over
Another issue that plagues Oracle GoldenGate trail-file operations occurs when a trail file fails to roll over to a new trail file after it fills up completely. One thing to check here is the maximum file size for the trail file in question. You can check this using Oracle GoldenGate GGSCI command INFO EXTTRAIL * or INFO RMTTRAIL * as shown in the following example:
GGSCI (oracledba) 1> info exttrail *
Extract Trail: AA
Extract: EXTORA
Seqno: 0
RBA: 0
File Size: 10M
You can locate this information by examining the File Size field. To change the trail-file size with Oracle GoldenGate, issue the Oracle GoldenGate GGSCI command ALTER EXTTRAIL or ALTER RMTTRAIL with the MEGABYTES option, as shown here:
GGSCI (oracledba) 3> alter exttrail aa megabytes 20, extract extora
EXTTRAIL altered.
GGSCI (oracledba) 4> info exttrail *
Extract Trail: AA
Extract: EXTORA
Seqno: 0
RBA: 0
File Size: 20M
Trail File Purging Issues
Trail files sometimes aren't purged correctly, thus causing errors during processing of data between the source and target database environments. Probably the most common source of failed purging operations with trail files occurs due to failure to use PURGEOLDEXTRACTS to manage the trail files in use by Oracle GoldenGate. You should add this parameter to the Manager parameter file to prevent old trail files from accumulating too fast and thus causing problems with your Oracle GoldenGate configuration. Keep in mind that you must grant the Oracle GoldenGate Manager user account the correct level of access permissions to read from and write to the trail files and file systems when using this.
Another cause of trail-file purging issues is an obsolete Replicat group that may still be referencing the old trail file. Oracle GoldenGate trail files aren't purged if other processes still read them. If an old Replicat group is referencing a trail file, you use the DELETE REPLICAT command from GGSCI to delete the obsolete Replicat process group so that the checkpoint records are deleted, thereby allowing the Manager process to purge the trail files. Keep in mind that if a checkpoint table is still used for the Replicat group, you need to log into the target database with the DBLOGIN command before removing the checkpoint from the table and Replicat deletion. You can use the following syntax to perform this task:
DBLOGIN USERID <user>, PASSWORD <pw>
DELETE REPLICAT <group>
Trail Files that Purge Too Soon
Another painful situation with trail files occurs when the trail files are purged too soon. This causes errors due to missing data between the source and target systems. One root cause to check for to solve this problem is multiple Replicats and data pumps reading from the same trail file.
Also check to see if you're using the PURGEOLDEXTRACTS parameter details. The key thing is to realize that you should only use this parameter as a Manager parameter—you shouldn't use PURGEOLDEXTRACTS as an Extract or Replicat parameter. When you use PURGEOLDEXTRACTS as a Manager parameter, you can use the option USECHECKPOINTS to defer purging of the trail file until all of the Oracle GoldenGate processes have finished processing data for the trail file. In addition, you can use the MIN option to store the trail file for a set period of time. Whenever you use the MIN option, you must also set the MAX option for PURGEOLDEXTRACTS to denote the maximum amount of time to keep the trail file.
Oracle GoldenGate Error Log Analysis
Fortunately, Oracle GoldenGate is a well instrumented software product that provides a superb error log file that contains details for all aspects of replication operations on both source and target systems. If an error occurs during GoldenGate processing, it's imperative that you review the Oracle GoldenGate error logs located under the source and target base installation directories. On Linux and UNIX systems, you can use the tail command to examine the last few error messages contained in the error log file for Oracle GoldenGate.
Let's look at the error log file for the Oracle 11g GoldenGate and Oracle 11g Windows source system. The file is called ggserr.log; you can open it with a text editor such as vi or Emacs on Linux and UNIX or Notepad on Windows platforms. In addition, you can view the error log file with the GGSCI command VIEW GGSEVT as shown here:
Oracle: PROCESS ABENDING.
2011-03-14 00:19:46 INFO OGG-00992 Oracle GoldenGate Capture for Oracle, extora.prm: EXTRACT starting.
2011-03-14 00:19:46 INFO OGG-01017 Oracle GoldenGate Capture for Oracle, extora.prm: Wildcard resolution set to IMMEDIATE because SOURCEISTABLE is used.
2011-03-14 00:19:46 ERROR OGG-00664 Oracle GoldenGate Capture for Oracle, extora.prm: OCI Error beginning session (status = 1017-ORA-01017: invalid username/password; logon denied).
2011-03-14 00:19:46 ERROR OGG-01668 Oracle GoldenGate Capture for Oracle, extora.prm: PROCESS ABENDING.
Understanding the Oracle GoldenGate Discard File
Recall from earlier chapters that Oracle GoldenGate creates a discard file whenever the DISCARDFILE parameter is used by either the Extract or Replicat. If there is a problem during processing of data from the source or target environments by either the Replicat or Extract, then the rejected records are dumped into the discard file. You should review the contents of the discard file on both the source and target database environments on a regular basis as part of your due diligence to catch data issues encountered by Oracle GoldenGate.
The discard file contains all column-level details for database operations that Oracle GoldenGate processes couldn't process successfully. Each discard file contains the following useful details:
Database error message for the process, such as ORA-00100 for "No Data Found" with Oracle 11g
The trail file sequence number for record(s) that the Extract or Replicat attempted to be processed
The relative byte address of the record in the trail file
Details of the discarded record that the Extract or Replicat attempted to process
Typically, a discard file is used by the Replicat to log operations that couldn't be reconstructed or applied, but you may find one useful for the Extract as well. Check the discard error for messages such as ORA-1403 or duplicate and/or rejected records. Here is a sample Oracle GoldenGate discard file:
ORA-20017: repora 724935
ORA-06512: at "HR.REPORAINSERT", line 41
ORA-04088: error during execution of trigger 'HR.REPORA_INSERT'
Operation failed at seqno 15 rba 28483311
Problem replicating HR.EMP to HR.TRGTREP
Error occurred with insert record (target format)...
*
A_TIMESTAMP = 2011-03-09 11:28:11
OK = 1.0000
NOTOK = −1.0000
Discard records provide useful clues for issues that occur during processing so you can identify and solve the root cause quickly.
Discard File Not Created
Sometimes you may encounter the frustrating issue that Oracle GoldenGate fails to create a discard file. One common cause is that the discard file's location wasn't provided in the parameter file for use with the DISCARDFILE parameter. GoldenGate doesn't create a discard file by default, so you need to give a file name and location of the discard file—otherwise it won't be created.
A second cause of missing discard files occurs when the DISCARDFILE parameter doesn't reference the correct directory on the file system. You need to verify that sufficient read and write access permissions are granted for the discard file and the directory that contains the discard file. In addition, make sure the user who accesses the discard file has the required security permissions to access, read, and write to and from the file location.
Discard File Is Too Big
Keep in mind that Oracle GoldenGate constantly writes to the discard file as processing takes place. If the discard file isn't given sufficient disk space on the file system where it lives, then you may have errors with processing it. One recommendation is to adjust the size of the discard file by using the following parameters:
DISCARDROLLOVER: Specifies the parameters for aging of the discard file.
MAXDISCARDRECS: Limits the total number of errors written out to the discard file.
PURGE option for DISCARDFILE: Purges the discard file before writing out new information
MEGABYTES option for DISCARDFILE: Sets the maximum size of the discard file. The default size of the discard file is 1MB.
Can't Open the Discard File
Oracle GoldenGate processes for the Extract and Replicat use the discard file during operational activities. If there is an error in the location of the discard file, the Extract or Replicat process will fail in GoldenGate, as shown in the following error:
2011-03-16 00:54:08 ERROR OGG-01091 Unable to open file "c:\win11trgt\dirrpt\report.dsc" (error 3, The system cannot find the path specified.).
2011-03-16 00:54:08 ERROR OGG-01668 PROCESS ABENDING.
Check the read and write permissions of the DISCARDFILE parameter you've configured in the Extract and Replicat parameter file to ensure that the directory exists and has the correct permissions granted.
Using Trace Commands with Oracle GoldenGate
Oracle GoldenGate provides several useful parameters that you can use to trace processes that fail within the Oracle GoldenGate environment. This section shows you how to use these tracing tools to identify root cause of failures in Oracle GoldenGate. The following tracing tools are discussed along with an example that showcases how to run TRACE:
TLTRACE
TRACE
TRACE2
NOTE
If you require the use of tracing in Oracle GoldenGate, you should first contact Oracle GoldenGate support while investigating failures within the Oracle GoldenGate environment so you can best deploy these tracing commands without adversely affecting your configuration.
Oracle GoldenGate Process Tracing with TLTRACE
The TLTRACE parameter allows you to run a trace for database transaction log activity. You can run a trace to show either basic or detailed information about records being processed. To enable a TLTRACE session, you need to add this parameter to the Extract or Replicat parameter file and restart the process.
Using TRACE Parameters with Oracle GoldenGate
The TRACE and TRACE2 parameters allow you to grab all details regarding Extract and Replicat processing. The TRACE parameter provides detailed processing information for Oracle GoldenGate processes. In contrast, you use the TRACE2 parameter to identify database code segments where either the Extract or Replicat is taking the most time to process.
Troubleshooting Case Study with Oracle GoldenGate
Let's walk through a case study that shows how to use GGSCI commands and the error log to understand why a process failed in an Oracle database environment. First you use the STATUS EXTRACT command, which returns the following:
GGSCI (ggs_src) 20> status extract extora
EXTRACT EXTORA: ABENDED
Next, use the VIEW GGSEVT command to drill deeper into the issue with the failed Extract process:
GGSCI (ggs_src) 22> view ggsevt
2011-03-11 10:28:11 GGS INFO 399 GoldenGate Command Interpreter
for Oracle: GGSCI command (admin): start extract ggext.
2011-03-11 10:38:15 GGS INFO 301 GoldenGate Manager for Oracle,
mgr.prm: Command received from GGSCI on host oracledba(START
EXTRACT GGEXT).
2011-03-11 10:40:02 GGS INFO 310 GoldenGate Capture for Oracle,
ggext.prm: EXTRACT EXTORA starting.
2011-03-11 10:41:11 GGS ERROR 501 GoldenGate Capture for Oracle,
extora.prm: Extract read, error 13 (Permission denied) opening redo
log C:\oracle\arch\0001_000000568.arc for sequence
258.
2011-03-11 10:43:22 GGS ERROR 190 GoldenGate Capture for Oracle,
extora.prm: PROCESS ABENDING.
In this failure, error message 501 indicates that the Extract user doesn't have the correct permissions to read the redo logs on the source Oracle database. To fix this issue, you grant read and write permissions on the source Oracle database so the Extract process can retrieve the redo log files. After you've granted these read and write permissions to the Extract user, you need to stop the Manager process and exit GGSCI. Log out from the terminal session you have open, and restart the Oracle GoldenGate processes.
Oracle GoldenGate Configuration Issues
Oracle GoldenGate configuration issues pose challenges because once the environment has been installed and configured, if the original DBA has left the company, the new DBA team doesn't always have documentation from the past DBA to help them understand the configuration they've inherited. Configuration issues fall into the following areas:
Incorrect software version installed for Oracle GoldenGate
Database configuration issues on the source and/or target with Oracle GoldenGate
Oracle GoldenGate parameter file configuration issues
Operating system configuration issues
Network configuration issues
Incorrect Software Versions with Oracle GoldenGate
Installing the wrong version and platform release for Oracle GoldenGate causes failures within the source and target environments. Oracle provides a unique build for the GoldenGate software based on platform, database, and version (32-bit versus 64-bit) that requires the correct version to be installed for your platform. For instance, you can't install 64-bit Oracle GoldenGate Linux software on an IBM AIX platform, or you'll receive errors.
The build name contains the operating system version, database version, GoldenGate release number, and GoldenGate build number, as shown in the following example for Oracle on Solaris:
ggs_Solaris_sparc_ora10g_64bit_v11_1_1___78.tar
To find out the GoldenGate version, change to the GoldenGate home directory and issue the GGSCI –v command from the operating system terminal shell window:
C:\ggs_src>ggsci -v
Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.0.0 Build 078
Windows (optimized), Oracle 11 on Jul 28 2010 17:20:29
Copyright (C) 1995, 2010, Oracle and/or its affiliates. All rights reserved.
Database Availability Issues
If the source and target database environments aren't online and running correctly, Oracle GoldenGate processing fails because the Extract and Replicat processes frequently log in to the database environments to access redo log files for processing data-replication activities. You can check whether the database and listener are online by using the LSNRCTL and TNSPING commands for Oracle from an operating system shell window:
C:\ggs_src>lsnrctl status
LSNRCTL for 32-bit Windows: Version 11.2.0.1.0 - Production on 14-MAR-2011 01:53
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1521)))
STATUS of the LISTENER
------------------------
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(PIPENAME=\\.\pipe\EXTPROC1521ipc)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=oracledba)(PORT=1521)))
Services Summary...
Service "CLRExtProc" has 1 instance(s).
Instance "CLRExtProc", status UNKNOWN, has 1 handler(s) for this service...
Service "win11src" has 1 instance(s).
Instance "win11src", status READY, has 1 handler(s) for this service...
The command completed successfully
C:\ggs_src>tnsping win11src
TNS Ping Utility for 32-bit Windows: Version 11.2.0.1.0 - Production on 14-MAR-2
Used parameter files:
C:\winora11g2\product\11.2.0\dbhome_1\network\admin\sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = oracledb
a)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = win11src))
)OK (80 msec)
On quick way to verify that the database is online is to use the UNIX or Linux command ps –ef|grep smon or, even better, to log on to Oracle SQL*PLUS as the Oracle GoldenGate user for the Extract and Replicat, as shown here:
C:\ggs_src>sqlplus ggs/ggs@win11src
SQL*Plus: Release 11.2.0.1.0 Production on Mon Mar 14 01:59:23 2011
Copyright (c) 1982, 2010, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> select count(*) from hr.employees;
COUNT(*)
----------
107
You also need to check to verify that the Manager process is online and running as well as perform the previous checks for the source and target databases. Without access to the database systems and the Manager process, the Extract and Replicat will fail to run successfully.
Missing Oracle GoldenGate Process Groups
One common issue that plagues Oracle GoldenGate environments is missing process groups. As a starting point in troubleshooting such issues, you should execute the GGSCI INFO ALL command to view all of the current processes and groups on the system. The Extract group name may have been spelled incorrectly either when the group was created or when the START command was issued for the Oracle GoldenGate process.
Missing Oracle GoldenGate Trail Files
Often, Oracle GoldenGate environments have Extract parameter files or obey files that create Extract process groups that reference non-existent or missing trail files. Without a valid trail file, an Extract can't write out data captured from the online redo log files in the source database. Furthermore, the Replicat process will abend and fail because it can't read from the remote trail file. Without trail files, the Extract can't write the initial checkpoint and the Replicat has no available data source to read. You can issue the GGSCI command INFO EXTRACT <group> or the INFO REPLICAT <group> command with the DETAIL option to verify whether the trail file exists. In addition, after you identify the trail file, you can use the GGSCI command INFO EXTTRAIL <trail file name> to drill down further:
GGSCI (oracledba) 3> info extract extora detail
EXTRACT EXTORA Initialized 2011-03-13 23:45 Status STOPPED
Checkpoint Lag 00:00:00 (updated 02:24:26 ago)
Log Read Checkpoint Oracle Redo Logs
2011-03-13 23:45:30 Seqno 0, RBA 0
Target Extract Trails:
Remote Trail Name Seqno RBA Max MB
AA 0 0 10
GGSCI (oracledba) 5> info exttrail aa
Extract Trail: AA
Extract: EXTORA
Seqno: 0
RBA: 0
File Size: 10M
Oracle GoldenGate Parameter File Configuration Issues
Oracle GoldenGate relies on parameter files to store and manage process operations for core functionality with the Extract for source data capture, the Manager process for network communication and interprocess operations, and the Replicat for target apply processing on the source system. If these parameter files aren't set up correctly or are unavailable, then Oracle GoldenGate processing will fail.
The first scenario involving parameter file configuration issues is to check whether the parameter file is located in the correct directory. By default, parameters are stored in the dirprm subdirectory after Oracle GoldenGate has been installed. You need to verify that the parameter files for the Extract and Replicat are in this directory and that these parameter files have the same name as the process group for Oracle GoldenGate. In the event that the parameter file is missing or not in this directory, perform a search to locate the correct parameter file. If you don't remember where you placed the configuration parameter file for the process group, you can use the GGSCI command INFO EXTRACT <group>, DETAIL to locate the correct file. If you want to store the parameter file in a different filesystem and directory, then you need to use the PARAMS argument with the ADD EXTRACT command in GGSCI when you add the new Extract; or, if you've already created the Extract group, you can use the GGSCI command ALTER EXTRACT to change the parameter file location.
Another issue that poses problem with parameter configurations is related to file access permissions. If the process group read and write permissions aren't granted correctly at either the operating system level or the database level, then failures will occur when the Extract or Replicat process attempts to access the database on either the source or target system. In addition, if the file permissions aren't set correctly, then errors will occur as well during Oracle GoldenGate operations. For Windows, you can check the permissions with the Windows Explorer graphical interface. On Linux and UNIX platforms, use the ls –l command to check the status for read, write, and execute permissions. You can use the chmod and chown UNIX and Linux commands to grant the required file system permissions to the Oracle GoldenGate parameter files as necessary.
A third configuration issue that occurs with Oracle GoldenGate process group parameter file configurations takes place when the required parameters are missing for either the Extract or Replicat. This is often the case when new users perform Oracle GoldenGate configuration without sufficient knowledge of the product. The following parameters are required for an Extract:
EXTRACT <group name>
USERID <ID>, PASSWORD <pw>
RMTHOST <hostname>, MGRPORT <port>
RMTTRAIL <trail name> | EXTTRAIL <trail name> |
RMTFILE <filename> | EXTFILE <filename>
TABLE <source table>;
A Replicat requires the following parameters to perform online change synchronization:
REPLICAT <group name>
SOURCEDEFS <file name> | ASSUMETARGETDEFS
USERID <ID>, PASSWORD <pw>
MAP <source table>, TARGET <target table>;
If the configuration parameters for the Extract and Replicat are listed in wrong order, then processing fails. Oracle GoldenGate parameters are processed in the exact order listed in a parameter file. Much like grammar rules in foreign languages, you must use the correct order and syntax for parameter files. Let's look at some key examples. The parameter RMTHOST must precede the RMTTRAIL parameter. The parameter TABLE or MAP must be listed after global and all the specific parameters that apply to it.
Syntax errors with parameter files are another source of mischief. Oracle GoldenGate reports syntax problems in the process report, and these often appear as "bad parameter" errors. One way to mitigate these syntax errors (which can be tricky to isolate in a parameter file) is to use the CHECKPARAMS parameter to verify syntax. Like a spell checker, CHECKPARAMS verifies the syntax whenever the Oracle GoldenGate process starts. It writes results to the report file and then stops the process.
NOTE
After using the CHECKPARAMS parameter and resolving syntax errors, remove the parameter from the process group parameter file. Otherwise, the process won't restart!
Some common syntax errors that occur in parameter files for Extracts and Replicats include the following:
The Extract TABLE parameter or Replicat MAP parameter isn't terminated with a semicolon.
Commas aren't followed by a space in parameter files.
Missing commas, quotes, or parentheses in nested clauses exist, such as a COLMAP.
Operating System Configuration Issues with Oracle GoldenGate
Oracle GoldenGate relies on the operating system platform to perform replication activities on a real-time basis. All the core processes for Oracle GoldenGate operate as system background processes—either as Linux/Windows daemon processes or as Windows services for Windows. If there is a hiccup at the operating system level, Oracle GoldenGate processes fail. Let's take a closer look at some common operating system issues that cause failures in Oracle GoldenGate and how you can resolve them.
First are configurations for the operating system that are missing system libraries. In the case of UNIX and Linux operating systems, if you see errors in Oracle GoldenGate that complain about a missing library, you should issue the env command from a shell window to check the settings for the LD_LIBRARY_PATH and PATH variables to ensure that they're correctly set. If these variables are incorrect, rectify the situation and set these values to the correct path in the .profile startup file in the Oracle GoldenGate home directory.
The error report may also display a message that the function stack needs to be increased. To increase the memory allocated for the stack, you can use the FUNCTIONSTACKSIZE parameter.
NOTE
Be careful with the FUNCTIONSTACKSIZE parameter, because it may adversely affect Oracle GoldenGate performance. Be sure to first test it in a nonproduction environment when you make changes.
File-access problems can also cause failures at the operating system level. Both Extract and Replicat user accounts at the operating system level required full read and write permissions for all files in the Oracle GoldenGate directory. If you receive the error message "Bad parameter:Group name invalid," it indicates that the process can't open the checkpoint file. You should execute the GGSCI command INFO * to view the group name and then issue the command VIEW PARAMS <group> to make sure the group name from the GGSCI output for the INFO * command matches the one in the EXTRACT or REPLICAT parameter.
Finally, check the values for the key operating system environment variables for the database installed on the source and target systems. For instance, with Oracle, make sure you check that the ORACLE_SID and ORACLE_HOME system variables are set to the correct instance name in the GoldenGate user profile. You can use the env|grep ORA command in Linux and UNIX to check that these Oracle environment variables are set to the correct values.
Network Configuration Issues with Oracle GoldenGate
Oracle GoldenGate relies heavily on network operations to ensure data transfer and communications between source, intermediary, and target systems as part of its real-time data-replication functionality. Latency issues and network outages adversely affect Oracle GoldenGate environments. As such, it's paramount that Oracle GoldenGate administrators develop a close partnership with the system administrator and network operations team to ensure maximum uptime and performance for the Oracle GoldenGate environment. Network problems with Oracle GoldenGate fall into the following categories:
Network access and connectivity
Network latency
Network availability and stability
Network data-transfer issues
Let's look into each of these key areas how to resolve issues. One of the first errors I experienced years ago when I was a newbie to Oracle GoldenGate was a connection that refused errors that I noticed in the Extract report file. Typically, whenever you receive a common TCP/IP error such as "4127 connection refused." it indicates that the target Manager or Server isn't running, or that the Extract process is pointing to the wrong TCP/IP address or Manager port number. The report for the Extract shows an error in GGSCI similar to this:
ERROR: sending message to EXTRACT EATAA (TCP/IP error: Connection reset).
You can use the GGSCI command INFO MGR to identify the port number in use by the target Manager.
Another item to check the Extract parameter RMTHOST to ensure that MGRPORT is using the same port number as shown in the GGSCI INFO MGR command. If you used a host name, be sure to check that the server's domain name server (DNS) can resolve it. If an IP address was used, make sure to verify that it's correct. You can issue the IFCONFIG command for Linux or UNIX and the Windows IPCONFIG command to verify an IP address from the OS command shell. In addition, you should test network connectivity between the source and target systems by using the PING <host name> command. You can display the network routing table by using the NETSTAT command to check for routing access between source and target.
Another common network error that occurs with Oracle GoldenGate appears when the Extract returns the error "No Dynamic ports available." This means the target Manager process was unable to obtain a port on which to communicate with the source Manager. The Manager process looks for a port in the list specified with the DYNAMICPORTLIST parameter. However, if the DYNAMICPORTLIST parameter isn't specified, Manager then looks for the next available port higher than the one on which it's running. One issue when using the DYNAMICPORTLIST parameter for Manager is that there may not be enough numbers specified or freely available for use. Frequently, orphan processes occupy the ports desired by Manager. You should either kill the zombie processes that occupy these ports or investigate ports that can be reserved for use with Oracle GoldenGate. My Oracle Support Note 966097.1 (http://support.oracle.com) has additional troubleshooting tips for network analysis with Oracle GoldenGate.
Network Data-Transfer Issues
Sometimes, network latency issues cause a lag in data transfer by the Extract from the source system to the target system. You can identify the latency issue by viewing the Extract checkpoint in the trail with the INFO EXTRACT, SHOWCH GGSCI command, as shown in this example:
INFO EXTRACT <group>, SHOWCH
GGSCI (oracledba) 13> info extract eataa,showch
EXTRACT EATAA Initialized 2011-03-16 00:50 Status STOPPED
Checkpoint Lag 00:00:00 (updated 00:17:15 ago)
Log Read Checkpoint Oracle Redo Logs
2011-03-16 00:50:47 Seqno 0, RBA 0
Current Checkpoint Detail:
Read Checkpoint #1
Oracle Redo Log
Startup Checkpoint (starting position in the data source):
Sequence #: 0
RBA: 0
Timestamp: 2011-03-16 00:50:47.000000
Redo File:
Recovery Checkpoint (position of oldest unprocessed transaction in the data so
urce):
Sequence #: 0
RBA: 0
Timestamp: 2011-03-16 00:50:47.000000
Redo File:
Current Checkpoint (position of last record read in the data source):
Sequence #: 0
RBA: 0
Timestamp: 2011-03-16 00:50:47.000000
Redo File:
Write Checkpoint #1
GGS Log Trail
Current Checkpoint (current write position):
Sequence #: 0
RBA: 0
Timestamp: 2011-03-16 00:53:45.022000
Extract Trail: AA
Header:
Version = 2
Record Source = U
Type = 4
# Input Checkpoints = 1
# Output Checkpoints = 1
File Information:
Block Size = 2048
Max Blocks = 100
Record Length = 2048
Current Offset = 0
Configuration:
Data Source = 3
Transaction Integrity = 1
Task Type = 0
Status:
Start Time = 2011-03-16 00:50:47
Last Update Time = 2011-03-16 00:50:47
Stop Status = G
Last Result = 0
The statistic to look for is Write Checkpoint, an example of which is shown here:
Write Checkpoint #1
GGS Log Trail
Current Checkpoint (current write position):
Sequence #: 0
RBA: 0
Timestamp: 2011-03-16 00:53:45.022000
Extract Trail: AA
If the number of write checkpoints isn't increasing in Oracle GoldenGate, it means the Extract process is unable to send data across the network to the trail file. If you're using a data-pump Extract on the source system, then issuing the GGSCI command INFO EXTRACT for both Extract processes will show that the primary Extract's checkpoints are moving, because it's able to write to the local trail; however, the data-pump checkpoints aren't incremented, thus showing the network latency issue. When you encounter this latency issue, contact and work with your local system administrator or network operations team to remediate the network slowness. Further analysis can be performed by issue the INFO EXTRACT and INFO REPLICAT commands from within the GGSCI interface, with the DETAIL option:
INFO EXTRACT <group>, DETAIL
INFO REPLICAT <group>, DETAIL
Given that the Replicat process groups are up and running without any system issues in terms of access to reading from and writing to the trail files, the next step is to compare the Extract write point with the Replicat read point by reviewing the following statistics output from the GGSCI INFO EXTRACT command:
Remote Trail Name Seqno RBA Max MB
c:\ggs_src\dirdat\aa 1 251 50
Compare this to the report from the GGSCI command INFO REPLICAT:
Log Read Checkpoint File C:\ggs_trgt\DIRDAT\AA000001
2011-03-13 22:41:58.000000 RBA 2142225
Whenever you see that the Extract checkpoints aren't being incremented, INFO REPLICAT may show continued processing by indicating that the read relative byte address (RBA) is increasing up until the last record has been written to the trail file. It then stops moving because no more records are sent across the network and written out to the trail file. Meanwhile, back on the source system, the data-pump Extract process is unable to move data to the target and soon fails in an abend condition as it runs out of memory for processing data. However, the primary Extract process group continues to run because it writes out to the local trail file. Eventually, it reaches the last trail file in the series; and when it can't checkpoint, it fails and abends.
This proves the point that it's very important to resolve network issues as soon as possible to prevent the Extract process from falling too far behind in the transaction logs, because you don't want data to be out of sync. You should monitor the network between the source and target environments by using a network diagnostic tool such as HP OpenView or Tivoli as well as communicating these issues to your network administrator to verify that the network is in a healthy condition and latency issues have been minimized as much as possible. After you've verified that network performance is sufficient, check to see if the Extract process bandwidth is saturated. You may need to split the processing between multiple Extract groups to remediate network performance issues. Multiple data pumps may also benefit environments without large bandwidth resources. This reduces the chance of an Extract failing if the network is unreliable and experiences periodic spikes or failures. You should group together tables with referential integrity to one another into the same Extract process group. You can also tune memory usage by the Extract in the event that the network latency issue has been addressed.
Oracle Database Issues with GoldenGate
In an ideal world, this book would cover the database configuration issues for each platform that supports Oracle GoldenGate. Because the details are beyond the scope of this book, I encourage you to consult the platform-related Oracle GoldenGate documentation for third-party RDBMS platforms online at http://download.oracle.com/docs/cd/E18101_1/index.htm. Oracle database-related issues that cause errors with the Oracle GoldenGate operations include the following:
Extract process can't read or access the online redo logs or Oracle database archive log files
Missing Oracle database archive logs
Out of sync Oracle database issues
Extract and Replicat failures with Oracle source and target database environments
Data pump errors
Extract Can't Access Oracle Database Archive and Redo Logs
In the event that the Extract process can't find the requested archive or online redo log file on the source Oracle database, it waits. For instance, you may encounter error messages from within Oracle GoldenGate such as this:
2011-03-16 00:51:55 ERROR OGG-00446 Error 5 (Access is denied.) opening log file
C:\WINORA11G2\ORADATA\WIN11SRC\REDO01.LOG for sequence 16. Not able to establish initial
position for begin time 2011-03-16 00:50:47.
2011-03-16 00:51:55 ERROR OGG-01668 PROCESS ABENDING.
You can find out the log file that the Extract is looking for by issuing the GGSCI VIEW REPORT <group> command. Your best bet to resolve missing archive logs with the Extract is to restore the missing archive log files and then restart the Extract group. To avoid these errors, you should maintain sufficient archive logs available to the Extract until the data contained in these logs has been processed.
A second issue with Extracts and Oracle database archive log files occurs whenever the archive logs aren't stored in the default Oracle location. In this case, you can use the Extract parameter ALTARCHIVELOGDEST <path name> to use another location for the archive logs that the Extract needs for processing. One key way you can identify a missing archive log issue is that when this occurs, the Extract startup process runs slowly and appears to be in a hang condition as it looks back for the archive log sequences to find the specific logs that it requires. The Extract must search through all past logs if it was restarted while a database transaction was opened, thus creating the requirement to search and go back to past operations until a commit was received because only committed transactions are recorded by Oracle GoldenGate. In-flight transactions aren't captured and also aren't recorded in the Oracle database archive logs. It can take hours or even days for the Extract to search through the Oracle database archive logs to locate the requested log file. If the Extract fails to locate the archive log, it abends and fails.
You should run a query against the V$TRANSACTION dynamic performance view on the source Oracle database to ensure that no open transactions are present. You should also run the GGSCI SEND EXTRACT command on a regular basis to view and manage long-running transactions. One way to set a threshold for long-running Oracle database transactions is to use the parameter WARNLOGTRANS to identify long-running transactions in Oracle GoldenGate. In the event that your Oracle database online redo logs have dumped their current data to the archive logs and you can't restore the necessary archive logs, you must resynchronize both the source and target database environments.
Extract Failure Conditions Due to Oracle Source Database Issues
As mentioned earlier, Extracts must have read access to log in to the source Oracle database to read data from the online redo log files. Check the permissions for the Extract user account provided in the parameter file if you experience errors during operation. You can verify this by using the DBLOGIN command to check for permission and read-access issues. Another root cause of this issue is the disk-full condition for the source Oracle database file system.
Once you've resolved the read and login issues on the source Oracle database, you need to stop both the Manager and Extract processes with Oracle GoldenGate. Exit the GGSCI interface for Oracle GoldenGate, and then restart the Manager and Extract process groups on the source system.
Data-Pump Errors
Recall from earlier chapters that you can use a special Extract process group called a data pump on an intermediate system to enhance performance and availability with processing. One common problem occurs when you use the PASSTHRU parameter and try to use data transformations. This causes the data pump to fail because the PASSTHRU parameter isn't supported with Oracle GoldenGate. For the data pump pass-through mode with Oracle GoldenGate, both source and target table names and structures must be the same, and no filtering operations can be performed on the data.
Another thing to check for in the data-pump parameter file when you use the PASSTHRU parameter is whether you're using the USERID or SOURCEDB parameter. If the source system doesn't contain a database, these parameters shouldn't be used. If you're using pass-through for some tables and normal processing for other tables, then for normal processing of these tables, the system must have a database. In addition, you need to specify the database login parameters with Oracle GoldenGate as well as use a source definitions file if any filtering is performed. Furthermore, you need to use a target definitions file for any column mapping or conversion operations.
Replicat Errors on the Oracle Database Target System
The Oracle Replicat performs the critical task of reading data from the remote trail files on the target database system. It uses SQL statements to apply the transactions to the target database system. The following issues can occur with Replicat processing and cause failures in Oracle GoldenGate:
The Replicat hangs or fails to process data against the target Oracle database.
The Replicat abends and fails on the target database system.
Large transactions cause the Replicat to fail.
Let's look into each of these issues with Replicats and how to resolve them.
Replicat Hangs on the Target System
Sometimes the Replicat process stalls and doesn't continue to process data from the remote trail file on the target system. If this issue occurs, check the status of the Replicat process group by issuing the GGSCI command INFO REPLICAT and noting the trail file name as well as the checkpoint information.
Make sure you verify that the Replicat is reading data from the same trail file the Extract process group is writing to. If you discover that this isn't the case, then you need to rectify the situation by executing the GGSCI command ALTER REPLICAT <group>, EXTTRAIL <trail file name>. Then, to verify that everything is correct, issue the GGSCI command INFO RMTTRAIL * on the target system. In the event that the trail file wasn't created, you can create a new trail file by executing the GGSCI command ADD EXTTRAIL <trail file name>, EXTRACT <group> on the source system.
Replicat Experiences an Abend Failure on the Target System
Recall from earlier chapters that Replicat processes live on the target database system and use a special database table called the checkpoint table. If this table is accidentally deleted or corrupted, Replicat processing fails. One indication of this is the abend condition shown here:
2011-03-16 01:13:58 ERROR OGG-00446 Checkpoint table GGS.CHECKPOINT does not exist.
Please create the table or recreate the REPORA group using the correct table.
2011-03-16 01:13:58 ERROR OGG-01668 PROCESS ABENDING.
You may also see errors on the source system with the Extract such as GGS error 516, "Extract read, No data found selecting position from checkpoint table."
The solution to corrupted checkpoint errors is to drop and re-create the checkpoint table on the target system for the Replicat or to use the convchk utility and then restart the Replicat. My Oracle Support Note 965703.1 (http://support.oracle.com) has tips on using the convchk utility.
The syntax for the convchk utility is
convchk <replicat group> <schema>.<checkpoint table name>
Let's look at how to use the convchk utility to repair a corrupted checkpoint table for Oracle GoldenGate with a Replicat on Windows. You run this from a Windows or UNIX/Linux shell prompt window and not from within the GGSCI interface:
C:\ggs_trgt>convchk repora ggs.checkpoint
***********************************************************************
Oracle GoldenGate Checkpoint Conversion Utility
Version 11.1.1.0.0 Build 078
Windows (optimized) on Jul 28 2010 17:43:24
Copyright (C) 1995, 2010, Oracle and/or its affiliates. All rights reserved.
Starting at 2011-03-16 01:23:18
***********************************************************************
Operating System Version:
Microsoft Windows Vista Business Edition, on x86
Version 6.0 (Build 6002: Service Pack 2)
Process id: 4880
Opening existing checkpoint file C:\ggs_trgt\dirchk\REPORA.cpr for group REPORA.
Checkpoint C:\ggs_trgt\dirchk\REPORA.cpr backed up to C:\ggs_trgt\dirchk\REPORA.cpr.1300252998.bak.
Updating checkpoint table in 8.0.2-compatible checkpoint file.
Successfully converted checkpoint to 8.0.2-compatible format.
Checkpoint conversion successful for group REPORA.
Once you've fixed the checkpoint table for the Replicat, you should be able to start the Replicat without errors as shown here:
GGSCI (oracledba) 14> info repora, detail
REPLICAT REPORA Last Started 2011-03-16 01:29 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:07 ago)
Log Read Checkpoint File c:\ggs_src\trails\aa000000
First Record RBA 0
Extract Source Begin End
c:\ggs_src\trails\aa000000 * Initialized * First Record
c:\ggs_src\trails\aa000000 * Initialized * First Record
Current directory C:\ggs_trgt
Report file C:\ggs_trgt\dirrpt\REPORA.rpt
Parameter file C:\ggs_trgt\dirprm\REPORA.prm
Checkpoint file C:\ggs_trgt\dirchk\REPORA.cpr
Checkpoint table GGS.CHECKPOINT
Process file C:\ggs_trgt\dirpcs\REPORA.pcr
Error log C:\ggs_trgt\ggserr.log
NOTE
The convchk utility causes the checkpoint file to become the new master for Replicat checkpoints. The Replicat resynchronizes the truncated checkpoint table with data from the checkpoint file.
Replicat Fails on Large Transactions
If the target system can't support large transactions, the Replicat process groups will fail with an out-of-memory condition. To solve this error, you can set the MAXTRANSOPS parameter to split up the Replicat transactions into smaller transactions or use multiple Replicat groups to process these in parallel.
Incompatible Record Errors with a Replicat
One common problem during Replicat processing on the target system occurs when incompatible records bring the Replicat to its knees and failure. In the GGSCI logs, you see an error similar to the following:
2011-03-09 11:00:22 GGS ERROR 509 GoldenGate Delivery for Oracle, REPORA.prm: Extract read, Incompatible record in C:\GGS_TRGT\DIRDAT\AA000000, rba 5466 (getting header).
This occurs if the data gets corrupted in transit from the source system to the target system. Another cause is a data format that isn't compatible with Replicat processing. If you use the FORMATASCII, FORMATSQL, or FORMATXML parameter for the Extract during the initial load, it will cause the failure to occur.
Incompatible-record errors with the Replicat also occur due to misconfiguration of trail files that are sent from the source to the target system when the Extract is configured to overwrite an existing trail file. For instance, if you have two Extract processes writing to the same trail file, they will overwrite the one the Replicat is looking to use, thus causing an incompatible-record error.
The Replicat needs complete records to apply the data successfully from the trail file to the target system. Sometimes the Extract and Replicat groups are dropped and rebuilt with the exact same trail-file name. This causes the Extract process to begin writing again at the first part of the same trail file, causing issues with stepping on the trail file.
You can use the Logdump utility to investigate error 509 conditions with Oracle and GoldenGate. In the event that you can't salvage the data, the best solution is to resynchronize the target database with a new initial load and to add new trail files correctly to avoid future issues.
Data-Synchronization Issues
As much as you might like Oracle GoldenGate to automate the data-verification process, it doesn't perform these checks. However, you can implement conflict-resolution scripts and programs to check for data synchronization between source and target systems. Oracle GoldenGate performs basic checks by matching target rows from the source and target database primary key and unique-key columns, or substitute key columns if you define these with the KEYCOLS option for the TABLE and MAP statements in the Extract and Replicat.
You should perform out-of-sync data checks on a regular basis as part of your database administration duties for Oracle GoldenGate to remediate synchronization issues. Veridata provides a suite of tools to check for data-synchronization issues with Oracle GoldenGate environments.
What causes data-synchronization issues, you may ask? There are several key factors:
Tables missing key constraints
Character-set configuration issues
Column-missing errors
Fetch failures on source and target tables
Tables Missing Key Constraints
Oracle GoldenGate relies on table-key constraints to map source-to-target DDL and DML activities. Without these keys, Oracle GoldenGate uses all of the table columns as a key value. This may cause multiple rows of the table to be updated if they contain identical data. One way to address this issue to avoid data duplication is to set the KEYCOLS option for the TABLE and MAP statements in the Extract and Replicat parameter files along with the LIMITROWS option to prevent multiple rows from being updated.
Character Set Configuration Issues
Because Oracle GoldenGate uses the database character set to perform data-synchronization tasks, you need to make sure the Oracle source and target character sets are the same to avoid synchronization errors. If they're different, then errors will result. You can check the character sets by looking at the NLS_LANG environment variable on both the source and target Oracle database systems. The source NLS_LANG value must be set to the same exact character set as the target system. In addition, the target Oracle database character set needs to be set as a superset of the source Oracle database system.
Missing-Column Errors
If columns are found to be missing between source and target, Oracle GoldenGate won't replicate data correctly. This has its root cause in two areas. The first case arises if supplemental data hasn't been enabled on the source and target Oracle database environments correctly. For Oracle, you use the ADD TRANDATA command to add the required supplemental log data either at the database level or on a table-by-table basis.
Another cause of missing-column errors occurs when the KEYCOLS parameter is used to perform column-mapping transformations from the source to target environments. If there is a mismatch in source-to-target key-column mapping, errors occur. You need to make sure you include either an existing key or a unique column in the KEYCOLS definition to resolve this error.
Filters and functions used in Extract and Replicat parameter files also cause these errors in the event that the correct existing key column isn't specified and found between the source and target mapping transformations given in the Extract and Replicat parameter files.
Fetch Failures
Database fetch failures cause data-synchronization errors if source-row data is deleted during Oracle GoldenGate processing. Another root cause of fetch failures occurs when undo retention for the Oracle source and target environments expires. This causes the mandatory read-consistent image of the data that the Extract process is looking for to disappear, with the end result that the Oracle Snapshot Too Old message is displayed. You can resolve this issue by adding the FETCHOPTIONS NOUSESNAPSHOT parameter to the Extract process group configuration. This parameter makes the Extract fetch data from the table instead of going to the undo tablespace.
After you've verified that Oracle GoldenGate has processed the record, you need to remove FETCHOPTIONS so the Extract returns to fetching data from the undo tablespace. You should also increase the amount of time that Oracle maintains data in the undo tablespace on the source and target Oracle databases.
Nice blog with detailed explanation. Very Helpful. Good work!!!
ReplyDeleteThank you so much for sharing this beautiful GG articles. It was very helpful for my interview preparation.
ReplyDelete