Monday 15 January 2018

Oracle GoldenGate Troubleshooting and Interview


GoldenGate Performance Tuning 

Tips & Techniques 



Agenda

What is Lag and what can contribute to lag in a GoldenGate replication environment

Compare Classic Extracts and Replicats with Integrated Extracts and Replicats

New performance tuning challenges introduced by the Log Mining Server component 

What tools do we have available in OGG 12.2 to monitor performance 

Using those tools to examine and investigate a real-life performance problem and how the problem was resolved


Is the problem caused by a GoldenGate component?

Extract reading the archive log and writing the data to a trail (or remote host)
Datapump reading the extract trail and writing to a remote host 
Network 
Collector (server.exe) on the target receiving network data and writing it to a remote trail

Replicat reading the remote trail and writing to the database

Log Mining Server issues – on both the source and the target


Measuring OGG Performance 

Typically a GoldenGate performance problem is centered around Lag

Lag is the elapsed time between when a transaction is committed and written to a storage medium (such as an archive log or redo log) on the source, and when Replicat writes the same transaction to the target database


Do we still use Classic Extracts and Replicats?

Any reason why we are not using BOTH?
Integrated Extracts 
Integrated Replicats
Do we have source/target Oracle databases on versions lower than 11.2.0.3 or 11.2.0.4?
Consider Downstream Capture if Integrated Extract is not allowed on the source
Do we want RMAN integration with Oracle GoldenGate?

A case for Integrated Replicat

Integrated Replicat offers automatic parallelism which automatically increases or decreases the number of apply processes based on the current workload and database performance
Coordinated Replicat provides multiple threads, but dependent objects must be handled by the same Replicat thread – otherwise Replicat will abend
Integrated Replicat  ensures referential integrity and DDL/DML  operations are automatically applied in the correct order

Management and tuning of Replicat performance is simplified since you do not have to manually configure multiple Replicat processes to distribute the tables between them.

Tests have shown that a single Integrated Replicat can out-perform multiple Classic Replicats as well as a multi-threaded Coordinated Replicat


Tune the database before tuning GoldenGate!

Does the target database already have I/O issues?

Are the redo logs properly configured – size and location?

Data replication is I/O intensive, so fast disks are important, particularly for the online redo logs.

Redo logs are constantly being written to by the database as well as being read by GoldenGate Extract processes

Do we have any significant ‘Log File Sync’ wait events?

Also consider the effect of adding supplemental logging which will increase the redo logging

Key Points

Identify and isolate tables with significantly high DML activity 

Separate Extract and Replicat process groups for such tables

Dedicated Extract and Replicat process groups for tables with LOB columns 

Possibly dedicated process groups for tables with long running transactions 

Run the Oracle GoldenGate database Schema Profile check script to identify tables with missing PKs/UKs/Deferred Constraints/NOLOGGING/Compression

Start with a single Replicat process (as well as Extract process)

Add Replicat processes until latency is acceptable (Classic)

In its classic mode, Replicat process can be a source of performance bottlenecks because it is a single-threaded process that applies operations one at a time by using regular SQL

Consider BATCHSQL to increase the performance of Replicat, particularly in OLTP-type environments characterized by smaller row changes

BATCHSQL causes Replicat to organize similar SQL statements into arrays which leads to faster processing as opposed to serial apply of SQL statements

If tables can be separated based on PK/FK relationships consider Co-Ordinated replicats with multiple threads

For Integrated Replicats check the parameters PARALLELISM, MAX_PARALLELISM, COMMIT_SERIALIZATION, EAGER_SIZE  
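As a hedged sketch of how these options fit together (the group, schema, trail, and threshold values below are illustrative assumptions, not recommendations), a Replicat parameter file might look like:

```
REPLICAT rep1
USERIDALIAS ogguser
-- Batch small OLTP changes into arrays (thresholds are example values)
BATCHSQL BATCHESPERQUEUE 100, OPSPERBATCH 2000
-- Integrated Replicat apply settings (example values only)
DBOPTIONS INTEGRATEDPARAMS (PARALLELISM 4, MAX_PARALLELISM 8, COMMIT_SERIALIZATION DEPENDENT_TRANSACTIONS, EAGER_SIZE 15000)
MAP src.*, TARGET tgt.*;
```

With PARALLELISM and MAX_PARALLELISM set apart like this, the apply servers can scale up and down between the two bounds as the workload changes; the Replicat itself would be added with the INTEGRATED keyword (for example, ADD REPLICAT rep1, INTEGRATED, EXTTRAIL ./dirdat/rt).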

 

Tune the Network for OGG

The network is an important component in GoldenGate replication 

The two RMTHOST parameters TCPBUFSIZE and TCPFLUSHBYTES are very useful for increasing the buffer sizes and network packets sent by the Data Pump over the network from the source to the target system. 

This is especially beneficial for high latency networks

Use Data Pump compression if network bandwidth is constrained and when CPU headroom is available 
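As an illustrative sketch (the host name and buffer sizes are assumptions – a common rule of thumb is to size the buffers near the network's bandwidth-delay product), a Data Pump parameter file applying these settings might look like:

```
EXTRACT pump1
-- Larger TCP buffers plus compression for a constrained, high-latency link
RMTHOST tgt-host, MGRPORT 7809, TCPBUFSIZE 1000000, TCPFLUSHBYTES 1000000, COMPRESS
RMTTRAIL ./dirdat/rt
PASSTHRU
TABLE src.*;
```

The effect of such changes can then be measured with SEND EXTRACT <group> GETTCPSTATS, as shown in the before/after figures below.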



Tuning the Network - Before

GGSCI (ti-p1-bscs-db-01) 1> send pbsprd2 gettcpstats

Sending GETTCPSTATS request to EXTRACT PBSPRD2 ...

RMTTRAIL ./dirdat/rt000113, RBA  38351713

Buffer Size 2266875     Flush Size 2266875     SND Size 2097152
Streaming Yes
Inbound Msgs         2710    Bytes        54259,        3 bytes/second
Outbound Msgs       20541    Bytes  13539482811,   795925 bytes/second
Recvs                5420
Sends               20541
Avg bytes per recv         10, per msg     20
Avg bytes per send      659144, per msg  659144
Recv Wait Time     1558113382, per msg       574949, per recv   287474
Send Wait Time     7514461569, per msg       365827, per send   365827

Tuning the Network - After

GGSCI (pl-p1-bscs-db-01) 12> send pbsprd1 gettcpstats

Sending GETTCPSTATS request to EXTRACT PBSPRD1 ...

RMTTRAIL ./dirdat/rt000000, RBA  98558417
Buffer Size 200000000   Flush Size 200000000   SND Size 134217728
Streaming Yes
Inbound Msgs          258    Bytes         4746,        1 bytes/second
Outbound Msgs        2402    Bytes     98675058,    37893 bytes/second
Recvs                 516
Sends                2402
Avg bytes per recv          9, per msg     18
Avg bytes per send      41080, per msg  41080
Recv Wait Time       63143512, per msg       244742, per recv   122371
Send Wait Time         486941, per msg          202, per send      202

Compare it with the earlier figures 

Recv Wait Time     1558113382, per msg       574949, per recv   287474
Send Wait Time     7514461569, per msg       365827, per send   365827


Allocate memory for the Log Mining Server 

Set the STREAMS_POOL_SIZE initialization parameter for the database
Set the MAX_SGA_SIZE parameter for both Integrated Extracts and Integrated Replicats
MAX_SGA_SIZE controls the amount of memory used by the log mining server – the default is 1 GB
STREAMS_POOL_SIZE = (MAX_SGA_SIZE * PARALLELISM) + 25% headroom 

For example, using the default values for the MAX_SGA_SIZE and  PARALLELISM parameters: 
( 1GB * 2 ) * 1.25 = 2.50GB 
STREAMS_POOL_SIZE = 2560M 

Allocate memory for the Log Mining Server 

The log mining server runs on both the source and the target
STREAMS_POOL_SIZE needs to be properly sized on both the Integrated Extract and the Integrated Replicat end

SQL>  SELECT state FROM GV$GG_APPLY_RECEIVER;
STATE
----------------------------------------------
Waiting for memory

SQL> show parameter streams
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
streams_pool_size                    big integer 2G

SQL>  alter system set streams_pool_size =24G sid='bsprd1' scope=both;
System altered.

SQL> SELECT state FROM GV$GG_APPLY_RECEIVER;
STATE
----------------------------------------------
Enqueueing LCRS


Troubleshooting with Case Study


GGSCI information commands
• The ggserr.log file (known as the error log)
• Process reports (dirrpt/*.rpt)
• The discard file (dirrpt/*.dsc)
• The Event Viewer on Windows systems or the syslog on UNIX systems to view errors at the operating-system level
Handling TCP/IP Errors
The tcperrs file (TCP Errors), which is located in the Oracle GoldenGate installation directory of the target system, can help with troubleshooting TCP/IP errors such as bottlenecks, "Connection refused" errors, and insufficient bandwidth

Troubleshooting with Case Study

Process Report

Each Extract, Replicat, and Manager has its own report file that shows:
A banner with startup time
Parameters that are in use
Table and column mapping
Database and environmental information
Runtime messages and errors
The Process report provides initial clues, such as:
Invalid or out-of-order parameters
Data mapping errors 
You can:
View Process reports using View Report <group> in GGSCI
The default location is the dirrpt directory in the Oracle GoldenGate home location.

Troubleshooting with Case Study

GGSCI (SOURCE) > View Report ext_4a

Source Context :
SourceModule : [er.init]
SourceID : [/scratch/aime1/adestore/views/aime1_adc4150256/o ggcore/OpenSys/src/app/er/init.cpp]
SourceFunction : [init_functions]
SourceLine : [3985]
ThreadBacktrace : [7] elements
[/u01/app/oracle/product/ogg_src/libgglog.so(CMessageContext::AddThreadContext ()+0x1e) [0x7f6d61a1670e]]
[/u01/app/oracle/product/ogg_src/libgglog.so(CMessageFactory::CreateMessage(CS 
ourceContext*, unsigned int, ...)+0x2cc) [0x7f6d61a0f6ac]]
:
[/u01/app/oracle/product/ogg_src/libgglog.so(_MSG_ERR_STARTUP_PARAMERROR_ERROR 
TEXT(CSourceContext*, char const*, CMessageFactory:: MessageDisposition)+0x31) [0x7f6d61a02b8d]]
: [/u01/app/oracle/product/ogg_src/extract(init_functions(int, char**)+0x880) [0x54b5f0]]
: [/u01/app/oracle/product/ogg_src/extract(main+0xbf) [0x57359f]]
: [/lib64/libc.so.6( libc_start_main+0xfd) [0x3fe861ec9d]]
: [/u01/app/oracle/product/ogg_src/extract( gxx_personality_v0+0x38a) [0x4edaba]]

2014-06-07 20:22:42 ERROR OGG-00303 Group name [ext4a] in param file does not match PROCESS ID [EXT_4A].

2014-06-07 20:22:42 ERROR OGG-01668 PROCESS ABENDING.

GGSCI (EDRSR42P1) >

Event (Error) Log

The Oracle GoldenGate Event Log shows:

A history of GGSCI commands
Processes that started and stopped
Errors that occurred
Informational messages
The Event Log also shows events leading to an error.
 For example, you might discover one of the following:
-Someone stopped a process.
-A process failed to make a TCP/IP or database connection.
-A process could not open a file.
Use one of the following to view the Event Log:
-A standard text editor or a shell command
-The GGSCI command View GGSEVT
 
The log's name is ggserr.log and it is located in the root Oracle GoldenGate directory. You can also locate the file by using the Info Extract <group>, Detail command. The location of the ggserr.log file is listed with the other Oracle GoldenGate working directories:

GGSCI> Info Extract oraext, Detail

EXTRACT ORAEXT Last Started 2014-03-28 10:45 Status STOPPED
Checkpoint Lag 00:00:00 (updated 161:55:17 ago)
Log Read Checkpoint    File /u01/app/oracle/oradata/orcl/REDO03.LOG
2014-03-29 17:55:57 Seqno 34, RBA 104843776
<some contents deliberately omitted>
Current directory  /home/oracle/OGG_T/

Report file       /home/oracle/OGG_T/dirrpt/ORAEXT.rpt 
Parameter file    /home/oracle/OGG_T/dirprm/ORAEXT.prm 
Checkpoint file   /home/oracle/OGG_T/dirchk/ORAEXT.cpe 
Process file      /home/oracle/OGG_T/dirpcs/ORAEXT.pce
Error log              /home/oracle/OGG_T/ggserr.log

Filtering the Log for Errors

Because the file can become quite large, you may want to parse the file for errors or commands only, as in the following example:

$ more ggserr.log | grep ERROR

The preceding filter shows only errors:
Shell>cat ggserr.log | grep ERROR
2014-03-13 10:48:28 GGS ERROR 112 GoldenGate Capture for Oracle, EXT_CTG1.prm:
Could not create /home/oracle/OGG_T/dirdat/g1000000 (error 2, No such file or directory).
There are other ways to read and filter this data as well.
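For instance, a minimal sketch of such filters, run here against a throwaway sample copy of the log (the file path and the INFO line are illustrative assumptions; the ERROR line is the one shown above):

```shell
# Build a small sample ggserr.log so the filters can be demonstrated safely
cat > /tmp/ggserr_demo.log <<'EOF'
2014-03-13 10:47:01 GGS INFO 301 GoldenGate Manager for Oracle, mgr.prm: Command received from GGSCI: START EXTRACT EXT_CTG1.
2014-03-13 10:48:28 GGS ERROR 112 GoldenGate Capture for Oracle, EXT_CTG1.prm: Could not create /home/oracle/OGG_T/dirdat/g1000000 (error 2, No such file or directory).
EOF

# Errors only
grep ERROR /tmp/ggserr_demo.log

# Errors and warnings in the most recent part of a large log
tail -n 200 /tmp/ggserr_demo.log | grep -E 'ERROR|WARNING'

# Follow the live log while reproducing a problem (uncomment to use)
# tail -f /tmp/ggserr_demo.log
```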

Discard File

A discard file:

Is created if the DiscardFile <file_name> parameter is used in the Extract or Replicat parameter file
Is created when Extract or Replicat has a problem with the record that it is processing
Contains column-level details for operations that the process cannot handle
Is usually used for Replicat to log operations that cannot be reconstructed or applied
Can help you resolve data-mapping issues

Logdump utility

The logdump utility enables you to:
Display or search for information that is stored in Oracle GoldenGate trails or extract files
Save a portion of an Oracle GoldenGate trail to a separate trail file
To start logdump from the Oracle GoldenGate installation directory:
[OS prompt]./logdump
To access help:
[OS prompt] Logdump 1>help

Opening a Trail
Logdump>open dirdat/rt000000000
Current LogTrail is /ggs/dirdat/rt000000000
To view the trail file header:
Logdump 1>fileheader on
To view the record header with data:
Logdump 2>ghdr on
To add column information:
Logdump 3>detail on
To add hex and ASCII data values to the column list:
Logdump 4>detail data
To control how much record data is displayed:
Logdump 5>reclen 280


Logdump utility

Viewing Trail Records
To go to the first record and to move from one record to another in sequence:
Logdump 6 > pos 0
Logdump 7 > next
 
Or just type n.
 
To position at an approximate starting point and locate the next good header record:
Logdump 8 > pos <approximate RBA>
Logdump 9 > scanforheader
 
Or just type sfh


Startup Problems: Overview


Is the database running?
Is Manager running?
Does the group exist?
Use Info All to view all processes and groups on the system.
Is the group name spelled wrong?
Does a trail or extract file exist?
If not, Extract cannot write its initial checkpoint, and Replicat has no data    source to read.
Use the Detail option with Info Extract <group> or with Info Replicat <group> to see if a trail or file exists.

Is the parameter file missing?
Check for the file in the Oracle GoldenGate dirprm/ subdirectory.
The file must have the same name as the group.
Check the case of the name. If you copied the file from Windows to UNIX, the case might be wrong; it should be all lowercase.
If the file is not there, did you store it somewhere else?
To store a parameter file elsewhere (not recommended), use the PARAMS argument with Add Extract or Alter Extract.
Is the parameter file inaccessible?
Verify that the process user has the permissions to open the file.


Startup Problems: Overview

Startup Problem:

When Extract or Replicat starts, it queries the database for table attributes and builds an object record.
When Oracle GoldenGate analyzes a large volume of tables, this takes a long time and the process seems to be stalled.

Use WildcardResolve

If wildcards are being used for table names, you can use the WildcardResolve parameter with the Dynamic option. 
Source tables that satisfy wildcard definitions are resolved each time the wildcard rule is satisfied (instead of all together at startup).

Best Practice
You should use DynamicResolution when Extract is configured for a large number of tables.

Common Issues
•The Table and Map parameters must end with a semicolon.
•In the syntax, a comma must be followed by a space.
•When using the ColMap parameter, be sure to include all commas, quotation marks, and parentheses.
•If "userid" and "pswd" are enclosed in double quotation marks, they might be case sensitive in some databases


What should you do if a process fails?
View the process report:
View Report <group>
You can also use Oracle GoldenGate Director to view the report.
The report shows basic troubleshooting information:
Parameters that are used
Tables that are mapped
Key-column information
Specific details about the error



Extract Is Slow

Is Extract waiting for an archive log?
Can Extract locate the archive logs?
Is the slowness due to low volume?

•Is Extract waiting for an archive log?

-If Extract is restarted when there is a long-running transaction, it may need to read many old logs that have already been processed.
•Can Extract locate the archive logs?

-If the archives are not in the default Oracle Database location, use the AltArchiveLogDest <path_name> Extract parameter to specify the alternative location.
•Is the slowness due to low volume?

-If there are only 100 transactions being generated per second, the maximum speed is 100 transactions per second.
For Oracle tables, query the V$TRANSACTION view to see if there are open transactions


Extract can be slow when there are numerous large transactions and when tables contain many LOBs.
If necessary, use the CacheMgr parameter (with the assistance of Oracle Technical Support) to control the virtual memory and temporary disk space that are available for caching uncommitted transaction data.
To view the CacheMgr settings, issue a View Report command for the Extract.

When to Adjust CacheMgr
The memory manager generates statistics that can be viewed with the Send Extract command when used with the CacheMgr option. The statistics show the size of the memory pool, the paging frequency, the size of the transactions, and other information that creates a system profile.
Based on this profile, you might need to make adjustments to the memory cache if you see performance problems that appear to be related to file caching. The first step is to modify the CacheSize and CachePageOutSize parameters. You might need to use a higher or lower cache size, a larger or smaller page size, or a combination of both, based on the size and type of transactions that are being generated. You might also need to adjust the initial memory allocation with the CacheBufferSize option.
Syntax
CacheMgr {
[, CacheSize <size>]
[, CacheBufferSize <size>]
[, CacheDirectory <path> [<size>] [, ...]]
[, CachePageOutSize <size>]
}
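For illustration only (the sizes and directory below are assumed values – in practice change CacheMgr only with guidance from Oracle Support, as noted above), an adjusted setting in the Extract parameter file might look like:

```
-- Cap the virtual-memory cache and page large transactions out to a
-- dedicated filesystem (example values, not recommendations)
CACHEMGR CACHESIZE 8GB, CACHEDIRECTORY /u01/ogg/dirtmp 200GB, CACHEPAGEOUTSIZE 8MB
```

The runtime statistics that justify such a change can be pulled with SEND EXTRACT <group>, CACHEMGR CACHESTATS before and after the adjustment.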


Extract Abends 

Is Extract unable to open the transaction logs?
Check to see if the disk is full. If it is not full, the Extract user may not have read permissions on the files.
After permissions are granted, perform the following steps:
1. Stop Manager.
2. Exit GGSCI.
3. Close the terminal session.
4. Start the processes again from a new session.
Is an archive log missing?
Restore the archive log that is specified in the error message in the report file (and subsequent archive logs)


Using Archive Logs
Extract reads the archive logs when an online log is not available. If Extract is shut down for any length of time, the records that it needs on restart were probably archived away. On startup, Extract searches for the missing data. When the log cannot be found, Extract abends. Extract should never be stopped intentionally – at least not for long. If it is stopped either intentionally or otherwise, make certain that archives are available all the way back to the time of failure. If the length of time during which Extract was down is exceptionally long, you might find it more practical to resynchronize the tables rather than wait for Extract to go through the archives and catch up.
If the naming of the archive logs is changed, Oracle GoldenGate may not be able to find them because Oracle GoldenGate simply increments the number and looks for the next archive.
It is possible to prevent missing archive log errors by enabling an Extract group in classic capture mode to work with Oracle Recovery Manager (RMAN) to retain the logs that Extract needs for recovery.
Consider the following example:
GGSCI> Register Extract <group> LogRetention
This example creates an underlying Oracle Streams capture process that is dedicated to the Extract group and is used only for log retention. The Streams capture process has a similar name.


Replicat Abends

Is Replicat unable to locate a trail?
Issue Info Extract <group> and note the trail name.
Make sure that the trail was created:
1. Issue Info RmtTrail * (or Info ExtTrail * if local).
2. Create the trail (if necessary).
Issue Alter Replicat to read that trail.

Is Replicat unable to open a trail file?

Verify that the Replicat user has read and write permissions on trail files

You can confirm a trail’s existence by looking in the dirdat directory of the Oracle GoldenGate home location.
 You should see the same information that is shown by Info Extract and Info RmtTrail,  and the same information that is specified with RmtTrail when you view the Extract parameter file




Top 50 Oracle Goldengate Interview Questions To Get the Job!

What type of Topology does Oracle Goldengate support?

GoldenGate supports the following topologies.
  • Unidirectional
  • Bidirectional
  • Peer-to-peer
  • Broadcast
  • Consolidation
  • Cascading

What are the main components of the Oracle Goldengate replication?

The replication configuration consists of the following processes.
  • Manager
  • Extract
  • Pump
  • Replicat

What database does Oracle GoldenGate support for replication?

  • Oracle Database
  • TimesTen
  • MySQL
  • IBM DB2
  • Microsoft SQL Server
  • Informix
  • Teradata
  • Sybase
  • Enscribe
  • SQL/MX

What transaction types does Oracle Goldengate support for Replication?

GoldenGate supports both DML and DDL replication from the source to the target.

What are the supplemental logging pre-requisites?

The following supplemental logging is required.
  • Database supplemental logging
  • Object level logging

Why is Supplemental logging required for Replication?

When a transaction is committed on the source database, only the changed data is written to the redo log. However, for Oracle GoldenGate to apply these transactions on the destination database, the before-image key values are required to identify the affected rows. Supplemental logging captures this data so that it is also placed in the trail file; using the key values, the rows are identified on the destination and the transactions are executed against them.
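As a sketch of enabling the two levels listed above (the credential alias and the SCOTT schema/table are illustrative placeholders):

```
-- database-level supplemental logging (run in SQL*Plus on the source)
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

-- object-level logging (run in GGSCI after DBLOGIN)
DBLOGIN USERIDALIAS ogguser
ADD TRANDATA scott.emp       -- single table
ADD SCHEMATRANDATA scott     -- or the whole schema
```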

List important considerations for bi-directional replication?

The customer should consider the following points in an active-active replication environment.
  • Primary Key: Helps to identify conflicts and resolve them.
  • Sequences: Are not supported. The workaround is to use odd/even, range, or concatenated sequences.
  • Triggers: These should be disabled or suppressed to avoid uniqueness issues.
  • Data Looping: This can easily be avoided using OGG itself.
  • LAG: This should be minimized. If a customer says that there will not be any LAG due to network or huge load, then we don't need to deploy CDRs. But this is not always the case, as there would be some LAG and it can cause conflicts.
  • CDR (Conflict Detection & Resolution): OGG has built-in CDRs for all kinds of DML that can be used to detect and resolve conflicts.
  • Packaged Applications: These are not supported, as they may contain data types which are not supported by OGG, or they may not allow the application modifications needed to work with OGG.

Are OGG binaries supported on ASM Cluster File System (ACFS)?

Yes, you can install and configure OGG on ACFS.

Are OGG binaries supported on the Database File System (DBFS)? What files can be stored in DBFS?

No, OGG binaries are not supported on DBFS. You can however store parameter files, data files (trail files), and checkpoint files on DBFS.

What is the default location of the GLOBALS file?

The GLOBALS file is located under the Oracle GoldenGate installation directory (OGG HOME)

Where can filtering of data for a column be configured?

Filtering of the columns of a table can be set at the Extract, Pump or Replicat level.

Is it a requirement to configure a PUMP extract process in OGG replication?

A PUMP extract is optional, but it is highly recommended to use one to safeguard against network failures. Normally it is configured when you are setting up OGG replication across the network.

What are the differences between the Classic and integrated Capture?

Classic Capture:
  • The Classic Capture mode is the traditional Extract process that accesses the database redo logs (optionally archive logs) to capture the DML changes occurring on the objects specified in the parameter files.
  • At the OS level, the GoldenGate user must be a part of the same database group which owns the database redo logs.
  • This capture mode is available for other RDBMS as well.
  • There are some data types that are not supported in Classic Capture mode.
  • Classic capture can’t read data from the compressed tables/tablespaces.
Integrated Capture (IC):
  • In the Integrated Capture mode, GoldenGate works directly with the database log mining server to receive the data changes in the form of logical change records (LCRs).
  • IC mode does not require any special setup for the databases using ASM, transparent data encryption, or Oracle RAC.
  • This feature is only available for Oracle databases in version 11.2.0.3 or higher.
  • It also supports various object types which were previously not supported by Classic Capture.
  • This Capture mode supports extracting data from source databases using compression.
  • Integrated Capture can be configured in an online or downstream mode.

List the minimum parameters that can be used to create the extract process?

The following are the minimum required parameters which must be defined in the extract parameter file.
  • EXTRACT NAME
  • USERID
  • EXTTRAIL
  • TABLE

What are macros?

A macro is an easier way to build your parameter file. Once a macro is written, it can be called from different parameter files. Common parameters like username/password and other repeated settings can be included in these macros. A macro can either be defined within a parameter file or kept in a separate macro library.

Where can macros be invoked?

The macros can be called from the following parameter files.
  • Manager
  • Extract
  • Replicat
  • GLOBALS

How is a macro defined?

A macro statement consists of the following.
  • Name of the Macro
  • Parameter list
  • Macro body
Sample:
MACRO #macro_name
PARAMS (#param1, #param2, …)
BEGIN
< macro_body >
END;
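Applying the template above, a hedged example of defining and then invoking a connection macro in a parameter file (the user and password are placeholders):

```
-- definition
MACRO #db_connect
PARAMS (#user, #pwd)
BEGIN
USERID #user, PASSWORD #pwd
END;

-- invocation later in the same parameter file
#db_connect (ogguser, oggpwd)
```

A shared macro library can be pulled into a parameter file with the INCLUDE parameter, so the same #db_connect definition can serve several Extract and Replicat groups.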

I want to configure multiple extracts to write to the same exttrail file? Is this possible?

Only one Extract process can write to one exttrail at a time. So you can’t configure multiple extracts to write to the same exttrail.

What type of Encryption is supported in Oracle Goldengate?

Oracle Goldengate provides 3 types of Encryption.
  • Data encryption using Blowfish.
  • Password encryption.
  • Network encryption.

What are the different password encryption options available with OGG?

You can encrypt a password in OGG using
  • Blowfish algorithm and
  • Advanced Encryption Standard (AES) algorithm

What are the different encryption levels in AES?

You can encrypt the password/data using AES with three different key lengths:
a) 128 bit
b) 192 bit and
c) 256 bit
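For example (the key name below is an assumption – it must match an entry in the ENCKEYS lookup file), a password can be encrypted with 192-bit AES from GGSCI and the result used in a parameter file:

```
GGSCI> ENCRYPT PASSWORD oggpwd AES192 ENCRYPTKEY securekey1

-- the generated string is then referenced in the parameter file, e.g.:
USERID ogguser, PASSWORD <encrypted-string>, AES192, ENCRYPTKEY securekey1
```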

Is there a way to check the syntax of the commands in the parameter file without actually running the GoldenGate process?

Yes, you can place the CHECKPARAMS parameter in the parameter file and start the process. The process verifies the parameter file syntax, writes the results to its report file, and then stops without processing any data; if there is any error you will see it there.

How can you increase the maximum size of the read operation into the buffer that holds the results of the reads from the transaction log?

If you are using Classic Extract, you may use the TRANLOGOPTIONS ASMBUFSIZE parameter to control the read size for ASM databases.

What information can you expect when there is data in the discard file?

When data is discarded, the discard file can contain:
1. Discard row details
2. Database Errors
3. Trail file number

What command can be used to switch writing the trail data to a new trail file?

You can use the following command to write the trail data to a new trail file.
SEND EXTRACT ext_name, ROLLOVER

How can you determine if the parameters for a process were recently changed?

Whenever a process is started, the parameters in the .prm file for the process are written to the process REPORT. You can look at the older process reports to view the parameters which were used to start up the process. By comparing the older and the current reports you can identify the changes in the parameters.
