Sunday, 12 April 2020

sql slowness due to bad execution plan in Oracle Database And Execution Plan Interpretation


In an Oracle Database, an execution plan is the sequence of operations the database performs to retrieve data for a SQL statement. Identifying and fixing a "bad" plan involves comparing estimated resource costs against actual execution performance

Execution Plan Overview
Generated by the cost-based optimizer, an execution plan serves as the "driving directions" for a query
  • Structure: Plans are tree-shaped, where data flows from leaf nodes (tables) up to the root (the final SELECT statement).
  • Key Components:
    • Operation Nodes: Specific actions like TABLE ACCESS FULL, INDEX RANGE SCAN, or HASH JOIN.
    • Cost: An internal estimate of CPU, I/O, and network resources required; lower is typically better.
    • Cardinality: The estimated number of rows processed at each step.
    • Predicate Information: Shows how WHERE clause filters or join conditions are applied.
  •  Key Components to Analyze
    • Operations: The action performed (e.g., Table Scan, Index Seek, Hash Match, Sort).
    • Access Paths: How the database reads data.
      • Index Seek is usually fast, targeting specific rows using a key.
      • Table Scan is often slow on large tables because it reads every single row..
    • Row Counts: Compares estimated rows vs. actual rows. A large discrepancy indicates outdated statistics, prompting the optimizer to pick a poor plan

    2. How to Read the Flow
    • Tree Structure: The plan is hierarchical. Data flows from the bottom of the tree up to the root (usually a SELECT statement).
    • Right-to-Left (Graphical): If viewing a graphical plan, data processing generally starts at the bottom rightmost operator and moves to the left.
    • Parent/Child Nodes: Data is passed from child operators (e.g., fetching rows) to parent operators (e.g., joining or filtering the rows)

    3. Common Performance Bottlenecks
    • Missing Indexes: Identified by heavy table scans or warnings in the execution plan.
    • Heavy Sorts: Operations that require \(O(N \log N)\) sorting can eat up CPU and memory.
    • Unsuitable Joins: Large datasets using Nested Loops instead of Hash Joins can lead to excessive looping and slow performance

    Generation Tools:
    • EXPLAIN PLAN FOR: Predicts the plan without executing the query.
    • DBMS_XPLAN.DISPLAY_CURSOR: Shows the actual plan used for a recently executed statement still in the shared pool
    Identifying a "Bad" Execution Plan
    A plan is considered "bad" if it uses suboptimal access paths or join methods, often leading to performance degradation
  • Misestimation: Large gaps between estimated rows and actual rows suggest the optimizer made a wrong turn.
  • Inconsistent Performance: A query that was fast but suddenly becomes slow often indicates a "plan flip" where the plan_hash_value changed.
  • Suboptimal Operations: Forcing a FULL TABLE SCAN on a massive table instead of using an existing index
  • How to Fix a Bad Execution Plan
    Fixing a bad plan requires addressing the root cause or forcing the optimizer to use a known good plan
  • Refresh Statistics: Outdated or stale statistics are the most common cause of bad plans.
    • Use the DBMS_STATS package: EXEC DBMS_STATS.GATHER_TABLE_STATS('OWNER', 'TABLE_NAME');.
  • Fix Underlying Schema Issues:
    • Add missing indexes or rebuild invalid ones.
    • Create histograms for columns with skewed data distribution to help the optimizer estimate selectivity.
  • Enforce a Good Plan (Stability Tools):
    • SQL Plan Baselines: Capture a known good plan from the cursor cache and force the optimizer to use it, even if stats change.
    • SQL Profiles: Use the SQL Tuning Advisor to generate a profile that corrects the optimizer’s miscalculations.
    Use SQL Hints (Temporary Fix):
    • Inject hints like /*+ INDEX(table index) */ or /*+ LEADING(table) */ to manually direct the optimizer. Note: These should be used sparingly in production


    To fix a bad execution plan in an Oracle Database using SQL Plan Management (SPM) or COE SQL Profiles, follow these steps to stabilize the query with a known "good" plan. 

    1. Fix using SQL Plan Management (SPM)
    SPM is the most reliable way to force a specific plan because it is permanent and harder for the optimizer to ignore

  • Identify the Good Plan: Find the PLAN_HASH_VALUE of a previously well-performing execution from the cursor cache or AWR.
  • Load the Good Plan: Use DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE (for current memory) or LOAD_PLANS_FROM_AWR (for historical plans) to create a baseline.
  • Fix the Plan: Mark the baseline as "FIXED" to ensure the optimizer only considers that specific plan.

  • sql
    var plans_loaded number;
    exec :plans_loaded := DBMS_SPM.ALTER_SQL_PLAN_BASELINE( -
      sql_handle => 'SQL_handle_here', -
      plan_name  => 'PLAN_name_here', -
      attribute_name => 'fixed', -
      attribute_value => 'YES');


    Disable Bad Plans: If multiple plans exist in the baseline for that SQL ID, disable the ones performing poorly using DBMS_SPM.ALTER_SQL_PLAN_BASELINE with attribute_name => 'enabled' and value => 'NO'

    2. Fix using COE SQL Profiles (SQLT Tool)
    COE (Center of Expertise) profiles use the coe_xfr_sql_profile.sql script to extract a plan's "outlines" (hints) and apply them to a SQL ID.

  • Generate the Script: Run coe_xfr_sql_profile.sql (part of the Oracle SQLT tool) and provide the SQL_ID and the PLAN_HASH_VALUE of the "good" plan.
  • Apply the Profile: The script generates a new SQL file (e.g., coe_sqlid_plan.sql). Execute this script in SQL*Plus to create the profile.
  • Verification: Check DBA_SQL_PROFILES to confirm it is ENABLED.
  • Note: SQL Profiles provide guidance rather than absolute enforcement. If data distribution changes significantly, the optimizer may still bypass a profile
  • Note: run the below script to create the sql profile and enforce it
    1)  check if optimizer is using bad execution plan 
    if execution plan has changed and optimizer is using bad execution then run below steps
    2) copy the below file on production server
    coe_xfr_sql_profile 
    @coe_xfr_sql_profile.sql <sqlid> <Best Plan Hash Value from SQLT report> 
    2) run script by passing sql and  best plan hash value
     START coe_xfr_sql_profile.sql 98u11n3k345w2 1074671822;
    3. Immediate "Emergency" Fix
    If you need an instant resolution while setting up SPM or COE:
    • Purge the Bad Plan: Use DBMS_SHARED_POOL.PURGE to remove the specific bad cursor from the shared pool, forcing a re-parse.
    • Check Statistics: Ensure table and index statistics are not stale, as this is the most common cause of plan regressions



    In order to investigate SQL slowness, we need to check session lock, long running ,bottleneck and SQL plan change


    Login on Database and check blocking session with below commands

    1) To check session lock on database

    SELECT DECODE(request,0,'Holder: ','Waiter: ') ||
    sid sess, id1, id2, lmode, request, type
    FROM V$LOCK
    WHERE (id1, id2, type) IN (SELECT id1, id2, type FROM V$LOCK WHERE request > 0)
    ORDER BY id1, request;


    2) To identify session as well as SQL details

    select sid,serial#,osuser,username,program,status,SQl_id from v$session where sid=&sid;

    or

    select s.username, s.sid, s.serial#,t.sql_text "Last SQL"from v$session s, v$sqlarea t where s.sql_address =t.address ands.sql_hash_value =t.hash_value and s.sid = '&sid';


    3) Map OS processes with database process as below

    select p.spid,s.sql_id,s.sql_hash_value,s.sql_id,s.sid,s.serial#,s.username,s.event,to_char(s.logon_time,'dd-mon:hh24:mi') from v$session s, v$process p
    where p.addr= s.paddr and s.sid  = '&sid';


    4) Session can be killed from OS prompt as well as database level

    From OS level

    $ kill -9 <spid>


    From Database level

    alter system kill session 'sid,serial#' immediate;
    eg

    alter system kill session '41,5' immediate;


    5) Cross check session at database level.

    select sid,serial#,osuser,username,program,status,SQl_id from v$session where sid=&sid;
     

    6) Basic Performance Check

    select count(1),event from v$session group by event;

    select count(1),status,username from v$session group by status,username;

    select count(1),status,module from v$session group by status,module order by module desc

    select count(1),status,module,sql_id from v$session group by status,module,sql_id order by module desc

    7) Long running query

    select sid, serial#, context, sofar, totalwork,round(sofar/totalwork*100,2) "%_complete" from v$session_longops where sofar <> totalwork;


    select inst_id,sql_hash_value,sql_id, sid,serial#,to_char(start_time,'dd-mon:hh24:mi') start_time,
                 opname,target,totalwork,sofar,(elapsed_Seconds/60) elamin,username,
                 time_remaining tre
        from gv$session_longops
        where totalwork <> SOFAR
        order by 7,tre, start_time,sid

    or
    $ cat status.sql
    SELECT SID, SERIAL#, CONTEXT, SOFAR, TOTALWORK, ROUND(SOFAR/TOTALWORK*100,2) "%_COMPLETE" FROM gV$SESSION_LONGOPS WHERE OPNAME LIKE 'RMAN%' AND OPNAME NOT LIKE '%aggregate%' AND TOTALWORK != 0 AND SOFAR != TOTALWORK order by 6 desc
    /
    $

     8) Temp usage / Rollback segment/PGA usage:

    We can get information of temporary tablespace usage details with the help of below query:
    Set lines 1000
    SELECT b.tablespace,
    ROUND(((b.blocks*p.value)/1024/1024),2)||'M' "SIZE",
    a.sid||','||a.serial# SID_SERIAL,
    a.username,
    a.program
    FROM sys.v_$session a,
    sys.v_$sort_usage b,
    sys.v_$parameter p
    WHERE p.name = 'db_block_size'
    AND a.saddr = b.session_addr
    ORDER BY b.tablespace, b.blocks;


    select st.sid "SID", sn.name "TYPE",
    ceil(st.value / 1024 / 1024/1024) "GB"
    from v$sesstat st, v$statname sn
    where st.statistic# = sn.statistic#
    and sid in
    (select sid from v$session where username like UPPER('&user'))
    and upper(sn.name) like '%PGA%'
    order by st.sid, st.value desc;


    9)Generate AWR,ASH and ADDM Report for further analysis

    @ ?/rdbms/admin/awrrpt.sql
    @ ?/rdbms/admin/awrrpti.sql
    @ ?/rdbms/admin/ashrpt.sql
    @ ?/rdbms/admin/addmrpt.sql
     @?/rdbms/admin/awrddrpt.sql--compare how the database performed during two different periods.

    10) check execution plan and execution plan change and optimizer is bad execution plan

     select * from table ( dbms_xplan.display_cursor('&sql_id'))

      select * from table ( dbms_xplan.display_awr('&sql_id'))

    To check child number for sql_id
    -------------------------------
    SQL_ID and child number of the SQL are known
    A SQL_ID can have multiple children with different characteristics. You can identify the child number by selecting from V$SQL as follows:

    select sql_id, child_number, sql_text from v$sql where sql_id='&sql';

    or

    SELECT sql_id, hash_value, child_number, SUBSTR(sql_text,1,40) Text
    FROM v$sql
    WHERE sql_text LIKE '%&Some_Identifiable_String%'
    /

    The first child for a cursor will have a CHILD_NUMBER of zero.
    Once you have this information, you can use dbms_xplan.display_cursor as follows:

    set linesize 150
    set pagesize 2000
    select * from TABLE(dbms_xplan.display_cursor('&SQL_ID', &CHILD));


    check execution plan using DBA_HIST_SQLSTAT 
    -------------
    col begin_interval_time for a30
    break on plan_hash_value on startup_time skip 1
    select ss.snap_id, ss.instance_number node, begin_interval_time, s.sql_id, s.plan_hash_value,
    nvl(executions_delta,0) execs,
    (elapsed_time_delta/decode(nvl(executions_delta,0),0,1,executions_delta))/1000000 avg_etime,
    (buffer_gets_delta/decode(nvl(buffer_gets_delta,0),0,1,executions_delta)) avg_lio, version_count 
    from DBA_HIST_SQLSTAT S, DBA_HIST_SNAPSHOT SS 
    where s.sql_id = nvl('&sql_id','4dqs2k5tynk61')
    and ss.snap_id = S.snap_id 
    and ss.instance_number = S.instance_number
    and executions_delta > 0 --and plan_hash_value=3099046451
    order by 1, 2, 3




    HASH /SQL plan needed found in the Cursor Cache

    Now you know which hash plan hash to be fixed. Now follow the below example. If the needed plan is found in the cursor cache then it is very simple to create a baseline and fixing the plan for the SQL query.

    Ex: Determined the Hash Plan: 2601263939 is the best to run against the SQL ID: 47qjdv3ncanhr


    1.Create the Baseline:

    var v_num number;
    exec :v_num:=dbms_spm.load_plans_from_cursor_cache(sql_id =>'47qjdv3ncanhr',plan_hash_value => 2601263939);

    OR

    Example from Internet for script:

    SQL> !cat create_baseline.sql
    var ret number
    exec :ret := dbms_spm.load_plans_from_cursor_cache(sql_id=>'&sql_id', plan_hash_value=>&plan_hash_value);
     SQL> @create_baseline
    Enter value for sql_id: 47qjdv3ncanhr
    Enter value for plan_hash_value: 2601263939

    2. Verify the baseline got created or not
    =================================

    SQL> select sql_handle, plan_name, enabled, accepted, fixed from dba_sql_plan_baselines;
    SQL_HANDLE                      PLAN_NAME                                              ENA    ACC     FIX
    ------------------------------             ------------------------------                                     ---         ---         ---
    SQL_4bd90f15ef3c1f10           SQL_PLAN_4rq8g2rrms7sh3cc6a555       YES     YES     NO


    To see all the details, this will create a file with all the baseline info.:

    spool baseline_plan.txt
    select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'SQL_4bd90f15ef3c1f10',  format=>'basic'));
    verify the spool file to confime the SQL ID and the HASH associated with it.

    3.TO MODIFY A SQL PLAN BASELINE

    var v_num number;
    exec :v_num:=dbms_spm.ALTER_SQL_PLAN_BASELINE (sql_handle =>'SQL_4bd90f15ef3c1f10',plan_name => 'SQL_PLAN_4rq8g2rrms7sh3cc6a555', attribute_name=> 'FIXED',  attribute_value  => 'YES');
    Attributes
        ·  enabled (YES/NO) : If YES, the plan is available for the optimizer if it is also marked as accepted.
        ·  fixed (YES/NO) : If YES, the SQL plan baseline will not evolve over time. Fixed plans are used in preference to non-fixed plans.
        ·  autopurge (YES/NO) : If YES, the SQL plan baseline is purged automatically if it is not used for a period of time.
    ·   plan_name : Used to amend the SQL plan name, up to a maximum of 30 character.
    ·    description : Used to amend the SQL plan description, up to a maximum of 30 character.
       

    Sometimes the required HASH / SQL plan  will not be present in the Cursor Cache, then you have to  load it from a AWR snapshots.

     Steps are as below:
    ======================================

    To load plans to the cursor cache from awr snapshots:

    1. -- Drop SQL Tuning Set (STS)

    BEGIN
      DBMS_SQLTUNE.DROP_SQLSET(
        sqlset_name => 'SAMPLE_TUNING_SET');
    END;

     2. -- Create SQL Tuning Set (STS)

     BEGIN
      DBMS_SQLTUNE.CREATE_SQLSET(
        sqlset_name => 'SAMPLE_TUNING_SET',
        description => 'SQL Tuning Set for loading plan into SQL Plan Baseline');
    END;

    3.-- Populate STS from AWR using a time duration when the desired plan was used.

    Retrieve the begin Snap ID from the same session described in the GRID Contol above or by :   

    SELECT SNAP_ID, BEGIN_INTERVAL_TIME, END_INTERVAL_TIME FROM dba_hist_snapshot ORDER BY END_INTERVAL_TIME DESC;

    Note: Specify the sql_id in the basic_filter (other predicates are available, see desc dba_hist_snapshot) if necessary.

    DECLARE
      cur sys_refcursor;
    BEGIN
      OPEN cur FOR
        SELECT VALUE(P)
        FROM TABLE(
           dbms_sqltune.select_workload_repository(begin_snap=>1477, end_snap=>1478,basic_filter=>'sql_id = ''9n82zq1gkpg2t''',attribute_list=>'ALL')
                  ) p;
         DBMS_SQLTUNE.LOAD_SQLSET( sqlset_name=> 'SAMPLE_TUNING_SET', populate_cursor=>cur);
      CLOSE cur;
    END;
    /
    4. -- List out SQL Tuning Set contents to check we got what we wanted

    SELECT
      first_load_time,
      executions as execs,
      parsing_schema_name,
      elapsed_time  / 1000000 as elapsed_time_secs,
      cpu_time / 1000000 as cpu_time_secs,
      buffer_gets,
      disk_reads,
      direct_writes,
      rows_processed,
      fetches,
      optimizer_cost,
      sql_plan,
      plan_hash_value,
      sql_id,
      sql_text
       FROM TABLE(DBMS_SQLTUNE.SELECT_SQLSET(sqlset_name => 'SAMPLE_TUNING_SET'));

    5.-- Finally create the baseline from the STS:

    DECLARE
    my_plans pls_integer;
    BEGIN
      my_plans := DBMS_SPM.LOAD_PLANS_FROM_SQLSET(
        sqlset_name => 'SAMPLE_TUNING_SET',
        basic_filter=>'plan_hash_value = ''1117073691'''
        );
    END;
    /

    6.-- Verify the baseline got created and modify it if necessary

    select sql_handle, plan_name, enabled, accepted, fixed from dba_sql_plan_baselines;
    exec :v_num:=dbms_spm.ALTER_SQL_PLAN_BASELINE (sql_handle =>'SQL_ab2ab5c194ee0fc8',plan_name => 'SQL_PLAN_aqapps6afw3y81722054c', attribute_name=> 'FIXED',  attribute_value  => 'YES');

    7.-- Verify all details for the new Baseline:

    spool baseline_plan.txt
    select * from table(
        dbms_xplan.display_sql_plan_baseline(
            sql_handle=>'SQL_ab2ab5c194ee0fc8',
            format=>'basic'));


    using below query to see which all plan_hash_value are being used by my sql_id...

    select ss.snap_id,
    ss.instance_number node,
    begin_interval_time,
    sql_id,
    plan_hash_value,
    nvl(executions_delta,0) execs,
    round((elapsed_time_delta/decode(nvl(executions_delta,0),0,1,executions_delta))/1000000,2) avg_etime,
    round((buffer_gets_delta/decode(nvl(buffer_gets_delta,0),0,1,executions_delta)),2) avg_lio
    from DBA_HIST_SQLSTAT S,
    DBA_HIST_SNAPSHOT SS
    where sql_id = 'xxxxxxxx'
    and ss.snap_id = S.snap_id
    and ss.instance_number = S.instance_number
    and executions_delta > 0
    order by 1 desc, 2, 3;

    USE CASE
    -------------------
    Some times It is observed that optimizer choose bad execution plan which lead to the poor performance for particular SQL ID When it has been narrowed down to a SQL statement.
    It  has been noticed that the execution plan has been changed and a bad execution plan was being used If optimizer picks Bad plan, then to address that we need to use features like SPM/COE Profile. 

    Analysis:

    1) SQL execution plan for sql_id= 4dqs2k5tynk61 has been changed causing slow response to Biztalk.
    2) There was no changes done in the queries at application level/DB level.
    3) Its Oracle internal behavior(CBO),SQL plan may change due to some factors like
    :
    Change in volume of data (high modifications like inserts/updates/deletes )
    Change number of rows processed from total number of rows(selectivity)
    Changes in bind values
    Inaccurate statistics: cardinality, lack of good statistics
    lack of optimizer statistics
    Recent changes like upgrade, migration and patching
    Any parameter level change like optimizer version, optimizer parameters, schema/metadata definitions, system settings, as well as SQL profile creation.

    Detail Analysis

     

    To check execution plan from the cursor cache using below query

    -------------------------------------------------------------

    select * from table ( dbms_xplan.display_cursor('&sql_id'))

      

    To check execution plan from the AWR using below query

    -------------------------------------------------------

    select * from table ( dbms_xplan.display_awr('&sql_id'))

     

    To check child number for sql_id

    -------------------------------

    select sql_id, child_number, sql_text from v$sql where sql_id='&sql';

      

    To check number of execution

    -----------------------------

    select sql_id,plan_hash_value,executions from gv$sql where sql_id='&SQL_ID'


    To find out change in SQL plan for SQL_ID

    The below script will help you find the SQL plan ID for previous executions and check if there is any change in SQL plan ID.

     

    To check plan change using DBA_HIST_SQLSTAT S

    ----------------------------------------------------------

    col begin_interval_time for a30

    break on plan_hash_value on startup_time skip 1

    select ss.snap_id, ss.instance_number node, begin_interval_time, s.sql_id, s.plan_hash_value,

    nvl(executions_delta,0) execs,

    (elapsed_time_delta/decode(nvl(executions_delta,0),0,1,executions_delta))/1000000 avg_etime,

    (buffer_gets_delta/decode(nvl(buffer_gets_delta,0),0,1,executions_delta)) avg_lio, version_count

    from DBA_HIST_SQLSTAT S, DBA_HIST_SNAPSHOT SS

    where s.sql_id = nvl('&sql_id','4dqs2k5tynk61')

    and ss.snap_id = S.snap_id

    and ss.instance_number = S.instance_number

    and executions_delta > 0 --and plan_hash_value=3099046451

    order by 1, 2, 3

     

    As per Analysis for SQL id 4t7u0qszwns9p, It is observed that there are three different execution plan for this ,

    currently optimizer is using execution plan 2959802443 which is bad execution plan ,

    Here We have two different good execution plan as well ,

     

    During analysis , we loaded execution plan 1672047590 to improve performance of database.




    No comments:

    Post a Comment