Disclaimer

Monday, 21 February 2022

Efficiency-based SQL tuning

 

Introduction

In order to tune a query, you need to know two things:
– can it be tuned, and if yes, then by how much
– which part of the query (which operation, or which data object) is most promising from the tuning point of view.

Currently existing tuning methods don’t really answer these questions. You either focus on the most expensive operation(s), and hope that you can eliminate them (or transform them into something less cosly), or you focus on the ones where you see a large discrepancy between actual rowcounts and optimizer predictions (cardinality feedback tuning). Either way, you can’t be sure that you’ve set your priorities right. It could well be the case that the cost of the most expensive operation cannot be reduced by much, but you can win back enough performance elsewhere. With the cardinality feedback tuning, you also don’t have any guarantee that improving accuracy of optimizer estimates would eventually transform into acceptable level of performance.

Of course, if the plan only contains a few operations, this is not a big issue, and after a few trials you will usually get to the bottom of the problem. However, when dealing with very complex plans, hundreds operations long, this is not really an option. When dealing with such plans a few months back, I developed for myself a simple tuning method that allows to evaluate with high accuracy potential tuning benefits of plan operations, using rowsource stats and optimizer as input. In this post, I’m sharing this method, as well as a script that implements it.

Basic principle

The method is based on two very simple facts:

1) it costs I/O to retrieve rows

2) therefore rejecting rows means wasting I/O,

i.e. whenever you’re losing rows, you’re losing cost spent on acquiring them, and by tracking operations that lose most cost, you find the perfromance bottleneck. Note that term “reject” here is used in a broad sense, i.e. it includes not only rows that rejected by the filter predicate or by a standalone FILTER operation, but also rows discarded because less rows were needed than the data block contained.

For example, if we spent 100 I/O operations retrieving 10,000 rows, and only half of them satisfy the filter predicate, then we could save up to 0.5 x 100 = 50 I/O operations  if we could access the data more efficiently.

By extrapolating this logic to a more general case, we can estimate potential cost savings for each operations as:

(recoverable cost) = (cost) * [1-(outgoing rows)/(incoming rows)],

where incoming rows is the number of rows accessed by the operation (before filtering), and the number of outgoing rows is also known as the cardinality of the operation.  For operations that receive rows from child rowsources (FILTER, VIEW, TABLE ACCESS BY ROWID) the number of incoming rows is simply the number of outgoing rows of the highest child operation. For index and table scans, it’s the number of total rows read; Oracle doesn’t provide this information in rowsource statistics, but it can be estimated indirectly. For example, for a full table scan or index fast full scan table/index in absence of partitioning (or if all partitions are read), it’s the total number of rows. In other cases, it can be estimated indirectly by multiplying the row density (average number of rows in a block) by the number of blocks read (gets):

(estimated accessed rows) = (gets)*(total rows)/(total blocks).

where for indexes, we take only leaf blocks. Using this, we can calculate the recoverable cost for each operation as the difference between the actual and avoidable costs:

(recoverable cost) = (gets)*[1 – (cardinality)*(total blocks)/(gets)/(total rows)]

Of course, this is just an estimate, but I found this estimate to be very accurate in the vast majority of practical cases.

Finding and resolving performance bottlenecks

I’ve written a simple script (see appendix) that calculates recoverable cost per each operation, and sorts the plan operations by this quantity. This way, one can immediately evaluate potential tuning benefits of the query, and which operations are most promising from the tuning point of view.

The exact way of resolving an inefficiency depends on the specifics, and is different in each individual case, but there are some general guidelines. All inefficiency sources can be divided into to broad groups:

1) repetition-related (i.e. the driving operation is causing this operation to be executed more times than necessary)
2) filtering-related (i.e. because of lack of a more efficient access path, we read more rows than we actually need, and thus waste I/O on accessing data that is not returned to the user).

The signature of repetition-related issues is high number of starts, while filtering-related issues are recognized by strong-selectivity filter predicates. Of course, it is also possible to have both issues in the same operation.

Issues from the first group can be resolved by changing the method and/or order of join, by allowing subquery unnesting to avoid executing the subquery per each row of the master query, and other similar methods. Very often, such issues are caused by wrong cardinality estimates, and providing more complete information to the optimizer can suffice for resolving the problem.

Issues from the second group typically arise when the optimizer lacks an appropriate access method to retrieve just as many rows as necessary. In such cases the problem can be resolved by either creating a new index or changing an existing one, modifying the query to avoid implicit conversions that would make it possible for the query to use an existing index, adjusting partitioning strategy for tables involved in the query, etc.

In the script I’ve added some logic to determine the most likely cause of the problem.

We’ll consider some specific examples in the next section.

Examples

Here are a few examples to illustrate the method and the script. I used Oracle 11.2.0.1 on Windows 7 (Home) to set up and run the examples. For each test case, I give the script to re-create it, the query, the dbms_xplan output and then my script output.

Case 1. A simple full table scan.

In this example, I build a table with 100,000 rows (no indexes), gather stats and select 10,000 rows:

create table tc1$a(id number, padding varchar2(200)) pctfree 90;
 
insert into tc1$a
select level, rpad(0, 100, 0)
from dual
connect by level <= 1e5;
 
commit;
 
exec dbms_stats.gather_table_stats(user, 'tc1$a');
 
select * from tc1$a where id <= 10000;
----------------------------------------------------------------------------------------------
| Id  | Operation         | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |      1 |        |  10000 |00:00:00.90 |   15015 |  14433 |
|*  1 |  TABLE ACCESS FULL| TC1$A |      1 |  10000 |  10000 |00:00:00.90 |   15015 |  14433 |
----------------------------------------------------------------------------------------------
 
  ID OPERATION                                POTENTIAL_SAVINGS                ORIGIN_OF_INEFFICIENCY         COMMENTS                        STARTS
---- ---------------------------------------- ------------------------------   ------------------------------ ------------------------------ -------
   1 TABLE ACCESS FULL SCOTT.TC1$A            13495 (90% operation, 90% query) rows rejected by filter predic                                      1
                                                                               ate after a full table scan

Since we are throwing away 90% of the rows that we accessed, the operation is 90% inefficient, and 90% of the 15015 gets is avoidable (hence potential savings = 13495). If we partitioned the table in such a way that we could read all rows we need (and none rows we don’t need) in a single partition access, then we’d save exactly that many gets. Of course in practice, inefficient full table scans cannot always be addressed by [re]partitioning the table, and using index access instead is preferable. Exact savings when switching from a full table access to an index scan vary a lot depending on the number and length of columns in the table and in the index, whether a table access is required at all or the query can be answered without it, and other factors, so it cannot be calculated exactly. But the order of magnitude in most cases will be similar to the value calculated by the script.

Case 2. Scanning a local index without partition pruning.

create table tc3$a(id number, x number, padding varchar2(200))
partition by range(id)
interval (1000)
(
  partition p1 values less than (1000)
)
pctfree 90;
 
insert into tc3$a
with gen as (select level id from dual connect by level<=1e3)
select rownum, round(10000*dbms_random.value), rpad(0, 100, 0)
from gen g1,
        gen g2
where rownum <= 1e6;
 
commit;
 
create index i$tc3$a on tc3$a(id) local;
 
create index i$tc3$x on tc3$a(x) local;
 
exec dbms_stats.gather_table_stats(user, 'tc3$a');
 
exec dbms_stats.gather_index_stats(user, 'i$tc3$x');
 
SQL_ID  8bf0y7vyuk0p1, child number 0
-------------------------------------
select * from tc3$a a where a.x <= 2
 
Plan hash value: 3126104984
 
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |         |      1 |        |    249 |00:00:00.05 |     123 |     43 |
|   1 |  PARTITION RANGE ALL               |         |      1 |    300 |    249 |00:00:00.05 |     123 |     43 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| TC3$A   |     21 |    300 |    249 |00:00:00.05 |     123 |     43 |
|*  3 |    INDEX RANGE SCAN                | I$TC3$X |     21 |    300 |    249 |00:00:00.01 |      59 |      0 |
-----------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("A"."X"<=2)
 
  ID OPERATION                                POTENTIAL_SAVINGS              ORIGIN_OF_INEFFICIENCY         COMMENTS                        STARTS
---- ---------------------------------------- ------------------------------ ------------------------------ ------------------------------ -------
   1 PARTITION RANGE ALL                      UNKNOWN                                                                                            1
   3 INDEX RANGE SCAN SCOTT.I$TC3$X           58 (98% operation, 47% query)  navigating multiple partitions caution: actual blocks differ       21
                                                                              of a local index (PARTITION R from estimated blocks by 15 %
                                                                             ANGE ALL)                      from estimated blocks by 15 %
 
   2 TABLE ACCESS BY LOCAL INDEX ROWID SCOTT. 0 (0% operation, 0% query)                                                                        21
     TC3$A



Case 3. Subquery that cannot be unnested

create table tc7$a
as
with gen as (select level id from dual connect by level<= 1e5)
select level id, rpad(0,100,0) padding
from gen g1, gen g2
where rownum<=1e7;
 
create table tc7$b
as
select level id, rpad(0,100,0) padding
from dual connect by level<=1e5; create index i$tc7$a on tc7$a(id); create index i$tc7$b on tc7$b(id); exec dbms_stats.gather_table_stats(user, 'TC7$A'); exec dbms_stats.gather_table_stats(user, 'TC7$B'); select * from tc7$a a where exists (select 1 from tc7$b b where  id = a.id and rownum>0);
 
--------------------------------------------------------------------------------------------------
| Id  | Operation           | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |         |      1 |        |    100K|00:00:00.47 |     109K|    219 |
|*  1 |  FILTER             |         |      1 |        |    100K|00:00:00.47 |     109K|    219 |
|   2 |   TABLE ACCESS FULL | TC7$A   |      1 |    100K|    100K|00:00:00.07 |    8211 |      0 |
|   3 |   COUNT             |         |    100K|        |    100K|00:00:00.31 |     101K|    219 |
|*  4 |    FILTER           |         |    100K|        |    100K|00:00:00.28 |     101K|    219 |
|*  5 |     INDEX RANGE SCAN| I$TC7$B |    100K|      1 |    100K|00:00:00.23 |     101K|    219 |
--------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter( IS NOT NULL)
   4 - filter(ROWNUM>0)
   5 - access("ID"=:B1)
 
  ID OPERATION                                POTENTIAL_SAVINGS              ORIGIN_OF_INEFFICIENCY         COMMENTS                        STARTS
---- ---------------------------------------- ------------------------------ ------------------------------ ------------------------------ -------
   5 INDEX RANGE SCAN SCOTT.I$TC7$B           101109 (100% operation, 92% qu inefficient driving operation  caution: actual blocks differ   100000
                                              ery)                           (join, subquery etc.) and/or B from estimated blocks by 15 %
                                                                             -tree navigation overhead
 
   2 TABLE ACCESS FULL SCOTT.TC7$A            6638 (81% operation, 6% query) unknown                                                             1
   3 COUNT                                    0 (0% operation, 0% query)                                                                    100000
   4 FILTER                                   0 (0% operation, 0% query)                                                                    100000
   1 FILTER                                   0 (0% operation, 0% query)                                                                         1

In this case most of the cost is coming from the index range scan at step 5, and the reason for inefficiency is optimizer’s inability to unnest the correlated subquery because of the use of the rownum pseudocolumn in it. If the redundant rownum>0 condition is removed, then the optimizer transforms the correlated subquery into a much more efficient join.

Limitations, known issues and plans for further development

The script is still in development, so bugs are possible. In its current form, the script is not suited for DW databases (it doesn’t cover parallelism, bitmap indexes, materialized views etc.). However, this is not the limitation of the method itself, and with some development effort it should be relatively straightforward to extrapolate the script accordingly.

The method relies on accuracy of statistics to evaluate row density, so if the row density has changed significantly since the last time stats were gathered, the script will produce wrong output. I’ve added some checks to my script to draw user’s attention if stats on an object are missing, stale or user-defined. Additionally, the script cross-checks estimated number of blocks (DBA_TABLES/DBA_INDEXES) vs. actual (DBA_SEGMENTS). However, for indexes it compares leaf blocks with total blocks, which is incorrect. Also, for some reason the actual and estimated blocks seems to differ significantly for small objects (it probably has to do with the way extents are allocated, but I haven’t looked into this in detail). I also haven’t yet added any special treatment for index-organized tables.

I am planning to update the script and post the updated version as soon as I get a chance.

The method can also be applied to analyzing explain plans. The main problem is that explain plan doesn’t show how many times each operation is executed, but this information can be deduced from the cardinality of the driving operation which can be tricky in some cases. I’m planning to continue to work on this in the future as well.

It is also possible to take the method one step further by moving from actual row density to minumal theoretically possible row density derived from the block size and average row length (this would require carefully accounting for overhead due to reserved space in a block, space for block headers etc.), which would extend the list of problems that could be diagnosed by the method to include “inflated” tables (unreset high watermark) and similar problems.

Summary

The method presented in this article has several advantages over traditional tuning approaches and is especially useful for following cases:
– complex plans (50-100 operations and higher)
– fine-tuning (trying to squeeze more performance out of a reasonably well performing query)
– learning how to tune SQL (it show not only where, but also how, why and how much performance is lost in a query).

The method can diagnose a wide range of performance problems, including:
1) inefficient index access
2) inefficient full table scan
3) unnested subqueries
4) inefficient local partitioned indexes
5) unmergeable views.

The method can be extrapolated for explain plans.

Appendix: Script to identify performance bottlenecks in a query (and estimate potential tuning benefits)

The script needs query rowsource statistics to produce output, so make sure the query is run with STATISTICS_LEVEL = ALL or gather_plan_statistics hint before analyzing it with the script.

Script output:

id, operation — id and description of the plan operation
potential_savings — how much performance benefits can be expected from tuning this operation (e.g. by creating or enabling a better access path)
origin_of_inefficiency — an educated guess as to the origin of inefficiency based on how many times the operation was executed (STARTS), whether or not there are filter predicates for this operation, and some additional information about parent operations
starts — “starts” column from V$SQL_PLAN_STATISTICS

rem  (c) Nikolay Savvinov, 2013
rem  a script to identify performance bottleneck
rem  and estimate potential benefits from removing them
rem  using rowsource statistics (will only return meaningful results
rem  if the plan is still in the cache and if rowsource statistics were populated.
rem  takes sql_id and child_number as parameters
 
set verify off
set linesize 400
set pagesize 9999
 
column id format 999
column operation format A40
column potential_savings format A30
column origin_of_inefficiency format A30
column comments format A30
column starts format 999999
column filter_predicates format A30
column repetition_ratio format 999999
 
 
spool report.txt
 
    with myplan as                                                                                                                                                                                     
    (                                                                                                                                                                                                  
        select p.sql_id, id, parent_id, operation, options, object_owner, object_name, object_type, cardinality, depth,                                                                                       
                S.LAST_CR_BUFFER_GETS gets, S.last_starts STARTS, s.last_output_rows outrows, p.filter_predicates, p.access_predicates                                                                                                      
        from v$sql_plan p,                                                                                                                                                                             
                v$sql_plan_statistics s                                                                                                                                                                
        where p.sql_id = s.sql_id                                                                                                                                                                      
        and P.CHILD_NUMBER = S.CHILD_NUMBER                                                                                                                                                            
        and p.id = s.operation_id                                                                                                                                                                      
        and (p.sql_id, p.child_number) = (SELECT * FROM (SELECT SQL_ID, CHILD_NUMBER FROM V$SQL WHERE SQL_TEXT like '&&1' ORDER BY LAST_ACTIVE_TIME DESC) WHERE ROWNUM=1)
--        and p.sql_id = '&1'
--        and p.child_number = &2                                                                                                                                                                                                                                                                                      
     ),                                                                                                                                                                                                
    extended_plan as                                                                                                                                                                                   
    (                                                                                                                                                                                                  
        select mp.*,  
        (                                                                                                                                                                                              
            select max(id)                                                                                                                                                                             
            from myplan                                                                                                                                                                                
            connect by prior id = parent_id                                                                                                                                                            
            start with id = mp.id
            and 'TABLE ACCESS' = mp.operation
            and mp.options like 'BY%ROWID'                                                                                                                                                                      
        ) rowid_provider_id,                                                                                                                                                                             
        (                                                                                                                                                                                              
            select min(id)
            from myplan
            where parent_id = mp.id           
        ) older_child_id,                                                                                                                                                                             
        (select min(id) from myplan where parent_id = mp.id) older_child,
        (select count(*) from myplan where parent_id = mp.id) num_children
        from myplan mp                                                                                                                                                                                 
    ),
    opstats as
    (
      select owner, table_name, num_rows, blocks, t.stale_stats, t.user_stats
      from dba_tab_statistics t
      where table_name in (select object_name from myplan)
      and partition_name is null
      union
      select owner, index_name, num_rows, leaf_blocks blocks, I.STALE_STATS, I.USER_STATS
      from dba_ind_statistics i
      where index_name in (select object_name from myplan)
      and partition_name is null
    ),
    segstats as
    (
        select owner, segment_name, sum(blocks) actual_blocks
        from dba_segments s
        where segment_name in (select object_name from myplan)
        and rownum>0
        group by owner, segment_name
    ),   
    inrows as
(   
    select lpad(' ', ep.depth, ' ') || operation || ' ' || options || ' ' plan_output,
            (select min(id) from myplan where operation like 'PARTITION%' and id != ep.id connect by prior parent_id = id and num_children<=1 start with id = ep.id) partition_iterator_id,                                                                                                                                                                                                                                                                                                                         
    case when operation in ('FILTER', 'INDEX', 'VIEW', 'TABLE ACCESS', 'COUNT') then gets       
            when operation = 'HASH JOIN' then gets - (select sum(gets) from extended_plan ep2 where parent_id = ep.id)   
    end own_gets,
            case when operation = 'TABLE ACCESS' and options like 'BY%ROWID' then (select outrows from myplan where id = rowid_provider_id) 
            --        when operation  = 'TABLE ACCESS' and options = 'FULL' then ss.num_rows*ep.starts
                    when operation = 'TABLE ACCESS' and options = 'FULL' then ceil(num_rows*gets/os.blocks)
                    when (operation = 'INDEX' and options like '%SCAN' and options != 'FULL SCAN') then ceil(num_rows*gets/os.blocks)
                    else (select outrows from myplan where id = older_child_id) 
            end inrows,
            ep.*,
            ceil(gets/os.blocks) rpt_ratio,
            os.num_rows,
            os.blocks,
            ss.actual_blocks,
            os.user_stats,
            os.stale_stats
    from extended_plan ep,
            opstats os,
            segstats ss
    where ep.object_owner = os.owner (+)
    and ep.object_name = os.table_name (+)
    and ep.object_owner = ss.owner (+)
    and ep.object_name = ss.segment_name (+)
),
leakage as
(
    select inrows.*,
             case when nvl(inrows,0) <= 0 then null
                     when inrows<outrows then 0
                    else floor(own_gets*(inrows-outrows)/inrows)
             end cost_leakage,
             case when (object_type like 'INDEX%' or object_type like 'TABLE%') and num_rows is null then 'no stats!' end notes
    from inrows
),
summary as
(
    select l.*, 
             case
              when cost_leakage = 0 then null
              when operation = 'HASH JOIN' and filter_predicates is not null then 'filter predicates in a hash join'
              when operation = 'HASH JOIN' and filter_predicates is null then 'hash area too small or wrong order of joined rowsources'
              when operation = 'VIEW' and options is null and filter_predicates is not null then 'inability to push down a predicate into a view'
              when operation = 'VIEW' and options like '%PREDICATE%' and filter_predicates is not null then 'inability to merge a view'
              when operation = 'FILTER' and filter_predicates is not null and num_children>1 then 'inefficient filtering operation (probably a subquery could not be unnested)'
              when operation = 'FILTER' and filter_predicates is not null and num_children=1 then 'restrictive single-source filtering operation'
              when operation = 'INDEX' and starts = 1 and filter_predicates is not null then 'index filter predicates'
              when operation = 'INDEX' and starts > 1 and partition_iterator_id is null and filter_predicates is null then 'inefficient driving operation (join, subquery etc.) and/or B-tree navigation overhead' 
              when operation = 'INDEX' and starts > 1 and partition_iterator_id is not null  and filter_predicates is null and (select operation || ' ' || options from leakage where id = l.partition_iterator_id) not like '%SINGLE%' then 'navigating multiple partitions of a local index (' || (select operation || ' ' || options from leakage where id = l.partition_iterator_id) || ')'
              when operation = 'INDEX' and starts > 1 and partition_iterator_id is not null  and filter_predicates is null and (select operation || ' ' || options from leakage where id = l.partition_iterator_id)  like '%SINGLE%' then 'inefficient driving operation'
              when operation = 'INDEX' and starts > 1 and partition_iterator_id is not null  and filter_predicates is null then 'navigating multiple partitions of a local index (' || (select operation || ' ' || options from leakage where id = l.partition_iterator_id) || ')'
              when operation = 'INDEX' and starts > 1 and partition_iterator_id is not null  and filter_predicates is not null then 'inefficient index access predicates, navigating multiple partitions of a local index (' || (select operation || ' ' || options from leakage where id = l.partition_iterator_id) || ')'
              when operation = 'INDEX' and starts > 1 and filter_predicates is not null then 'inefficient join order and/or access predicates'
              when operation = 'TABLE ACCESS' and options = 'FULL' and starts = 1 and filter_predicates is not null then 'rows rejected by filter predicate after a full table scan'
              when operation = 'TABLE ACCESS' and options = 'FULL' and starts > 1 then 'multiple executions of a full table scan'
              when operation = 'TABLE ACCESS' and options like 'BY%ROWID' and starts > 1 and filter_predicates is null then 'inefficient driving operation (join, subquery etc.)' 
              when operation = 'TABLE ACCESS' and options like 'BY%ROWID' and starts>1 and filter_predicates is not null then 'inefficient driving operation (join, subquery etc.) and/or access predicates' 
              when operation = 'TABLE ACCESS' and options like 'BY%ROWID' and starts=1 and filter_predicates is not null then 'inefficient access predicates'
              WHEN COST_LEAKAGE>0 THEN 'unknown'
              end leakage_reason,            
              round(abs(100*(actual_blocks/blocks - 1))) stats_error,
             nullif(rpt_ratio, 1) repetition_ratio,
             max(gets) over () total_gets
    from leakage l
)
select id,
         operation || ' ' || options || ' ' || object_owner ||  decode(object_name, null, null, '.') || object_name operation,  --num_rows, gets, inrows, outrows,
        nvl2(cost_leakage,  cost_leakage || ' (' || round(100*cost_leakage/gets) || '% operation, ' || round(100*cost_leakage/total_gets) || '% query)', 'UNKNOWN') potential_savings,
    --        round(abs(100*(actual_blocks/blocks - 1)))  || case when user_stats = 'YES' then ' caution: user-defined stats!' when stale_stats = 'YES' then ' caution: stale stats!' end "stats",
--        stale_stats,
--        user_stats,        
        case when cost_leakage is null then notes else leakage_reason end origin_of_inefficiency,
        case when stats_error>10 then 'caution: actual blocks differ from estimated blocks by ' || stats_error || ' %'  end || case when user_stats = 'YES' then '(user-defined stats!)' when stale_stats = 'YES' then '(stale stats!)' end comments,
        starts/*,
        repetition_ratio,
        filter_predicates*/
from summary s
where gets>0
--where cost_leakage > 0
order by cost_leakage desc nulls first;       
     
spool off
    
   

No comments:

Post a Comment

How to recovery PDB when PDB database is dropped in Oracle

  How to recovery PDB when PDB database is dropped :) [oracle@rac01 ~]$ sqlplus '/as sysdba' SQL*Plus: Release 21.0.0.0.0 - Product...