05 June, 2011

Getting the right statistics

Continuing from the earlier post, "Nested Loops and Consistent Gets": what does the Execution Plan look like with statistics gathered on more columns?
SQL> exec dbms_stats.gather_table_stats('','TRANSACTIONS',estimate_percent=>100,granularity=>'ALL',-
> method_opt=>'FOR COLUMNS COUNTRY_CD SIZE 250, PRODUCT_CD SIZE 250, TXN_ID SIZE 250 ',cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL>

SQL> select column_name, num_distinct, histogram from user_tab_col_statistics
2 where table_name = 'TRANSACTIONS'
3 order by 1;

COLUMN_NAME NUM_DISTINCT HISTOGRAM
------------------------------ ------------ ---------------
COUNTRY_CD 5 FREQUENCY
PRODUCT_CD 10 FREQUENCY
TXN_ID 1000000 HEIGHT BALANCED


SQL>

SQL> explain plan for
2 select product_desc, txn_id, transaction_amt
3 from transactions t, product_table p
4 where txn_date between to_date('01-FEB-2011','DD-MON-YYYY') and to_date('28-FEB-2011','DD-MON-YYYY')
5 and txn_id between 155000 and 156000
6 and country_cd = 'IN'
7 and t.product_cd=p.product_cd
8 /

Explained.

SQL> select * from table(dbms_xplan.display);


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1299935411

-------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 45 | 36 (0)| 00:00:01 | | |
| 1 | NESTED LOOPS | | | | | | | |
| 2 | NESTED LOOPS | | 1 | 45 | 36 (0)| 00:00:01 | | |
| 3 | PARTITION RANGE SINGLE | | 1 | 33 | 35 (0)| 00:00:01 | 2 | 2 |
|* 4 | TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTIONS | 1 | 33 | 35 (0)| 00:00:01 | 2 | 2 |
|* 5 | INDEX RANGE SCAN | TRANSACTIONS_NDX | 400 | | 4 (0)| 00:00:01 | 2 | 2 |
|* 6 | INDEX UNIQUE SCAN | SYS_C0016611 | 1 | | 0 (0)| 00:00:01 | | |
| 7 | TABLE ACCESS BY INDEX ROWID | PRODUCT_TABLE | 1 | 12 | 1 (0)| 00:00:01 | | |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("TXN_DATE"<=TO_DATE(' 2011-02-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   5 - access("COUNTRY_CD"='IN' AND "TXN_ID">=155000 AND "TXN_ID"<=156000)
   6 - access("T"."PRODUCT_CD"="P"."PRODUCT_CD")

21 rows selected.

SQL>

Ah. Now the optimizer has a better feel for the rows from Operation 5: access by COUNTRY_CD and TXN_ID. Instead of 252 rows (the estimate when column statistics had not been gathered on TXN_ID), the expected row count is 400.
Yet the filter on TXN_DATE is still expected to return only 1 row! We know that we are reading 400 rows from that one partition alone (partition 2 being P_2011_FEB), and that every row in this partition *will* satisfy this filter predicate. Yet the optimizer expects only 1 row.
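Why does the estimate collapse to 1 row? A plausible sketch, not Oracle's exact arithmetic: the optimizer multiplies the selectivities of predicates it treats as independent, and with no statistics on TXN_DATE it must fall back on a default guess for the date filter. The `default_date_selectivity` value below is hypothetical, chosen only to show how a small default shrinks 400 rows to 1:

```python
# Illustrative sketch (assumed cost model, not Oracle internals): the
# optimizer multiplies independent predicate selectivities. With no
# statistics on TXN_DATE, a small default selectivity for the date filter
# collapses the 400-row index estimate down to 1 row.
def estimated_rows(base_rows, selectivities):
    est = base_rows
    for s in selectivities:
        est *= s
    return max(1, round(est))  # a plan never estimates fewer than 1 row

rows_from_index = 400              # E-Rows of the INDEX RANGE SCAN, from the plan
default_date_selectivity = 0.0025  # hypothetical default for the unanalyzed column

print(estimated_rows(rows_from_index, [default_date_selectivity]))  # -> 1
```

With proper statistics on TXN_DATE the date-filter selectivity would be close to 1 for this partition, and the estimate would stay near 400.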

Here's the actual execution with this plan :
SQL> select /*+ gather_plan_statistics */ product_desc, txn_id, transaction_amt
2 from transactions t, product_table p
3 where txn_date between to_date('01-FEB-2011','DD-MON-YYYY') and to_date('28-FEB-2011','DD-MON-YYYY')
4 and txn_id between 155000 and 156000
5 and country_cd = 'IN'
6 and t.product_cd=p.product_cd
7 /

PRODUCT_DESC TXN_ID TRANSACTION_AMT
-------------------------------------------------- ---------- ---------------
Prod: _1 155000 155
Prod: _2 155001 155.001
Prod: _5 155004 155.004
........
Prod: _5 155994 155.994
Prod: _9 155998 155.998
Prod: _1 156000 156

401 rows selected.

SQL>
SQL> select * from table(dbms_xplan.display_cursor('','','ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 5yzvvdg1y2aub, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ product_desc, txn_id,
transaction_amt from transactions t, product_table p where txn_date
between to_date('01-FEB-2011','DD-MON-YYYY') and
to_date('28-FEB-2011','DD-MON-YYYY') and txn_id between 155000 and
156000 and country_cd = 'IN' and t.product_cd=p.product_cd

Plan hash value: 1299935411

----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 401 |00:00:00.03 | 428 | 2 |
| 1 | NESTED LOOPS | | 1 | | 401 |00:00:00.03 | 428 | 2 |
| 2 | NESTED LOOPS | | 1 | 1 | 401 |00:00:00.04 | 27 | 1 |
| 3 | PARTITION RANGE SINGLE | | 1 | 1 | 401 |00:00:00.02 | 23 | 0 |
|* 4 | TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTIONS | 1 | 1 | 401 |00:00:00.02 | 23 | 0 |
|* 5 | INDEX RANGE SCAN | TRANSACTIONS_NDX | 1 | 400 | 401 |00:00:00.01 | 6 | 0 |
|* 6 | INDEX UNIQUE SCAN | SYS_C0016611 | 401 | 1 | 401 |00:00:00.01 | 4 | 1 |
| 7 | TABLE ACCESS BY INDEX ROWID | PRODUCT_TABLE | 401 | 1 | 401 |00:00:00.01 | 401 | 1 |
----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("TXN_DATE"<=TO_DATE(' 2011-02-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   5 - access("COUNTRY_CD"='IN' AND "TXN_ID">=155000 AND "TXN_ID"<=156000)
   6 - access("T"."PRODUCT_CD"="P"."PRODUCT_CD")

30 rows selected.

SQL>

So! We are still reading PRODUCT_TABLE 401 times, simply because the Optimizer expected to have to read it only once!

Therefore, we can narrow the cause of this Nested Loop's "poor" behaviour, the high Consistent Gets count, down to the row-count estimate for the TXN_DATE filter predicate.
The current set of statistics does not cover TXN_DATE, even though the query actually filters on "txn_date between to_date('01-FEB-2011','DD-MON-YYYY') and to_date('28-FEB-2011','DD-MON-YYYY')", and the result is a sub-optimal plan.
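The arithmetic behind the 428 Consistent Gets can be sketched as follows. This is rough accounting under assumed numbers read off the ALLSTATS LAST output: about 27 gets on the TRANSACTIONS side, plus roughly one get per unique-index probe of PRODUCT_TABLE for each of the 401 outer rows:

```python
# Rough accounting (an assumed simplification, not Oracle internals): in a
# nested loop join, the inner side is visited once per outer row, so its
# buffer gets scale with the outer row count.
def nested_loop_gets(outer_gets, outer_rows, gets_per_inner_probe):
    return outer_gets + outer_rows * gets_per_inner_probe

# Numbers taken from the ALLSTATS LAST output above: 27 gets to fetch the
# TRANSACTIONS rows, then 401 probes of PRODUCT_TABLE at ~1 get each.
print(nested_loop_gets(27, 401, 1))  # -> 428
```

Had the optimizer known the outer side would return ~400 rows rather than 1, the 400-fold repetition of the inner probe would have made the nested loop look far more expensive up front.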


What if we include TXN_DATE in our statistics as well?
.......... we'll now see the Join change to a Sort-Merge Join!
SQL> exec dbms_stats.gather_table_stats('','TRANSACTIONS',estimate_percent=>100,granularity=>'ALL',-
> method_opt=>'FOR COLUMNS COUNTRY_CD SIZE 250, PRODUCT_CD SIZE 250, TXN_ID SIZE 250, TXN_DATE SIZE 250 ',cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL>
SQL> select column_name, num_distinct, histogram from user_tab_col_statistics
2 where table_name = 'TRANSACTIONS'
3 order by 1;

COLUMN_NAME NUM_DISTINCT HISTOGRAM
------------------------------ ------------ ---------------
COUNTRY_CD 5 FREQUENCY
PRODUCT_CD 10 FREQUENCY
TXN_DATE 1000000 HEIGHT BALANCED
TXN_ID 1000000 HEIGHT BALANCED

SQL>
SQL> explain plan for
2 select product_desc, txn_id, transaction_amt
3 from transactions t, product_table p
4 where txn_date between to_date('01-FEB-2011','DD-MON-YYYY') and to_date('28-FEB-2011','DD-MON-YYYY')
5 and txn_id between 155000 and 156000
6 and country_cd = 'IN'
7 and t.product_cd=p.product_cd
8 /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3745290717

-------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 386 | 16984 | 38 (3)| 00:00:01 | | |
| 1 | MERGE JOIN | | 386 | 16984 | 38 (3)| 00:00:01 | | |
| 2 | TABLE ACCESS BY INDEX ROWID | PRODUCT_TABLE | 14 | 168 | 2 (0)| 00:00:01 | | |
| 3 | INDEX FULL SCAN | SYS_C0016611 | 14 | | 1 (0)| 00:00:01 | | |
|* 4 | SORT JOIN | | 386 | 12352 | 36 (3)| 00:00:01 | | |
| 5 | PARTITION RANGE SINGLE | | 386 | 12352 | 35 (0)| 00:00:01 | 2 | 2 |
|* 6 | TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTIONS | 386 | 12352 | 35 (0)| 00:00:01 | 2 | 2 |
|* 7 | INDEX RANGE SCAN | TRANSACTIONS_NDX | 400 | | 4 (0)| 00:00:01 | 2 | 2 |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T"."PRODUCT_CD"="P"."PRODUCT_CD")
filter("T"."PRODUCT_CD"="P"."PRODUCT_CD")
   6 - filter("TXN_DATE"<=TO_DATE(' 2011-02-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   7 - access("COUNTRY_CD"='IN' AND "TXN_ID">=155000 AND "TXN_ID"<=156000)

22 rows selected.

SQL>

It's a very different execution plan now. The Optimizer estimates 386 rows from TRANSACTIONS after filtering on TXN_DATE, and it now chooses a Sort-Merge Join (instead of the Nested Loop), starting with a read of PRODUCT_TABLE.
The Optimizer thereby avoids the "mistake" of reading PRODUCT_TABLE 401 times.
Here's the actual execution :
SQL> select /*+ gather_plan_statistics */ product_desc, txn_id, transaction_amt
2 from transactions t, product_table p
3 where txn_date between to_date('01-FEB-2011','DD-MON-YYYY') and to_date('28-FEB-2011','DD-MON-YYYY')
4 and txn_id between 155000 and 156000
5 and country_cd = 'IN'
6 and t.product_cd=p.product_cd
7 /

PRODUCT_DESC TXN_ID TRANSACTION_AMT
-------------------------------------------------- ---------- ---------------
Prod: _1 155000 155
Prod: _1 155010 155.01
Prod: _1 155020 155.02
........
Prod: _9 155618 155.618
Prod: _9 155628 155.628
Prod: _9 155638 155.638

401 rows selected.

SQL>
SQL> select * from table(dbms_xplan.display_cursor('','','ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 5yzvvdg1y2aub, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ product_desc, txn_id,
transaction_amt from transactions t, product_table p where txn_date
between to_date('01-FEB-2011','DD-MON-YYYY') and
to_date('28-FEB-2011','DD-MON-YYYY') and txn_id between 155000 and
156000 and country_cd = 'IN' and t.product_cd=p.product_cd

Plan hash value: 3745290717

---------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                  |      1 |        |    401 |00:00:00.01 |      25 |     2 |       |       |          |
|   1 |  MERGE JOIN                          |                  |      1 |    386 |    401 |00:00:00.01 |      25 |     2 |       |       |          |
|   2 |   TABLE ACCESS BY INDEX ROWID        | PRODUCT_TABLE    |      1 |     14 |     10 |00:00:00.01 |       4 |     2 |       |       |          |
|   3 |    INDEX FULL SCAN                   | SYS_C0016611     |      1 |     14 |     10 |00:00:00.01 |       2 |     1 |       |       |          |
|*  4 |   SORT JOIN                          |                  |     10 |    386 |    401 |00:00:00.01 |      21 |     0 | 18432 | 18432 |16384  (0)|
|   5 |    PARTITION RANGE SINGLE            |                  |      1 |    386 |    401 |00:00:00.01 |      21 |     0 |       |       |          |
|*  6 |     TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTIONS     |      1 |    386 |    401 |00:00:00.01 |      21 |     0 |       |       |          |
|*  7 |      INDEX RANGE SCAN                | TRANSACTIONS_NDX |      1 |    400 |    401 |00:00:00.01 |       5 |     0 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T"."PRODUCT_CD"="P"."PRODUCT_CD")
filter("T"."PRODUCT_CD"="P"."PRODUCT_CD")
   6 - filter("TXN_DATE"<=TO_DATE(' 2011-02-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   7 - access("COUNTRY_CD"='IN' AND "TXN_ID">=155000 AND "TXN_ID"<=156000)

31 rows selected.

SQL>

Oracle now reports using about 16KB of memory to sort the rows returned from the TRANSACTIONS table so that they can be "merged" with the rows from PRODUCT_TABLE (the join is on PRODUCT_CD).
The total number of Buffer Gets is now only 25, a very significant reduction from 428 in the earlier executions!
Also note a side effect of the Sort-Merge Join: the final result set appears ordered by PRODUCT_DESC, actually as a consequence of the sort on PRODUCT_CD performed to execute the join.
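That ordering side effect can be demonstrated with a toy sort-merge join, a deliberate simplification of what the database actually does. Both inputs are sorted on the join key before merging, so the output comes back ordered by PRODUCT_CD regardless of the order in which the transactions were stored (the tuples below are invented for illustration):

```python
def sort_merge_join(left, right):
    """Toy sort-merge join on the first element of each tuple.
    Assumes the right-hand keys are unique (like a primary key)."""
    left = sorted(left)    # sort both inputs on the join key ...
    right = sorted(right)
    out, j = [], 0
    for lkey, lval in left:      # ... then merge with a forward-only cursor
        while j < len(right) and right[j][0] < lkey:
            j += 1
        if j < len(right) and right[j][0] == lkey:
            out.append((lkey, lval, right[j][1]))
    return out

# (product_cd, txn_id) rows in arbitrary on-disk order, plus a product list:
txns = [(2, 155001), (1, 155000), (9, 155998), (1, 156000), (5, 155004)]
products = [(1, 'Prod: _1'), (2, 'Prod: _2'), (5, 'Prod: _5'), (9, 'Prod: _9')]

for row in sort_merge_join(txns, products):
    print(row)   # rows emerge grouped by product_cd, not by txn_id
```

Replace the merge with a nested loop driven by the transactions side and the same rows come back in TXN_ID order, which is exactly why the two plans display their results differently.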

Therefore, a change in statistics can not only:
a. Change the execution plan
and
b. Change the total logical I/O
It can also change the order in which the result set appears! This can be a very significant difference to users who see such listings on screens / webforms.
Hint: always explicitly specify an ORDER BY when the appearance of the output is "sensitive" and the result set exceeds a few rows.
Because I didn't include an ORDER BY, the rows appear in a different order, merely because of a slight change in the Gather_Stats call.


Aggregated by orafaq.com