05 June, 2011

Getting the right statistics

Continuing from the earlier post, "Nested Loops and Consistent Gets": what does the Execution Plan look like with statistics gathered on more columns?
SQL> exec dbms_stats.gather_table_stats('','TRANSACTIONS',estimate_percent=>100,granularity=>'ALL',-
> method_opt=>'FOR COLUMNS COUNTRY_CD SIZE 250, PRODUCT_CD SIZE 250, TXN_ID SIZE 250 ',cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL>

SQL> select column_name, num_distinct, histogram from user_tab_col_statistics
2 where table_name = 'TRANSACTIONS'
3 order by 1;

COLUMN_NAME NUM_DISTINCT HISTOGRAM
------------------------------ ------------ ---------------
COUNTRY_CD 5 FREQUENCY
PRODUCT_CD 10 FREQUENCY
TXN_ID 1000000 HEIGHT BALANCED


SQL>

SQL> explain plan for
2 select product_desc, txn_id, transaction_amt
3 from transactions t, product_table p
4 where txn_date between to_date('01-FEB-2011','DD-MON-YYYY') and to_date('28-FEB-2011','DD-MON-YYYY')
5 and txn_id between 155000 and 156000
6 and country_cd = 'IN'
7 and t.product_cd=p.product_cd
8 /

Explained.

SQL> select * from table(dbms_xplan.display);


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1299935411

-------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 45 | 36 (0)| 00:00:01 | | |
| 1 | NESTED LOOPS | | | | | | | |
| 2 | NESTED LOOPS | | 1 | 45 | 36 (0)| 00:00:01 | | |
| 3 | PARTITION RANGE SINGLE | | 1 | 33 | 35 (0)| 00:00:01 | 2 | 2 |
|* 4 | TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTIONS | 1 | 33 | 35 (0)| 00:00:01 | 2 | 2 |
|* 5 | INDEX RANGE SCAN | TRANSACTIONS_NDX | 400 | | 4 (0)| 00:00:01 | 2 | 2 |
|* 6 | INDEX UNIQUE SCAN | SYS_C0016611 | 1 | | 0 (0)| 00:00:01 | | |
| 7 | TABLE ACCESS BY INDEX ROWID | PRODUCT_TABLE | 1 | 12 | 1 (0)| 00:00:01 | | |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("TXN_DATE"<=TO_DATE(' 2011-02-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   5 - access("COUNTRY_CD"='IN' AND "TXN_ID">=155000 AND "TXN_ID"<=156000)
   6 - access("T"."PRODUCT_CD"="P"."PRODUCT_CD")

21 rows selected.

SQL>

Ah. Now the optimizer has a better feel for the rows from Operation 5: access by COUNTRY_CD and TXN_ID. Instead of 252 rows (the estimate when column statistics had not been gathered on TXN_ID), the expected row count is 400.
Yet the filter on TXN_DATE is still expected to return only 1 row! We know that we are reading 400 rows from that one partition alone (partition 2 being P_2011_FEB), and that every row in this partition *will* satisfy this filter predicate. Yet the optimizer expects only 1 row.
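Why does the estimate collapse to 1 row? A plausible sketch, not Oracle's exact arithmetic: the optimizer multiplies the selectivities of predicates it treats as independent, and with no statistics on TXN_DATE it must fall back on a default guess for the date filter. The `default_date_selectivity` value below is hypothetical, chosen only to show how a small default shrinks 400 rows to 1:

```python
# Illustrative sketch (assumed cost model, not Oracle internals): the
# optimizer multiplies independent predicate selectivities. With no
# statistics on TXN_DATE, a small default selectivity for the date filter
# collapses the 400-row index estimate down to 1 row.
def estimated_rows(base_rows, selectivities):
    est = base_rows
    for s in selectivities:
        est *= s
    return max(1, round(est))  # a plan never estimates fewer than 1 row

rows_from_index = 400              # E-Rows of the INDEX RANGE SCAN, from the plan
default_date_selectivity = 0.0025  # hypothetical default for the unanalyzed column

print(estimated_rows(rows_from_index, [default_date_selectivity]))  # -> 1
```

With proper statistics on TXN_DATE the date-filter selectivity would be close to 1 for this partition, and the estimate would stay near 400.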

Here's the actual execution with this plan :
SQL> select /*+ gather_plan_statistics */ product_desc, txn_id, transaction_amt
2 from transactions t, product_table p
3 where txn_date between to_date('01-FEB-2011','DD-MON-YYYY') and to_date('28-FEB-2011','DD-MON-YYYY')
4 and txn_id between 155000 and 156000
5 and country_cd = 'IN'
6 and t.product_cd=p.product_cd
7 /

PRODUCT_DESC TXN_ID TRANSACTION_AMT
-------------------------------------------------- ---------- ---------------
Prod: _1 155000 155
Prod: _2 155001 155.001
Prod: _5 155004 155.004
........
Prod: _5 155994 155.994
Prod: _9 155998 155.998
Prod: _1 156000 156

401 rows selected.

SQL>
SQL> select * from table(dbms_xplan.display_cursor('','','ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 5yzvvdg1y2aub, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ product_desc, txn_id,
transaction_amt from transactions t, product_table p where txn_date
between to_date('01-FEB-2011','DD-MON-YYYY') and
to_date('28-FEB-2011','DD-MON-YYYY') and txn_id between 155000 and
156000 and country_cd = 'IN' and t.product_cd=p.product_cd

Plan hash value: 1299935411

----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 401 |00:00:00.03 | 428 | 2 |
| 1 | NESTED LOOPS | | 1 | | 401 |00:00:00.03 | 428 | 2 |
| 2 | NESTED LOOPS | | 1 | 1 | 401 |00:00:00.04 | 27 | 1 |
| 3 | PARTITION RANGE SINGLE | | 1 | 1 | 401 |00:00:00.02 | 23 | 0 |
|* 4 | TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTIONS | 1 | 1 | 401 |00:00:00.02 | 23 | 0 |
|* 5 | INDEX RANGE SCAN | TRANSACTIONS_NDX | 1 | 400 | 401 |00:00:00.01 | 6 | 0 |
|* 6 | INDEX UNIQUE SCAN | SYS_C0016611 | 401 | 1 | 401 |00:00:00.01 | 4 | 1 |
| 7 | TABLE ACCESS BY INDEX ROWID | PRODUCT_TABLE | 401 | 1 | 401 |00:00:00.01 | 401 | 1 |
----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("TXN_DATE"<=TO_DATE(' 2011-02-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   5 - access("COUNTRY_CD"='IN' AND "TXN_ID">=155000 AND "TXN_ID"<=156000)
   6 - access("T"."PRODUCT_CD"="P"."PRODUCT_CD")

30 rows selected.

SQL>

So! We are still reading PRODUCT_TABLE 401 times, simply because the Optimizer expected to have to read it only once!

Therefore, we can narrow the cause of this Nested Loop's "poor" behaviour, the high Consistent Gets count, down to the row-count estimate for the TXN_DATE filter predicate.
The current set of statistics does not cover TXN_DATE, even though the query actually filters on "txn_date between to_date('01-FEB-2011','DD-MON-YYYY') and to_date('28-FEB-2011','DD-MON-YYYY')", and the result is a sub-optimal plan.
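The arithmetic behind the 428 Consistent Gets can be sketched as follows. This is rough accounting under assumed numbers read off the ALLSTATS LAST output: about 27 gets on the TRANSACTIONS side, plus roughly one get per unique-index probe of PRODUCT_TABLE for each of the 401 outer rows:

```python
# Rough accounting (an assumed simplification, not Oracle internals): in a
# nested loop join, the inner side is visited once per outer row, so its
# buffer gets scale with the outer row count.
def nested_loop_gets(outer_gets, outer_rows, gets_per_inner_probe):
    return outer_gets + outer_rows * gets_per_inner_probe

# Numbers taken from the ALLSTATS LAST output above: 27 gets to fetch the
# TRANSACTIONS rows, then 401 probes of PRODUCT_TABLE at ~1 get each.
print(nested_loop_gets(27, 401, 1))  # -> 428
```

Had the optimizer known the outer side would return ~400 rows rather than 1, the 400-fold repetition of the inner probe would have made the nested loop look far more expensive up front.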


What if we include TXN_DATE in our statistics as well?
.......... we'll now see the Join change to a Sort-Merge Join!
SQL> exec dbms_stats.gather_table_stats('','TRANSACTIONS',estimate_percent=>100,granularity=>'ALL',-
> method_opt=>'FOR COLUMNS COUNTRY_CD SIZE 250, PRODUCT_CD SIZE 250, TXN_ID SIZE 250, TXN_DATE SIZE 250 ',cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL>
SQL> select column_name, num_distinct, histogram from user_tab_col_statistics
2 where table_name = 'TRANSACTIONS'
3 order by 1;

COLUMN_NAME NUM_DISTINCT HISTOGRAM
------------------------------ ------------ ---------------
COUNTRY_CD 5 FREQUENCY
PRODUCT_CD 10 FREQUENCY
TXN_DATE 1000000 HEIGHT BALANCED
TXN_ID 1000000 HEIGHT BALANCED

SQL>
SQL> explain plan for
2 select product_desc, txn_id, transaction_amt
3 from transactions t, product_table p
4 where txn_date between to_date('01-FEB-2011','DD-MON-YYYY') and to_date('28-FEB-2011','DD-MON-YYYY')
5 and txn_id between 155000 and 156000
6 and country_cd = 'IN'
7 and t.product_cd=p.product_cd
8 /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3745290717

-------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 386 | 16984 | 38 (3)| 00:00:01 | | |
| 1 | MERGE JOIN | | 386 | 16984 | 38 (3)| 00:00:01 | | |
| 2 | TABLE ACCESS BY INDEX ROWID | PRODUCT_TABLE | 14 | 168 | 2 (0)| 00:00:01 | | |
| 3 | INDEX FULL SCAN | SYS_C0016611 | 14 | | 1 (0)| 00:00:01 | | |
|* 4 | SORT JOIN | | 386 | 12352 | 36 (3)| 00:00:01 | | |
| 5 | PARTITION RANGE SINGLE | | 386 | 12352 | 35 (0)| 00:00:01 | 2 | 2 |
|* 6 | TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTIONS | 386 | 12352 | 35 (0)| 00:00:01 | 2 | 2 |
|* 7 | INDEX RANGE SCAN | TRANSACTIONS_NDX | 400 | | 4 (0)| 00:00:01 | 2 | 2 |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T"."PRODUCT_CD"="P"."PRODUCT_CD")
filter("T"."PRODUCT_CD"="P"."PRODUCT_CD")
   6 - filter("TXN_DATE"<=TO_DATE(' 2011-02-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   7 - access("COUNTRY_CD"='IN' AND "TXN_ID">=155000 AND "TXN_ID"<=156000)

22 rows selected.

SQL>

It's a very different execution plan now. The Optimizer estimates 386 rows from TRANSACTIONS after filtering on TXN_DATE, and it now chooses a Sort-Merge Join (instead of the Nested Loop), starting with a read of PRODUCT_TABLE.
The Optimizer thereby avoids the "mistake" of reading PRODUCT_TABLE 401 times.
Here's the actual execution :
SQL> select /*+ gather_plan_statistics */ product_desc, txn_id, transaction_amt
2 from transactions t, product_table p
3 where txn_date between to_date('01-FEB-2011','DD-MON-YYYY') and to_date('28-FEB-2011','DD-MON-YYYY')
4 and txn_id between 155000 and 156000
5 and country_cd = 'IN'
6 and t.product_cd=p.product_cd
7 /

PRODUCT_DESC TXN_ID TRANSACTION_AMT
-------------------------------------------------- ---------- ---------------
Prod: _1 155000 155
Prod: _1 155010 155.01
Prod: _1 155020 155.02
........
Prod: _9 155618 155.618
Prod: _9 155628 155.628
Prod: _9 155638 155.638

401 rows selected.

SQL>
SQL> select * from table(dbms_xplan.display_cursor('','','ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 5yzvvdg1y2aub, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ product_desc, txn_id,
transaction_amt from transactions t, product_table p where txn_date
between to_date('01-FEB-2011','DD-MON-YYYY') and
to_date('28-FEB-2011','DD-MON-YYYY') and txn_id between 155000 and
156000 and country_cd = 'IN' and t.product_cd=p.product_cd

Plan hash value: 3745290717

---------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                  |      1 |        |    401 |00:00:00.01 |      25 |     2 |       |       |          |
|   1 |  MERGE JOIN                          |                  |      1 |    386 |    401 |00:00:00.01 |      25 |     2 |       |       |          |
|   2 |   TABLE ACCESS BY INDEX ROWID        | PRODUCT_TABLE    |      1 |     14 |     10 |00:00:00.01 |       4 |     2 |       |       |          |
|   3 |    INDEX FULL SCAN                   | SYS_C0016611     |      1 |     14 |     10 |00:00:00.01 |       2 |     1 |       |       |          |
|*  4 |   SORT JOIN                          |                  |     10 |    386 |    401 |00:00:00.01 |      21 |     0 | 18432 | 18432 |16384  (0)|
|   5 |    PARTITION RANGE SINGLE            |                  |      1 |    386 |    401 |00:00:00.01 |      21 |     0 |       |       |          |
|*  6 |     TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTIONS     |      1 |    386 |    401 |00:00:00.01 |      21 |     0 |       |       |          |
|*  7 |      INDEX RANGE SCAN                | TRANSACTIONS_NDX |      1 |    400 |    401 |00:00:00.01 |       5 |     0 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T"."PRODUCT_CD"="P"."PRODUCT_CD")
filter("T"."PRODUCT_CD"="P"."PRODUCT_CD")
   6 - filter("TXN_DATE"<=TO_DATE(' 2011-02-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   7 - access("COUNTRY_CD"='IN' AND "TXN_ID">=155000 AND "TXN_ID"<=156000)

31 rows selected.

SQL>

Oracle now reports using about 16KB of memory to sort the rows returned from the TRANSACTIONS table so that they can be "merged" with the rows from PRODUCT_TABLE (the join is on PRODUCT_CD).
The total number of Buffer Gets is now only 25, a very significant reduction from 428 in the earlier executions!
Also note a side effect of the Sort-Merge Join: the final result set appears ordered by PRODUCT_DESC, actually as a consequence of the sort on PRODUCT_CD performed to execute the join.
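That ordering side effect can be demonstrated with a toy sort-merge join, a deliberate simplification of what the database actually does. Both inputs are sorted on the join key before merging, so the output comes back ordered by PRODUCT_CD regardless of the order in which the transactions were stored (the tuples below are invented for illustration):

```python
def sort_merge_join(left, right):
    """Toy sort-merge join on the first element of each tuple.
    Assumes the right-hand keys are unique (like a primary key)."""
    left = sorted(left)    # sort both inputs on the join key ...
    right = sorted(right)
    out, j = [], 0
    for lkey, lval in left:      # ... then merge with a forward-only cursor
        while j < len(right) and right[j][0] < lkey:
            j += 1
        if j < len(right) and right[j][0] == lkey:
            out.append((lkey, lval, right[j][1]))
    return out

# (product_cd, txn_id) rows in arbitrary on-disk order, plus a product list:
txns = [(2, 155001), (1, 155000), (9, 155998), (1, 156000), (5, 155004)]
products = [(1, 'Prod: _1'), (2, 'Prod: _2'), (5, 'Prod: _5'), (9, 'Prod: _9')]

for row in sort_merge_join(txns, products):
    print(row)   # rows emerge grouped by product_cd, not by txn_id
```

Replace the merge with a nested loop driven by the transactions side and the same rows come back in TXN_ID order, which is exactly why the two plans display their results differently.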

Therefore, a change in statistics can not only:
a. Change the execution plan
and
b. Change the total logical I/O
It can also change the order in which the result set appears! This can be a very significant difference to users who see such listings on screens / webforms.
Hint: always explicitly specify an ORDER BY when the appearance of the output is "sensitive" and the result set exceeds a few rows.
Because I didn't include an ORDER BY, the rows appear in a different order, merely because of a slight change in the Gather_Stats call.


Aggregated by orafaq.com