SQL> exec dbms_stats.gather_table_stats('','TRANSACTIONS',estimate_percent=>100,granularity=>'ALL',-
> method_opt=>'FOR COLUMNS COUNTRY_CD SIZE 250, PRODUCT_CD SIZE 250, TXN_ID SIZE 250 ',cascade=>TRUE);
PL/SQL procedure successfully completed.
SQL>
SQL> select column_name, num_distinct, histogram from user_tab_col_statistics
2 where table_name = 'TRANSACTIONS'
3 order by 1;
COLUMN_NAME NUM_DISTINCT HISTOGRAM
------------------------------ ------------ ---------------
COUNTRY_CD 5 FREQUENCY
PRODUCT_CD 10 FREQUENCY
TXN_ID 1000000 HEIGHT BALANCED
SQL>
SQL> explain plan for
2 select product_desc, txn_id, transaction_amt
3 from transactions t, product_table p
4 where txn_date between to_date('01-FEB-2011','DD-MON-YYYY') and to_date('28-FEB-2011','DD-MON-YYYY')
5 and txn_id between 155000 and 156000
6 and country_cd = 'IN'
7 and t.product_cd=p.product_cd
8 /
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1299935411
-------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 45 | 36 (0)| 00:00:01 | | |
| 1 | NESTED LOOPS | | | | | | | |
| 2 | NESTED LOOPS | | 1 | 45 | 36 (0)| 00:00:01 | | |
| 3 | PARTITION RANGE SINGLE | | 1 | 33 | 35 (0)| 00:00:01 | 2 | 2 |
|* 4 | TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTIONS | 1 | 33 | 35 (0)| 00:00:01 | 2 | 2 |
|* 5 | INDEX RANGE SCAN | TRANSACTIONS_NDX | 400 | | 4 (0)| 00:00:01 | 2 | 2 |
|* 6 | INDEX UNIQUE SCAN | SYS_C0016611 | 1 | | 0 (0)| 00:00:01 | | |
| 7 | TABLE ACCESS BY INDEX ROWID | PRODUCT_TABLE | 1 | 12 | 1 (0)| 00:00:01 | | |
-------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("TXN_DATE"<=TO_DATE(' 2011-02-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   5 - access("COUNTRY_CD"='IN' AND "TXN_ID">=155000 AND "TXN_ID"<=156000)
   6 - access("T"."PRODUCT_CD"="P"."PRODUCT_CD")

21 rows selected.

SQL>
Ah. Now the optimizer has a better feel for the rows from Operation 5: access by COUNTRY_CD and TXN_ID. Instead of 252 rows (the estimate when column statistics had not been gathered on TXN_ID), the expected row count is 400.
Yet the filter for TXN_DATE is still expected to return 1 row! We know that we are reading 400 rows from this one partition alone (partition 2 being P_2011_FEB). We also know that every row in this partition *will* satisfy this filter predicate. Yet the optimizer expects only 1 row.
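How can 400 rows shrink to 1? Conceptually (and purely as an illustration, not Oracle's actual internal formula), the optimizer treats the predicates as independent and multiplies their selectivities; with no column statistics on TXN_DATE it has to guess a selectivity for the date-range filter. The selectivity value below is hypothetical, chosen only to reproduce the 1-row estimate seen in the plan:

```python
# Conceptual sketch of the optimizer's cardinality arithmetic
# (illustrative numbers, NOT Oracle's exact computation).
rows_from_index = 400            # estimate for Operation 5 (COUNTRY_CD + TXN_ID)

# Without statistics on TXN_DATE, the date filter's selectivity is a guess;
# this value is chosen only to reproduce the 1-row estimate in Operation 4.
assumed_date_selectivity = 1 / 400

estimated_rows = max(1, round(rows_from_index * assumed_date_selectivity))
print(estimated_rows)  # -> 1, even though every row in the partition qualifies
```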
Here's the actual execution with this plan :
SQL> select /*+ gather_plan_statistics */ product_desc, txn_id, transaction_amt
2 from transactions t, product_table p
3 where txn_date between to_date('01-FEB-2011','DD-MON-YYYY') and to_date('28-FEB-2011','DD-MON-YYYY')
4 and txn_id between 155000 and 156000
5 and country_cd = 'IN'
6 and t.product_cd=p.product_cd
7 /
PRODUCT_DESC TXN_ID TRANSACTION_AMT
-------------------------------------------------- ---------- ---------------
Prod: _1 155000 155
Prod: _2 155001 155.001
Prod: _5 155004 155.004
........
Prod: _5 155994 155.994
Prod: _9 155998 155.998
Prod: _1 156000 156
401 rows selected.
SQL>
SQL> select * from table(dbms_xplan.display_cursor('','','ALLSTATS LAST'));
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 5yzvvdg1y2aub, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ product_desc, txn_id,
transaction_amt from transactions t, product_table p where txn_date
between to_date('01-FEB-2011','DD-MON-YYYY') and
to_date('28-FEB-2011','DD-MON-YYYY') and txn_id between 155000 and
156000 and country_cd = 'IN' and t.product_cd=p.product_cd
Plan hash value: 1299935411
----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 401 |00:00:00.03 | 428 | 2 |
| 1 | NESTED LOOPS | | 1 | | 401 |00:00:00.03 | 428 | 2 |
| 2 | NESTED LOOPS | | 1 | 1 | 401 |00:00:00.04 | 27 | 1 |
| 3 | PARTITION RANGE SINGLE | | 1 | 1 | 401 |00:00:00.02 | 23 | 0 |
|* 4 | TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTIONS | 1 | 1 | 401 |00:00:00.02 | 23 | 0 |
|* 5 | INDEX RANGE SCAN | TRANSACTIONS_NDX | 1 | 400 | 401 |00:00:00.01 | 6 | 0 |
|* 6 | INDEX UNIQUE SCAN | SYS_C0016611 | 401 | 1 | 401 |00:00:00.01 | 4 | 1 |
| 7 | TABLE ACCESS BY INDEX ROWID | PRODUCT_TABLE | 401 | 1 | 401 |00:00:00.01 | 401 | 1 |
----------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("TXN_DATE"<=TO_DATE(' 2011-02-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   5 - access("COUNTRY_CD"='IN' AND "TXN_ID">=155000 AND "TXN_ID"<=156000)
   6 - access("T"."PRODUCT_CD"="P"."PRODUCT_CD")

30 rows selected.

SQL>
So! We are still reading PRODUCT_TABLE 401 times (the Starts column for Operations 6 and 7), only because the optimizer expected to have to read it just once!
We can therefore pin the "poor" behaviour of this Nested Loop, with its high Consistent Gets count, on the 1-row estimate for the filter predicate on TXN_DATE.
The current set of statistics does not cover TXN_DATE, even though the query carries the predicate "txn_date between to_date('01-FEB-2011','DD-MON-YYYY') and to_date('28-FEB-2011','DD-MON-YYYY')", and the result is a sub-optimal plan.
What if we include TXN_DATE in our statistics as well? We'll now see the join change to a Sort-Merge Join!
SQL> exec dbms_stats.gather_table_stats('','TRANSACTIONS',estimate_percent=>100,granularity=>'ALL',-
> method_opt=>'FOR COLUMNS COUNTRY_CD SIZE 250, PRODUCT_CD SIZE 250, TXN_ID SIZE 250, TXN_DATE SIZE 250 ',cascade=>TRUE);
PL/SQL procedure successfully completed.
SQL>
SQL> select column_name, num_distinct, histogram from user_tab_col_statistics
2 where table_name = 'TRANSACTIONS'
3 order by 1;
COLUMN_NAME NUM_DISTINCT HISTOGRAM
------------------------------ ------------ ---------------
COUNTRY_CD 5 FREQUENCY
PRODUCT_CD 10 FREQUENCY
TXN_DATE 1000000 HEIGHT BALANCED
TXN_ID 1000000 HEIGHT BALANCED
SQL>
SQL> explain plan for
2 select product_desc, txn_id, transaction_amt
3 from transactions t, product_table p
4 where txn_date between to_date('01-FEB-2011','DD-MON-YYYY') and to_date('28-FEB-2011','DD-MON-YYYY')
5 and txn_id between 155000 and 156000
6 and country_cd = 'IN'
7 and t.product_cd=p.product_cd
8 /
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3745290717
-------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 386 | 16984 | 38 (3)| 00:00:01 | | |
| 1 | MERGE JOIN | | 386 | 16984 | 38 (3)| 00:00:01 | | |
| 2 | TABLE ACCESS BY INDEX ROWID | PRODUCT_TABLE | 14 | 168 | 2 (0)| 00:00:01 | | |
| 3 | INDEX FULL SCAN | SYS_C0016611 | 14 | | 1 (0)| 00:00:01 | | |
|* 4 | SORT JOIN | | 386 | 12352 | 36 (3)| 00:00:01 | | |
| 5 | PARTITION RANGE SINGLE | | 386 | 12352 | 35 (0)| 00:00:01 | 2 | 2 |
|* 6 | TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTIONS | 386 | 12352 | 35 (0)| 00:00:01 | 2 | 2 |
|* 7 | INDEX RANGE SCAN | TRANSACTIONS_NDX | 400 | | 4 (0)| 00:00:01 | 2 | 2 |
-------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T"."PRODUCT_CD"="P"."PRODUCT_CD")
filter("T"."PRODUCT_CD"="P"."PRODUCT_CD")
   6 - filter("TXN_DATE"<=TO_DATE(' 2011-02-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   7 - access("COUNTRY_CD"='IN' AND "TXN_ID">=155000 AND "TXN_ID"<=156000)

22 rows selected.

SQL>
It's a very different execution plan now. The optimizer estimates 386 rows from TRANSACTIONS after filtering on TXN_DATE. It now chooses a Sort-Merge Join (instead of the Nested Loop), starting with a read of PRODUCT_TABLE.
The optimizer will now avoid the "mistake" of reading PRODUCT_TABLE 401 times.
Here's the actual execution :
SQL> select /*+ gather_plan_statistics */ product_desc, txn_id, transaction_amt
2 from transactions t, product_table p
3 where txn_date between to_date('01-FEB-2011','DD-MON-YYYY') and to_date('28-FEB-2011','DD-MON-YYYY')
4 and txn_id between 155000 and 156000
5 and country_cd = 'IN'
6 and t.product_cd=p.product_cd
7 /
PRODUCT_DESC TXN_ID TRANSACTION_AMT
-------------------------------------------------- ---------- ---------------
Prod: _1 155000 155
Prod: _1 155010 155.01
Prod: _1 155020 155.02
........
Prod: _9 155618 155.618
Prod: _9 155628 155.628
Prod: _9 155638 155.638
401 rows selected.
SQL>
SQL> select * from table(dbms_xplan.display_cursor('','','ALLSTATS LAST'));
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 5yzvvdg1y2aub, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ product_desc, txn_id,
transaction_amt from transactions t, product_table p where txn_date
between to_date('01-FEB-2011','DD-MON-YYYY') and
to_date('28-FEB-2011','DD-MON-YYYY') and txn_id between 155000 and
156000 and country_cd = 'IN' and t.product_cd=p.product_cd
Plan hash value: 3745290717
-----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                  |      1 |        |    401 |00:00:00.01 |      25 |     2 |       |       |          |
|   1 |  MERGE JOIN                         |                  |      1 |    386 |    401 |00:00:00.01 |      25 |     2 |       |       |          |
|   2 |   TABLE ACCESS BY INDEX ROWID       | PRODUCT_TABLE    |      1 |     14 |     10 |00:00:00.01 |       4 |     2 |       |       |          |
|   3 |    INDEX FULL SCAN                  | SYS_C0016611     |      1 |     14 |     10 |00:00:00.01 |       2 |     1 |       |       |          |
|*  4 |   SORT JOIN                         |                  |     10 |    386 |    401 |00:00:00.01 |      21 |     0 | 18432 | 18432 |16384 (0)|
|   5 |    PARTITION RANGE SINGLE           |                  |      1 |    386 |    401 |00:00:00.01 |      21 |     0 |       |       |          |
|*  6 |     TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTIONS    |      1 |    386 |    401 |00:00:00.01 |      21 |     0 |       |       |          |
|*  7 |      INDEX RANGE SCAN               | TRANSACTIONS_NDX |      1 |    400 |    401 |00:00:00.01 |       5 |     0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T"."PRODUCT_CD"="P"."PRODUCT_CD")
filter("T"."PRODUCT_CD"="P"."PRODUCT_CD")
   6 - filter("TXN_DATE"<=TO_DATE(' 2011-02-28 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   7 - access("COUNTRY_CD"='IN' AND "TXN_ID">=155000 AND "TXN_ID"<=156000)

31 rows selected.

SQL>
Oracle now reports about 16KB of memory used to sort the rows returned from the TRANSACTIONS table so that they can be "merged" with the rows from PRODUCT_TABLE (the join is on PRODUCT_CD).
The total number of Buffer Gets is now only 25! A very significant reduction from 428 in the earlier executions!
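Where does the saving come from? The arithmetic can be read straight off the Buffers column of the two ALLSTATS LAST outputs (a back-of-envelope check):

```python
# Buffer gets in the Nested Loop run: TRANSACTIONS access (23) plus the
# 401 probes of the primary-key index (4) and of PRODUCT_TABLE (401).
nested_loop_buffers = 23 + 4 + 401
# Buffer gets in the Sort-Merge run: one pass of PRODUCT_TABLE (4) plus
# one pass of TRANSACTIONS (21); the in-memory sort needs no buffer gets.
merge_join_buffers = 4 + 21
print(nested_loop_buffers, merge_join_buffers)  # -> 428 25
```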
Also note that a consequence of the Sort-Merge Join is that the final result set appears ordered by PRODUCT_DESC, actually on the basis of the sort on PRODUCT_CD used to execute the join.
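The change in ordering falls directly out of the shape of the join. A toy sort-merge join (a Python sketch with made-up rows, assuming unique product codes on the products side, as in this schema) shows the output emerging in join-key order rather than TXN_ID order:

```python
def sort_merge_join(txns, products):
    # Sort both inputs on the join key, then walk them together.
    txns = sorted(txns, key=lambda t: t["product_cd"])
    products = sorted(products, key=lambda p: p["product_cd"])
    out, i = [], 0
    for t in txns:
        # Advance the products pointer until it reaches this join key.
        while i < len(products) and products[i]["product_cd"] < t["product_cd"]:
            i += 1
        if i < len(products) and products[i]["product_cd"] == t["product_cd"]:
            out.append((products[i]["product_desc"], t["txn_id"]))
    return out

txns = [{"product_cd": 2, "txn_id": 155001},
        {"product_cd": 1, "txn_id": 155000},
        {"product_cd": 1, "txn_id": 155010}]
products = [{"product_cd": 1, "product_desc": "Prod: _1"},
            {"product_cd": 2, "product_desc": "Prod: _2"}]
print(sort_merge_join(txns, products))
# Rows come back grouped by product, not in TXN_ID order:
# [('Prod: _1', 155000), ('Prod: _1', 155010), ('Prod: _2', 155001)]
```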
Therefore, a change in statistics can not only:
a. change the execution plan
and
b. change the total logical I/O,
it can also change the order in which the result set appears! This can be a very significant difference to users who see such listings on screens / webforms.
Hint: always explicitly specify an ORDER BY when the result set exceeds a few rows and the appearance of the output is "sensitive".
Because I didn't include an ORDER BY, the rows now appear in a different order, merely because of a slight change to the gather_table_stats call.
.
.
.