What happens if the nature of "data skew" changes over time ? For example, in the first few years of operations, you manufacture and sell 4 types of widgets ("A", "B", "C" and "D") where the volume of sales is approximately equally distributed amongst the 4 (Sales may be increasing over the years, but the distribution -- i.e. proportions -- remains the same). After say, 5 years, you add new widgets ("X", "Y" and "Z") to your product line. You notice that sales for "A" and "B" are now actually declining while sales for "Z" over the next 3 years are growing much faster than "C", "D", "X" and "Y".
What you have is changing data skew.
How does the optimizer estimate cardinality ?
Column Statistics used for a Histogram are at the *Table* level. A Histogram of the frequency of occurrences of these widgets in your sales would be computed merely as the number of occurrences across the table -- without accounting for dates. At the end of year 4, Oracle may say that "A" accounted for 25% of sales. At the end of year 8, "A" may now account for only 5% of total sales across all 8 years. And that is why the histogram would say. However, the histogram would not now (at the end of 8 years) say that "A" still accounts for 25% of sales in the first 4 years ! The information content is now "normalised" -- the metadata loses some vital information. (Of course, actually querying the table still shows you that "A" accounts for 25% of sales in the first 4 years, but the Optimizer's statistics can no longer reflect this knowledge). The Optimizer's cardinality estimates after 8 years are actually "weaker" because it has "normalised" (or "averaged out") the skew.
I've takan the "SH" schema's SALES table (this is available as one of the EXAMPLE schemas) from an Oracle 10.2 installation to demonstrate this. I take the example of 3.5inch diskettes whose sales decline. To simplify the case, I count the number of Sales Orders, not the quantities across all orders (for example, we could assume that each Sales Order is for exactly 1000 diskettes).
First, I start with SALES information for the years 1998 to 2002 :
We know that gross Sales (for all products) have (generally) been increasing over the years -- from 179thousand orders in 1998 to 259thousand in 2001 (with a slight dip in the year 2000).
However, sales of 3.5" diskettes declined in 2001.
Sales for 2002 are not recorded.
We look at the Optimizer's cardinality estimates based on available statistics :
We can see that the estimates for PROD_ID=31 do not match reality. Particularly for 2001.
Let's now create 2002 data :
Thus, although gross sales for all products did not decline significantly in 2002, sales for 3.5" diskettes, at 811 orders, were only 1/3rd the number in 2001.
Let's revisit the Optimizer's cardinality estimates :
Oracle now seems to have continued "normalising" (or "averaging out") the distribution across all the years. The discrepancy between the actual count of rows and the estimated count of rows has increased significantly.
Although the number of occurrences of Sales Orders for PROD_ID=31 has declined considerably in 2001 and 2002, the Optimizer cannot adjust for the decline. The discrepancy between Actual and Estimated row counts increases over time as the skew changes.
The Cardinality estimate is computed from Column statistics (number of distinct values, selectivity, histogram) at the Table level but combined with Partition row counts without any knowledge of the occurrences of PRODUCT_ID=31 in each of the target Partitions for the year 2002.
SUGGESTED TESTS For Readers :
1. What if we had a BTree Index ?
2. What if the SALES table wasn't Partitioned ?
UPDATE : See the next post, with a Non-Partitioned Table
.
.
.
What you have is changing data skew.
How does the optimizer estimate cardinality ?
Column Statistics used for a Histogram are at the *Table* level. A Histogram of the frequency of occurrences of these widgets in your sales would be computed merely as the number of occurrences across the table -- without accounting for dates. At the end of year 4, Oracle may say that "A" accounted for 25% of sales. At the end of year 8, "A" may now account for only 5% of total sales across all 8 years. And that is why the histogram would say. However, the histogram would not now (at the end of 8 years) say that "A" still accounts for 25% of sales in the first 4 years ! The information content is now "normalised" -- the metadata loses some vital information. (Of course, actually querying the table still shows you that "A" accounts for 25% of sales in the first 4 years, but the Optimizer's statistics can no longer reflect this knowledge). The Optimizer's cardinality estimates after 8 years are actually "weaker" because it has "normalised" (or "averaged out") the skew.
I've takan the "SH" schema's SALES table (this is available as one of the EXAMPLE schemas) from an Oracle 10.2 installation to demonstrate this. I take the example of 3.5inch diskettes whose sales decline. To simplify the case, I count the number of Sales Orders, not the quantities across all orders (for example, we could assume that each Sales Order is for exactly 1000 diskettes).
First, I start with SALES information for the years 1998 to 2002 :
SQL> exec dbms_stats.gather_table_stats('','SALES',estimate_percent=>100,granularity=>'ALL',degree=>4,- > method_opt=>'FOR ALL COLUMNS SIZE 250',cascade=>TRUE,no_invalidate=>FALSE); PL/SQL procedure successfully completed. SQL> SQL> -- What is PROD_ID 31 ? SQL> select prod_id, prod_name 2 from products 3 where prod_id=31 4 / PROD_ID PROD_NAME ---------- -------------------------------------------------- 31 1.44MB External 3.5" Diskette SQL> SQL> -- What are the range of dates in the Sales (history) table ? SQL> select min(time_id), max(time_id) 2 from times 3 / MIN(TIME_ MAX(TIME_ --------- --------- 01-JAN-98 31-DEC-02 SQL> SQL> -- What are the total number of Sales records by year ? SQL> select /*+ FULL(s) PARALLEL (s 4) */ to_char(time_id,'YYYY'), count(*) 2 from sales s 3 group by to_char(time_id,'YYYY') 4 order by 1 5 / TO_C COUNT(*) ---- ---------- 1998 178834 1999 247945 2000 232646 2001 259418 SQL> SQL> -- How many sales exist for the range 01-Jan-1998 to 31-Dec-2001 ? SQL> select count(*) 2 from sales 3 where time_id > to_date('31-DEC-1997','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY') 4 / COUNT(*) ---------- 918843 SQL> SQL> -- How many sales of 3.5" diskettes were made in the range 01-Jan-1998 to 31-Dec-2001 ? SQL> select prod_id, to_char(time_id,'YYYY'), count(*) 2 from sales 3 where 4 prod_id=31 5 and time_id > to_date('31-DEC-1997','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY') 6 group by prod_id,to_char(time_id,'YYYY') 7 order by 1,2 8 / PROD_ID TO_C COUNT(*) ---------- ---- ---------- 31 1998 6602 31 1999 6586 31 2000 7568 31 2001 2352 SQL> SQL> -- How many sales 3.5" diskettes were made in 2002 ? SQL> select prod_id, to_char(time_id,'YYYY'), count(*) 2 from sales 3 where 4 prod_id=31 5 and time_id > to_date('31-DEC-2001','DD-MON-YYYY') and time_id < to_date('01-JAN-2003','DD-MON-YYYY') 6 group by prod_id,to_char(time_id,'YYYY') 7 order by 1,2 8 / no rows selected SQL>
We know that gross Sales (for all products) have (generally) been increasing over the years -- from 179thousand orders in 1998 to 259thousand in 2001 (with a slight dip in the year 2000).
However, sales of 3.5" diskettes declined in 2001.
Sales for 2002 are not recorded.
We look at the Optimizer's cardinality estimates based on available statistics :
SQL> -- Verify the Cardinality Estimates by year SQL> explain plan for 2 select prod_id 3 from sales 4 where prod_id=31 5 and time_id > to_date('31-DEC-1997','DD-MON-YYYY') and time_id < to_date('01-JAN-1999','DD-MON-YYYY') 6 / Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------------------ Plan hash value: 2009851720 ------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 4478 | 53736 | 22 (10)| 00:00:01 | | | |* 1 | VIEW | index$_join$_001 | 4478 | 53736 | 22 (10)| 00:00:01 | | | |* 2 | HASH JOIN | | | | | | | | | 3 | PARTITION RANGE ITERATOR | | 4478 | 53736 | 6 (0)| 00:00:01 | 4 | 8 | | 4 | BITMAP CONVERSION TO ROWIDS| | 4478 | 53736 | 6 (0)| 00:00:01 | | | |* 5 | BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX | | | | | 4 | 8 | | 6 | PARTITION RANGE ITERATOR | | 4478 | 53736 | 15 (7)| 00:00:01 | 4 | 8 | | 7 | BITMAP CONVERSION TO ROWIDS| | 4478 | 53736 | 15 (7)| 00:00:01 | | | |* 8 | BITMAP INDEX RANGE SCAN | SALES_TIME_BIX | | | | | 4 | 8 | ------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("PROD_ID"=31 AND "TIME_ID""TIME_ID">TO_DATE(' 1997-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 2 - access(ROWID=ROWID) 5 - access("PROD_ID"=31) 8 - access("TIME_ID">TO_DATE(' 1997-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID" 1999-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 25 rows selected. SQL> SQL> SQL> explain plan for 2 select prod_id 3 from sales 4 where prod_id=31 5 and time_id > to_date('31-DEC-1998','DD-MON-YYYY') and time_id < to_date('01-JAN-2000','DD-MON-YYYY') 6 / Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------------------ Plan hash value: 2009851720 ------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 6275 | 75300 | 27 (8)| 00:00:01 | | | |* 1 | VIEW | index$_join$_001 | 6275 | 75300 | 27 (8)| 00:00:01 | | | |* 2 | HASH JOIN | | | | | | | | | 3 | PARTITION RANGE ITERATOR | | 6275 | 75300 | 6 (0)| 00:00:01 | 8 | 12 | | 4 | BITMAP CONVERSION TO ROWIDS| | 6275 | 75300 | 6 (0)| 00:00:01 | | | |* 5 | BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX | | | | | 8 | 12 | | 6 | PARTITION RANGE ITERATOR | | 6275 | 75300 | 20 (5)| 00:00:01 | 8 | 12 | | 7 | BITMAP CONVERSION TO ROWIDS| | 6275 | 75300 | 20 (5)| 00:00:01 | | | |* 8 | BITMAP INDEX RANGE SCAN | SALES_TIME_BIX | | | | | 8 | 12 | ------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("PROD_ID"=31 AND "TIME_ID" "TIME_ID">TO_DATE(' 1998-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 2 - access(ROWID=ROWID) 5 - access("PROD_ID"=31) 8 - access("TIME_ID">TO_DATE(' 1998-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID" 2000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 25 rows selected. SQL> SQL> SQL> explain plan for 2 select prod_id 3 from sales 4 where prod_id=31 5 and time_id > to_date('31-DEC-1999','DD-MON-YYYY') and time_id < to_date('01-JAN-2001','DD-MON-YYYY') 6 / Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------------------ Plan hash value: 2009851720 ------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5859 | 70308 | 25 (8)| 00:00:01 | | | |* 1 | VIEW | index$_join$_001 | 5859 | 70308 | 25 (8)| 00:00:01 | | | |* 2 | HASH JOIN | | | | | | | | | 3 | PARTITION RANGE ITERATOR | | 5859 | 70308 | 6 (0)| 00:00:01 | 12 | 16 | | 4 | BITMAP CONVERSION TO ROWIDS| | 5859 | 70308 | 6 (0)| 00:00:01 | | | |* 5 | BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX | | | | | 12 | 16 | | 6 | PARTITION RANGE ITERATOR | | 5859 | 70308 | 18 (6)| 00:00:01 | 12 | 16 | | 7 | BITMAP CONVERSION TO ROWIDS| | 5859 | 70308 | 18 (6)| 00:00:01 | | | |* 8 | BITMAP INDEX RANGE SCAN | SALES_TIME_BIX | | | | | 12 | 16 | ------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("PROD_ID"=31 AND "TIME_ID">TO_DATE(' 1999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID" 2 - access(ROWID=ROWID) 5 - access("PROD_ID"=31) 8 - access("TIME_ID">TO_DATE(' 1999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID" 2001-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 25 rows selected. SQL> SQL> SQL> explain plan for 2 select prod_id 3 from sales 4 where prod_id=31 5 and time_id > to_date('31-DEC-2000','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY') 6 / Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------------------ Plan hash value: 2009851720 ------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 6528 | 78336 | 27 (8)| 00:00:01 | | | |* 1 | VIEW | index$_join$_001 | 6528 | 78336 | 27 (8)| 00:00:01 | | | |* 2 | HASH JOIN | | | | | | | | | 3 | PARTITION RANGE ITERATOR | | 6528 | 78336 | 6 (0)| 00:00:01 | 16 | 20 | | 4 | BITMAP CONVERSION TO ROWIDS| | 6528 | 78336 | 6 (0)| 00:00:01 | | | |* 5 | BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX | | | | | 16 | 20 | | 6 | PARTITION RANGE ITERATOR | | 6528 | 78336 | 20 (5)| 00:00:01 | 16 | 20 | | 7 | BITMAP CONVERSION TO ROWIDS| | 6528 | 78336 | 20 (5)| 00:00:01 | | | |* 8 | BITMAP INDEX RANGE SCAN | SALES_TIME_BIX | | | | | 16 | 20 | ------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("PROD_ID"=31 AND "TIME_ID">TO_DATE(' 2000-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID" 2 - access(ROWID=ROWID) 5 - access("PROD_ID"=31) 8 - access("TIME_ID">TO_DATE(' 2000-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID" 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 25 rows selected. SQL> SQL> SQL> explain plan for 2 select prod_id 3 from sales 4 where prod_id=31 5 and time_id > to_date('31-DEC-2001','DD-MON-YYYY') and time_id < to_date('01-JAN-2003','DD-MON-YYYY') 6 / Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------------------ Plan hash value: 1295169200 --------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | --------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 16 | 192 | 4 (0)| 00:00:01 | | | | 1 | PARTITION RANGE ITERATOR | | 16 | 192 | 4 (0)| 00:00:01 | 20 | 24 | | 2 | BITMAP CONVERSION TO ROWIDS| | 16 | 192 | 4 (0)| 00:00:01 | | | | 3 | BITMAP AND | | | | | | | | | 4 | BITMAP MERGE | | | | | | | | |* 5 | BITMAP INDEX RANGE SCAN | SALES_TIME_BIX | | | | | 20 | 24 | |* 6 | BITMAP INDEX SINGLE VALUE| SALES_PROD_BIX | | | | | 20 | 24 | --------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 5 - access("TIME_ID">TO_DATE(' 2001-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID" 6 - access("PROD_ID"=31) 20 rows selected. SQL> SQL> SQL> explain plan for 2 select prod_id 3 from sales 4 where prod_id=31 5 and time_id > to_date('31-DEC-2002','DD-MON-YYYY') and time_id < to_date('01-JAN-2004','DD-MON-YYYY') 6 / Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------------------ Plan hash value: 2294783259 --------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | --------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 16 | 192 | 2 (0)| 00:00:01 | | | | 1 | PARTITION RANGE ITERATOR | | 16 | 192 | 2 (0)| 00:00:01 | 24 | 28 | |* 2 | TABLE ACCESS BY LOCAL INDEX ROWID| SALES | 16 | 192 | 2 (0)| 00:00:01 | 24 | 28 | | 3 | BITMAP CONVERSION TO ROWIDS | | | | | | | | |* 4 | BITMAP INDEX RANGE SCAN | SALES_TIME_BIX | | | | | 24 | 28 | --------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("PROD_ID"=31) 4 - access("TIME_ID">TO_DATE(' 2002-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID" 2004-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 18 rows selected. SQL> SQL>
We can see that the estimates for PROD_ID=31 do not match reality. Particularly for 2001.
Let's now create 2002 data :
SQL> -- let's create 2002 sales data SQL> alter index sales_time_bix modify partition sales_q1_2002 unusable; Index altered. SQL> alter index sales_time_bix modify partition sales_q2_2002 unusable; Index altered. SQL> alter index sales_time_bix modify partition sales_q3_2002 unusable; Index altered. SQL> alter index sales_time_bix modify partition sales_q4_2002 unusable; Index altered. SQL> alter index sales_prod_bix modify partition sales_q1_2002 unusable; Index altered. SQL> alter index sales_prod_bix modify partition sales_q2_2002 unusable; Index altered. SQL> alter index sales_prod_bix modify partition sales_q3_2002 unusable; Index altered. SQL> alter index sales_prod_bix modify partition sales_q4_2002 unusable; Index altered. SQL> SQL> SQL> insert /*+ APPEND */ into sales 2 select prod_id, cust_id, time_id+365, channel_id, promo_id, quantity_sold, amount_sold 3 from sales 4 where 1=1 5 and time_id > to_date('31-DEC-2000','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY') 6 and decode(prod_id,31,mod(cust_id,3),0)=0 7 / 257877 rows created. SQL> alter index sales_time_bix rebuild partition sales_q1_2002 ; Index altered. SQL> alter index sales_time_bix rebuild partition sales_q2_2002 ; Index altered. SQL> alter index sales_time_bix rebuild partition sales_q3_2002 ; Index altered. SQL> alter index sales_time_bix rebuild partition sales_q4_2002 ; Index altered. SQL> alter index sales_prod_bix rebuild partition sales_q1_2002 ; Index altered. SQL> alter index sales_prod_bix rebuild partition sales_q2_2002 ; Index altered. SQL> alter index sales_prod_bix rebuild partition sales_q3_2002 ; Index altered. SQL> alter index sales_prod_bix rebuild partition sales_q4_2002 ; Index altered. SQL> SQL> SQL> -- the number of total sales orders in 2002 SQL> select count(*) 2 from sales 3 where time_id > to_date('31-DEC-2001','DD-MON-YYYY') and time_id < to_date('01-JAN-2003','DD-MON-YYYY') 4 / COUNT(*) ---------- 257877 SQL> SQL> -- the number of sales orders for 3.5" diskettes in 2002 SQL> select count(*) 2 from sales 3 where time_id > to_date('31-DEC-2001','DD-MON-YYYY') and time_id < to_date('01-JAN-2003','DD-MON-YYYY') 4 and prod_id=31 5 / COUNT(*) ---------- 811 SQL> -- so we had only 811 sales orders for 3.5" diskettes SQL> SQL> -- Gather Statistics SQL> exec dbms_stats.gather_table_stats('','SALES',estimate_percent=>100,granularity=>'ALL',degree=>4,- > method_opt=>'FOR ALL COLUMNS SIZE 250',cascade=>TRUE,no_invalidate=>FALSE); PL/SQL procedure successfully completed. SQL>
Thus, although gross sales for all products did not decline significantly in 2002, sales for 3.5" diskettes, at 811 orders, were only 1/3rd the number in 2001.
Let's revisit the Optimizer's cardinality estimates :
SQL> -- ReVerify the Cardinality Estimates by year SQL> explain plan for 2 select prod_id 3 from sales 4 where prod_id=31 5 and time_id > to_date('31-DEC-1997','DD-MON-YYYY') and time_id < to_date('01-JAN-1999','DD-MON-YYYY') 6 / Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------------------ Plan hash value: 2009851720 ------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3636 | 43632 | 22 (10)| 00:00:01 | | | |* 1 | VIEW | index$_join$_001 | 3636 | 43632 | 22 (10)| 00:00:01 | | | |* 2 | HASH JOIN | | | | | | | | | 3 | PARTITION RANGE ITERATOR | | 3636 | 43632 | 6 (0)| 00:00:01 | 4 | 8 | | 4 | BITMAP CONVERSION TO ROWIDS| | 3636 | 43632 | 6 (0)| 00:00:01 | | | |* 5 | BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX | | | | | 4 | 8 | | 6 | PARTITION RANGE ITERATOR | | 3636 | 43632 | 15 (7)| 00:00:01 | 4 | 8 | | 7 | BITMAP CONVERSION TO ROWIDS| | 3636 | 43632 | 15 (7)| 00:00:01 | | | |* 8 | BITMAP INDEX RANGE SCAN | SALES_TIME_BIX | | | | | 4 | 8 | ------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("PROD_ID"=31 AND "TIME_ID""TIME_ID">TO_DATE(' 1997-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 2 - access(ROWID=ROWID) 5 - access("PROD_ID"=31) 8 - access("TIME_ID">TO_DATE(' 1997-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID" 1999-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 25 rows selected. SQL> SQL> SQL> explain plan for 2 select prod_id 3 from sales 4 where prod_id=31 5 and time_id > to_date('31-DEC-1998','DD-MON-YYYY') and time_id < to_date('01-JAN-2000','DD-MON-YYYY') 6 / Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------------------ Plan hash value: 2009851720 ------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5047 | 60564 | 27 (8)| 00:00:01 | | | |* 1 | VIEW | index$_join$_001 | 5047 | 60564 | 27 (8)| 00:00:01 | | | |* 2 | HASH JOIN | | | | | | | | | 3 | PARTITION RANGE ITERATOR | | 5047 | 60564 | 6 (0)| 00:00:01 | 8 | 12 | | 4 | BITMAP CONVERSION TO ROWIDS| | 5047 | 60564 | 6 (0)| 00:00:01 | | | |* 5 | BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX | | | | | 8 | 12 | | 6 | PARTITION RANGE ITERATOR | | 5047 | 60564 | 20 (5)| 00:00:01 | 8 | 12 | | 7 | BITMAP CONVERSION TO ROWIDS| | 5047 | 60564 | 20 (5)| 00:00:01 | | | |* 8 | BITMAP INDEX RANGE SCAN | SALES_TIME_BIX | | | | | 8 | 12 | ------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("PROD_ID"=31 AND "TIME_ID" "TIME_ID">TO_DATE(' 1998-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 2 - access(ROWID=ROWID) 5 - access("PROD_ID"=31) 8 - access("TIME_ID">TO_DATE(' 1998-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID" 2000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 25 rows selected. SQL> SQL> SQL> explain plan for 2 select prod_id 3 from sales 4 where prod_id=31 5 and time_id > to_date('31-DEC-1999','DD-MON-YYYY') and time_id < to_date('01-JAN-2001','DD-MON-YYYY') 6 / Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------------------ Plan hash value: 2009851720 ------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 4755 | 57060 | 26 (8)| 00:00:01 | | | |* 1 | VIEW | index$_join$_001 | 4755 | 57060 | 26 (8)| 00:00:01 | | | |* 2 | HASH JOIN | | | | | | | | | 3 | PARTITION RANGE ITERATOR | | 4755 | 57060 | 6 (0)| 00:00:01 | 12 | 16 | | 4 | BITMAP CONVERSION TO ROWIDS| | 4755 | 57060 | 6 (0)| 00:00:01 | | | |* 5 | BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX | | | | | 12 | 16 | | 6 | PARTITION RANGE ITERATOR | | 4755 | 57060 | 19 (6)| 00:00:01 | 12 | 16 | | 7 | BITMAP CONVERSION TO ROWIDS| | 4755 | 57060 | 19 (6)| 00:00:01 | | | |* 8 | BITMAP INDEX RANGE SCAN | SALES_TIME_BIX | | | | | 12 | 16 | ------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("PROD_ID"=31 AND "TIME_ID" "TIME_ID">TO_DATE(' 1999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 2 - access(ROWID=ROWID) 5 - access("PROD_ID"=31) 8 - access("TIME_ID">TO_DATE(' 1999-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID" 2001-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 25 rows selected. SQL> SQL> SQL> explain plan for 2 select prod_id 3 from sales 4 where prod_id=31 5 and time_id > to_date('31-DEC-2000','DD-MON-YYYY') and time_id < to_date('01-JAN-2002','DD-MON-YYYY') 6 / Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------------------ Plan hash value: 2009851720 ------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5277 | 63324 | 28 (8)| 00:00:01 | | | |* 1 | VIEW | index$_join$_001 | 5277 | 63324 | 28 (8)| 00:00:01 | | | |* 2 | HASH JOIN | | | | | | | | | 3 | PARTITION RANGE ITERATOR | | 5277 | 63324 | 6 (0)| 00:00:01 | 16 | 20 | | 4 | BITMAP CONVERSION TO ROWIDS| | 5277 | 63324 | 6 (0)| 00:00:01 | | | |* 5 | BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX | | | | | 16 | 20 | | 6 | PARTITION RANGE ITERATOR | | 5277 | 63324 | 21 (5)| 00:00:01 | 16 | 20 | | 7 | BITMAP CONVERSION TO ROWIDS| | 5277 | 63324 | 21 (5)| 00:00:01 | | | |* 8 | BITMAP INDEX RANGE SCAN | SALES_TIME_BIX | | | | | 16 | 20 | ------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("PROD_ID"=31 AND "TIME_ID">TO_DATE(' 2000-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID" 2 - access(ROWID=ROWID) 5 - access("PROD_ID"=31) 8 - access("TIME_ID">TO_DATE(' 2000-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID" 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 25 rows selected. SQL> SQL> SQL> explain plan for 2 select prod_id 3 from sales 4 where prod_id=31 5 and time_id > to_date('31-DEC-2001','DD-MON-YYYY') and time_id < to_date('01-JAN-2003','DD-MON-YYYY') 6 / Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------------------ Plan hash value: 2009851720 ------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5250 | 63000 | 28 (8)| 00:00:01 | | | |* 1 | VIEW | index$_join$_001 | 5250 | 63000 | 28 (8)| 00:00:01 | | | |* 2 | HASH JOIN | | | | | | | | | 3 | PARTITION RANGE ITERATOR | | 5250 | 63000 | 6 (0)| 00:00:01 | 20 | 24 | | 4 | BITMAP CONVERSION TO ROWIDS| | 5250 | 63000 | 6 (0)| 00:00:01 | | | |* 5 | BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX | | | | | 20 | 24 | | 6 | PARTITION RANGE ITERATOR | | 5250 | 63000 | 21 (5)| 00:00:01 | 20 | 24 | | 7 | BITMAP CONVERSION TO ROWIDS| | 5250 | 63000 | 21 (5)| 00:00:01 | | | |* 8 | BITMAP INDEX RANGE SCAN | SALES_TIME_BIX | | | | | 20 | 24 | ------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("PROD_ID"=31 AND "TIME_ID">TO_DATE(' 2001-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID" 2 - access(ROWID=ROWID) 5 - access("PROD_ID"=31) 8 - access("TIME_ID">TO_DATE(' 2001-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID" 2003-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 25 rows selected. SQL> SQL> SQL> explain plan for 2 select prod_id 3 from sales 4 where prod_id=31 5 and time_id > to_date('31-DEC-2002','DD-MON-YYYY') and time_id < to_date('01-JAN-2004','DD-MON-YYYY') 6 / Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------------------ Plan hash value: 1295169200 --------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | --------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 13 | 156 | 4 (0)| 00:00:01 | | | | 1 | PARTITION RANGE ITERATOR | | 13 | 156 | 4 (0)| 00:00:01 | 24 | 28 | | 2 | BITMAP CONVERSION TO ROWIDS| | 13 | 156 | 4 (0)| 00:00:01 | | | | 3 | BITMAP AND | | | | | | | | | 4 | BITMAP MERGE | | | | | | | | |* 5 | BITMAP INDEX RANGE SCAN | SALES_TIME_BIX | | | | | 24 | 28 | |* 6 | BITMAP INDEX SINGLE VALUE| SALES_PROD_BIX | | | | | 24 | 28 | --------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 5 - access("TIME_ID">TO_DATE(' 2002-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TIME_ID" 6 - access("PROD_ID"=31) 20 rows selected. SQL>
Oracle now seems to have continued "normalising" (or "averaging out") the distribution across all the years. The discrepancy between the actual count of rows and the estimated count of rows has increased significantly.
Actual versus Estimated Cardinality for PROD_ID=31 Year Actual Estimated Estimated before inserting after inserting Year 2002 rows Year 2002 rows 1998 6602 4478 3636 1999 6586 6275 5047 2000 7568 5859 4755 2001 2352 6528 5277 2002 0 16 na 2002 inserted 811 na 5250 2003 0 16 13
Although the number of occurrences of Sales Orders for PROD_ID=31 has declined considerably in 2001 and 2002, the Optimizer cannot adjust for the decline. The discrepancy between Actual and Estimated row counts increases over time as the skew changes.
The Cardinality estimate is computed from Column statistics (number of distinct values, selectivity, histogram) at the Table level but combined with Partition row counts without any knowledge of the occurrences of PRODUCT_ID=31 in each of the target Partitions for the year 2002.
SUGGESTED TESTS For Readers :
1. What if we had a BTree Index ?
2. What if the SALES table wasn't Partitioned ?
UPDATE : See the next post, with a Non-Partitioned Table
.
.
.