Hemant's Oracle DBA Blog: Cardinality Estimates : Dependent Columns

NOTE : I am reposting the text of the case study (ie SQLs , Outputs and REMarks) as the previous posting had one "less than" sign resulting in blogger presenting a "challenging reading". 15-Jul-08

The Oracle Cost Based Optimizer uses table and column statistics to estimate the number of rows that will be fetched for given predicates in a query. This estimate is called the "Cardinality" and is presented as "CARD=" in a 9i Explain Plan or under the "Rows" Column in a 10g Explain Plan.

It is fairly obvious that the Optimizer uses the number of rows in a table and column selectivity (ie, the number of distinct values, and in the presence of histograms, the count of rows for each distinct value) to derive the expected Cardinality. Thus, the general advice to "ensure that you have fresh statistics" or "ensure that you have the right statistics" that is provided to DBAs.

For multi column predicates on the same table, the Optimizer derives the selectivity of the columns as "independent" columns and then multiplies them. Thus, the selectivity of columns (a,b) is selectivity_of_column_a X selectivity_of_column_b.
There are variants and complexities in this formula. The presence of indexes can also help the Optimizer estimate costs of different steps.

However, what the Optimizer, till 10gR2, cannot do is identify inter-column dependencies or correlations. Therefore, if "STATE" is one column and "COUNTRY" is another column in the same table, the Optimizer may well know the number of distinct STATEs and the number of distinct COUNTRYs in the table, but not how many and which STATEs are present for a given COUNTRY.

Here is a simple example with a fairly small table which has correlated columns. The same set of queries are executed twice, first without Histograms and the next round with Histograms on each of the columns. I've put in some remarks to explain how the Cardinality estimates that the Optimizer uses in deriving the Execution Plan can be very wrong when there are multiple columns and, particularly, more so when there exists high correlation between the Columns.


SQL> set pages600
SQL> set linesize132
SQL> set feedback off
SQL> set SQLPrompt ''

col Expected_Cardinality format 99,990 hea 'ExpCrd'
column parent_id_plus_exp    format 999
column id_plus_exp      format 990
column plan_plus_exp      format a90
column object_node_plus_exp  format a14
column other_plus_exp      format a90
column other_tag_plus_exp    format a29


REM To demonstrate how Oracle computes expected cardinality of two columns as being :
REM  (Number_of_Rows  X  Density_of_Column_1  X  Density_of_Column_2)
REM Oracle assumes a Uniform Distribution of values

REM The second set of tests are with Histograms and provide equally interesting results.
REM With histograms, Oracle is aware that there isn't a Uniform distribution for single columns but still isn't aware of multi column distributions

REM Only in 11g does Oracle introduce Column Groups for statistics




drop table test_cardinality ;
create table test_cardinality (
  2    date_of_birth   date,
  3    month_of_birth   varchar2(12),
  4    month_number   number,
  5    city_of_birth   varchar2(18),
  6    country_of_birth varchar2(3)
  7  )
  8  /

alter session set nls_date_format='DD-MON-RR';

insert into test_cardinality values ('01-JAN-60','JANUARY',1,'BOMBAY','IN');
insert into test_cardinality values ('01-FEB-60','FEBRUARY',2,'DELHI','IN');
insert into test_cardinality values ('01-MAR-60','MARCH',3,'CALCUTTA','IN');
insert into test_cardinality values ('01-APR-60','APRIL',4,'MADRAS','IN');
insert into test_cardinality values ('01-MAY-60','MAY',5,'NAGPUR','IN');
insert into test_cardinality values ('01-JUN-60','JUNE',6,'KANPUR','IN');
insert into test_cardinality values ('01-JUL-60','JULY',7,'KOLHAPUR','IN');
insert into test_cardinality values ('01-AUG-60','AUGUST',8,'SOLAPUR','IN');
insert into test_cardinality values ('01-SEP-60','SEPTEMBER',9,'JABALPUR','IN');
insert into test_cardinality values ('01-OCT-60','OCTOBER',10,'BADLAPUR','IN');
insert into test_cardinality values ('01-NOV-60','NOVEMBER',11,'LONDON','UK');
insert into test_cardinality values ('01-DEC-60','DECEMBER',12,'PARIS','FR');

commit;

alter table test_cardinality nologging;
insert /*+ APPEND */ into test_cardinality select * from test_cardinality ;
commit;
insert /*+ APPEND */ into test_cardinality select * from test_cardinality ;
commit;
insert /*+ APPEND */ into test_cardinality select * from test_cardinality ;
commit;
insert /*+ APPEND */ into test_cardinality select * from test_cardinality ;
commit;
insert /*+ APPEND */ into test_cardinality select * from test_cardinality ;
commit;
insert /*+ APPEND */ into test_cardinality select * from test_cardinality ;
commit;
insert /*+ APPEND */ into test_cardinality select * from test_cardinality ;
commit;

REM  We now have 128 copies of each row. ie, 128 'JANUARY's, 128 'DELHI's, 128 'UK's.  But 1536*10/12 INs !


exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'TEST_CARDINALITY',method_opt=>'FOR ALL COLUMNS SIZE 1',estimate_percent=>100);

select count(*) from test_cardinality;

  COUNT(*)                                                                                                                         
----------                                                                                                                         
      1536                                                                                                                         
select num_rows from user_tables where table_name = 'TEST_CARDINALITY';

  NUM_ROWS                                                                                                                         
----------                                                                                                                         
      1536                                                                                                                         
select column_name,sample_size, num_distinct, density from user_tab_columns where table_name = 'TEST_CARDINALITY' order by column_id;

COLUMN_NAME                    SAMPLE_SIZE NUM_DISTINCT    DENSITY                                                                 
------------------------------ ----------- ------------ ----------                                                                 
DATE_OF_BIRTH                         1536           12 .083333333                                                                 
MONTH_OF_BIRTH                        1536           12 .083333333                                                                 
MONTH_NUMBER                          1536           12 .083333333                                                                 
CITY_OF_BIRTH                         1536           12 .083333333                                                                 
COUNTRY_OF_BIRTH                      1536            3 .333333333                                                                 

REM The density for the first 4 columns is 1/12 while that for COUNTRY_OF_BIRTH is 1/3 (ie 4/12)

set autotrace on explain;

select count(*), round(1536*(1/12)*(1/12)) Expected_Cardinality from test_cardinality where month_number = 1 and month_of_birth = 'JANUARY';

  COUNT(*)  ExpCrd                                                                                                                 
---------- -------                                                                                                                 
       128      11                                                                                                                 

Execution Plan
----------------------------------------------------------                                                                         
Plan hash value: 3936405985                                                                                                        
                                                                                                                                   
---------------------------------------------------------------------------------------                                            
 Id   Operation           Name              Rows   Bytes  Cost (%CPU) Time                                                 
---------------------------------------------------------------------------------------                                            
   0  SELECT STATEMENT                          1     11      6   (0) 00:00:01                                             
   1   SORT AGGREGATE                           1     11                                                                   
*  2    TABLE ACCESS FULL TEST_CARDINALITY     11    121      6   (0) 00:00:01                                             
---------------------------------------------------------------------------------------                                            
                                                                                                                                   
Predicate Information (identified by operation id):                                                                                
---------------------------------------------------                                                                                
                                                                                                                                   
   2 - filter("MONTH_NUMBER"=1 AND "MONTH_OF_BIRTH"='JANUARY')                                                                     

REM The expected cardinality is 11 rows although the real count is 128

select count(*), round(1536*(1/12)*(1/12)) Expected_Cardinality from test_cardinality where month_number = 1 and month_of_birth = 'DECEMBER';

  COUNT(*)  ExpCrd                                                                                                                 
---------- -------                                                                                                                 
         0      11                                                                                                                 

Execution Plan
----------------------------------------------------------                                                                         
Plan hash value: 3936405985                                                                                                        
                                                                                                                                   
---------------------------------------------------------------------------------------                                            
 Id   Operation           Name              Rows   Bytes  Cost (%CPU) Time                                                 
---------------------------------------------------------------------------------------                                            
   0  SELECT STATEMENT                          1     11      6   (0) 00:00:01                                             
   1   SORT AGGREGATE                           1     11                                                                   
*  2    TABLE ACCESS FULL TEST_CARDINALITY     11    121      6   (0) 00:00:01                                             
---------------------------------------------------------------------------------------                                            
                                                                                                                                   
Predicate Information (identified by operation id):                                                                                
---------------------------------------------------                                                                                
                                                                                                                                   
   2 - filter("MONTH_NUMBER"=1 AND "MONTH_OF_BIRTH"='DECEMBER')                                                                    

REM  The Optimizer expected 11 rows.  It is not aware of the inter-column dependencies (there are no 'DECEMBER's being month 1 !)

select count(*), round(1536*(1/12)*(1/12)) Expected_Cardinality from test_cardinality where date_of_birth = '01-JAN-60' and month_of_birth = 'DECEMBER';

  COUNT(*)  ExpCrd                                                                                                                 
---------- -------                                                                                                                 
         0      11                                                                                                                 

Execution Plan
----------------------------------------------------------                                                                         
Plan hash value: 3936405985                                                                                                        
                                                                                                                                   
---------------------------------------------------------------------------------------                                            
 Id   Operation           Name              Rows   Bytes  Cost (%CPU) Time                                                 
---------------------------------------------------------------------------------------                                            
   0  SELECT STATEMENT                          1     16      6   (0) 00:00:01                                             
   1   SORT AGGREGATE                           1     16                                                                   
*  2    TABLE ACCESS FULL TEST_CARDINALITY     11    176      6   (0) 00:00:01                                             
---------------------------------------------------------------------------------------                                            
                                                                                                                                   
Predicate Information (identified by operation id):                                                                                
---------------------------------------------------                                                                                
                                                                                                                                   
   2 - filter("MONTH_OF_BIRTH"='DECEMBER' AND "DATE_OF_BIRTH"='01-JAN-60')                                                         

REM  Once again, the Optimizer expected 11 rows.

select count(*), round(1536*(1/12)*(1/12)) Expected_Cardinality  from test_cardinality where to_char(date_of_birth,'MON') = 'JAN' and month_of_birth = 'JANUARY';

  COUNT(*)  ExpCrd                                                                                                                 
---------- -------                                                                                                                 
       128      11                                                                                                                 

Execution Plan
----------------------------------------------------------                                                                         
Plan hash value: 3936405985                                                                                                        
                                                                                                                                   
---------------------------------------------------------------------------------------                                            
 Id   Operation           Name              Rows   Bytes  Cost (%CPU) Time                                                 
---------------------------------------------------------------------------------------                                            
   0  SELECT STATEMENT                          1     16      6   (0) 00:00:01                                             
   1   SORT AGGREGATE                           1     16                                                                   
*  2    TABLE ACCESS FULL TEST_CARDINALITY      1     16      6   (0) 00:00:01                                             
---------------------------------------------------------------------------------------                                            
                                                                                                                                   
Predicate Information (identified by operation id):                                                                                
---------------------------------------------------                                                                                
                                                                                                                                   
   2 - filter("MONTH_OF_BIRTH"='JANUARY' AND                                                                                       
              TO_CHAR(INTERNAL_FUNCTION("DATE_OF_BIRTH"),'MON')='JAN')                                                             

REM When a function ("to_char") is applied to a column, Oracle is unable to estimate the cardinality -- it comes up with "1"
rem   so, the Density_of_Column_1  X  Density_of_Column_2 formula breaks down

select count(*), round(1536*(1/12)*(4/12)) Expected_Cardinality from test_cardinality where city_of_birth = 'DELHI' and country_of_birth='IN';

  COUNT(*)  ExpCrd                                                                                                                 
---------- -------                                                                                                                 
       128      43                                                                                                                 

Execution Plan
----------------------------------------------------------                                                                         
Plan hash value: 3936405985                                                                                                        
                                                                                                                                   
---------------------------------------------------------------------------------------                                            
 Id   Operation           Name              Rows   Bytes  Cost (%CPU) Time                                                 
---------------------------------------------------------------------------------------                                            
   0  SELECT STATEMENT                          1     11      6   (0) 00:00:01                                             
   1   SORT AGGREGATE                           1     11                                                                   
*  2    TABLE ACCESS FULL TEST_CARDINALITY     43    473      6   (0) 00:00:01                                             
---------------------------------------------------------------------------------------                                            
                                                                                                                                   
Predicate Information (identified by operation id):                                                                                
---------------------------------------------------                                                                                
                                                                                                                                   
   2 - filter("CITY_OF_BIRTH"='DELHI' AND "COUNTRY_OF_BIRTH"='IN')                                                                 

REM  Here, the density for country_of_birth is 4/12 so that is used in the calculation.

select count(*), round(1536*(1/12)*(4/12)) Expected_Cardinality from test_cardinality where city_of_birth = 'DELHI' and country_of_birth='UK';

  COUNT(*)  ExpCrd                                                                                                                 
---------- -------                                                                                                                 
         0      43                                                                                                                 

Execution Plan
----------------------------------------------------------                                                                         
Plan hash value: 3936405985                                                                                                        
                                                                                                                                   
---------------------------------------------------------------------------------------                                            
 Id   Operation           Name              Rows   Bytes  Cost (%CPU) Time                                                 
---------------------------------------------------------------------------------------                                            
   0  SELECT STATEMENT                          1     11      6   (0) 00:00:01                                             
   1   SORT AGGREGATE                           1     11                                                                   
*  2    TABLE ACCESS FULL TEST_CARDINALITY     43    473      6   (0) 00:00:01                                             
---------------------------------------------------------------------------------------                                            
                                                                                                                                   
Predicate Information (identified by operation id):                                                                                
---------------------------------------------------                                                                                
                                                                                                                                   
   2 - filter("CITY_OF_BIRTH"='DELHI' AND "COUNTRY_OF_BIRTH"='UK')                                                                 

REM  Oracle is unaware that the intersect between the data sets for 'DELHI' and 'UK' is zero ! It just multiplies the two densities

select count(*), round(1536*(2/12)*(4/12)) Expected_Cardinality  from test_cardinality where city_of_birth in ('DELHI','LONDON') and country_of_birth='FR';

  COUNT(*)  ExpCrd                                                                                                                 
---------- -------                                                                                                                 
         0      85                                                                                                                 

Execution Plan
----------------------------------------------------------                                                                         
Plan hash value: 3936405985                                                                                                        
                                                                                                                                   
---------------------------------------------------------------------------------------                                            
 Id   Operation           Name              Rows   Bytes  Cost (%CPU) Time                                                 
---------------------------------------------------------------------------------------                                            
   0  SELECT STATEMENT                          1     11      6   (0) 00:00:01                                             
   1   SORT AGGREGATE                           1     11                                                                   
*  2    TABLE ACCESS FULL TEST_CARDINALITY     85    935      6   (0) 00:00:01                                             
---------------------------------------------------------------------------------------                                            
                                                                                                                                   
Predicate Information (identified by operation id):                                                                                
---------------------------------------------------                                                                                
                                                                                                                                   
   2 - filter("COUNTRY_OF_BIRTH"='FR' AND ("CITY_OF_BIRTH"='DELHI' OR                                                              
              "CITY_OF_BIRTH"='LONDON'))                                                                                           

REM  The "error"  (actually, lack of knowledge about relationships between column values) is compounded further in this case !

select count(*), round(1536*(1/12)*(1/12)*(1/12)*(1/12)*(4/12)) Expected_Cardinality from test_cardinality where date_of_birth='01-JAN-60' and month_of_birth='JANUARY' and month_number = 1 and city_of_birth='BOMBAY' and country_of_birth='IN';

  COUNT(*)  ExpCrd                                                                                                                 
---------- -------                                                                                                                 
       128       0                                                                                                                 

Execution Plan
----------------------------------------------------------                                                                         
Plan hash value: 3936405985                                                                                                        
                                                                                                                                   
---------------------------------------------------------------------------------------                                            
 Id   Operation           Name              Rows   Bytes  Cost (%CPU) Time                                                 
---------------------------------------------------------------------------------------                                            
   0  SELECT STATEMENT                          1     28      6   (0) 00:00:01                                             
   1   SORT AGGREGATE                           1     28                                                                   
*  2    TABLE ACCESS FULL TEST_CARDINALITY      1     28      6   (0) 00:00:01                                             
---------------------------------------------------------------------------------------                                            
                                                                                                                                   
Predicate Information (identified by operation id):                                                                                
---------------------------------------------------                                                                                
                                                                                                                                   
   2 - filter("MONTH_OF_BIRTH"='JANUARY' AND "MONTH_NUMBER"=1 AND                                                                  
              "CITY_OF_BIRTH"='BOMBAY' AND "COUNTRY_OF_BIRTH"='IN' AND                                                             
              "DATE_OF_BIRTH"='01-JAN-60')                                                                                         

REM  This really makes it worse ! We've taken the exact values for 1 in every 12 rows but the Optimizer multiplies the densities and comes up with less than 1
rem the Optimizer never presents a value of 0, but rounds it up to 1.  So the expected cardinality is presented as 1 (which is very wrong)

select count(*), round(1536*(4/12)) Expected_Cardinality from test_cardinality where country_of_birth='SG';

  COUNT(*)  ExpCrd                                                                                                                 
---------- -------                                                                                                                 
         0     512                                                                                                                 

Execution Plan
----------------------------------------------------------                                                                         
Plan hash value: 3936405985                                                                                                        
                                                                                                                                   
---------------------------------------------------------------------------------------                                            
 Id   Operation           Name              Rows   Bytes  Cost (%CPU) Time                                                 
---------------------------------------------------------------------------------------                                            
   0  SELECT STATEMENT                          1      3      6   (0) 00:00:01                                             
   1   SORT AGGREGATE                           1      3                                                                   
*  2    TABLE ACCESS FULL TEST_CARDINALITY    512   1536      6   (0) 00:00:01                                             
---------------------------------------------------------------------------------------                                            
                                                                                                                                   
Predicate Information (identified by operation id):                                                                                
---------------------------------------------------                                                                                
                                                                                                                                   
   2 - filter("COUNTRY_OF_BIRTH"='SG')                                                                                             

REM  When we present a value that is not even present, Oracle isn't aware of this (Without histograms, it only knows MIN, MAX, Distinct and Density)
REM  The Optimizer still applies the same 4/12 density formula for country of birth and expects 512 'SG' rows !

set autotrace off


REM -----------------------------------------------------------------------------------------------------------
REM -----------------------------------------------------------------------------------------------------------
REM  What happens when DO gather Histograms ?  Can the Optimizer make use of Histograms on multiple columns ?!
REM -----------------------------------------------------------------------------------------------------------


exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'TEST_CARDINALITY',method_opt=>'FOR ALL COLUMNS SIZE 250',estimate_percent=>100);

select count(*) from test_cardinality;

  COUNT(*)                                                                                                                         
----------                                                                                                                         
      1536                                                                                                                         
select num_rows from user_tables where table_name = 'TEST_CARDINALITY';

  NUM_ROWS                                                                                                                         
----------                                                                                                                         
      1536                                                                                                                         
select column_name,sample_size, num_distinct, density from user_tab_columns where table_name = 'TEST_CARDINALITY' order by column_id;

COLUMN_NAME                    SAMPLE_SIZE NUM_DISTINCT    DENSITY                                                                 
------------------------------ ----------- ------------ ----------                                                                 
DATE_OF_BIRTH                         1536           12 .000325521                                                                 
MONTH_OF_BIRTH                        1536           12 .000325521                                                                 
MONTH_NUMBER                          1536           12 .000325521                                                                 
CITY_OF_BIRTH                         1536           12 .000325521                                                                 
COUNTRY_OF_BIRTH                      1536            3 .000325521                                                                 

REM With Histograms, the Density is actually presented differently.  We should be looking at USER_TAB_HISTOGRAMS for each column's values
REM Now, Oracle does know the exact count of rows for each of the column's range of values.
rem I haven't presented USER_TAB_HISTOGRAMS here

set autotrace on explain;

select count(*), round(1536*(1/12)*(1/12)) Expected_Cardinality from test_cardinality where month_number = 1 and month_of_birth = 'JANUARY';

  COUNT(*)  ExpCrd                                                                                                                 
---------- -------                                                                                                                 
       128      11                                                                                                                 

Execution Plan
----------------------------------------------------------                                                                         
Plan hash value: 3936405985                                                                                                        
                                                                                                                                   
---------------------------------------------------------------------------------------                                            
 Id   Operation           Name              Rows   Bytes  Cost (%CPU) Time                                                 
---------------------------------------------------------------------------------------                                            
   0  SELECT STATEMENT                          1     11      6   (0) 00:00:01                                             
   1   SORT AGGREGATE                           1     11                                                                   
*  2    TABLE ACCESS FULL TEST_CARDINALITY     11    121      6   (0) 00:00:01                                             
---------------------------------------------------------------------------------------                                            
                                                                                                                                   
Predicate Information (identified by operation id):                                                                                
---------------------------------------------------                                                                                
                                                                                                                                   
   2 - filter("MONTH_NUMBER"=1 AND "MONTH_OF_BIRTH"='JANUARY')                                                                     

REM Although I seem to be using 'Density', I am actually using the exact count of rows.  There are 1/12 'month_number=1's.
rem   The Optimizer still comes up with a cardinality estimate of 11, based on Histograms, not Densities !

select count(*), round(1536*(1/12)*(1/12)) Expected_Cardinality from test_cardinality where month_number = 1 and month_of_birth = 'DECEMBER';

  COUNT(*)  ExpCrd                                                                                                                 
---------- -------                                                                                                                 
         0      11                                                                                                                 

Execution Plan
----------------------------------------------------------                                                                         
Plan hash value: 3936405985                                                                                                        
                                                                                                                                   
---------------------------------------------------------------------------------------                                            
 Id   Operation           Name              Rows   Bytes  Cost (%CPU) Time                                                 
---------------------------------------------------------------------------------------                                            
   0  SELECT STATEMENT                          1     11      6   (0) 00:00:01                                             
   1   SORT AGGREGATE                           1     11                                                                   
*  2    TABLE ACCESS FULL TEST_CARDINALITY     11    121      6   (0) 00:00:01                                             
---------------------------------------------------------------------------------------                                            
                                                                                                                                   
Predicate Information (identified by operation id):                                                                                
---------------------------------------------------                                                                                
                                                                                                                                   
   2 - filter("MONTH_NUMBER"=1 AND "MONTH_OF_BIRTH"='DECEMBER')                                                                    

REM The Histograms say that there the frequency for 'month_number=1' is 1/12  and the frequency for 'DECEMBER' is also 1/12

select count(*), round(1536*(1/12)*(1/12)) Expected_Cardinality from test_cardinality where date_of_birth = '01-JAN-60' and month_of_birth = 'DECEMBER';

  COUNT(*)  ExpCrd                                                                                                                 
---------- -------                                                                                                                 
         0      11                                                                                                                 

Execution Plan
----------------------------------------------------------                                                                         
Plan hash value: 3936405985                                                                                                        
                                                                                                                                   
---------------------------------------------------------------------------------------                                            
 Id   Operation           Name              Rows   Bytes  Cost (%CPU) Time                                                 
---------------------------------------------------------------------------------------                                            
   0  SELECT STATEMENT                          1     16      6   (0) 00:00:01                                             
   1   SORT AGGREGATE                           1     16                                                                   
*  2    TABLE ACCESS FULL TEST_CARDINALITY     11    176      6   (0) 00:00:01                                             
---------------------------------------------------------------------------------------                                            
                                                                                                                                   
Predicate Information (identified by operation id):                                                                                
---------------------------------------------------                                                                                
                                                                                                                                   
   2 - filter("MONTH_OF_BIRTH"='DECEMBER' AND "DATE_OF_BIRTH"='01-JAN-60')                                                         

REM The Histograms again present frequencies of 1/12 and 1/12

select count(*), round(1536*(1/12)*(1/12)) Expected_Cardinality  from test_cardinality where to_char(date_of_birth,'MON') = 'JAN' and month_of_birth = 'JANUARY';

  COUNT(*)  ExpCrd                                                                                                                 
---------- -------                                                                                                                 
       128      11                                                                                                                 

Execution Plan
----------------------------------------------------------                                                                         
Plan hash value: 3936405985                                                                                                        
                                                                                                                                   
---------------------------------------------------------------------------------------                                            
 Id   Operation           Name              Rows   Bytes  Cost (%CPU) Time                                                 
---------------------------------------------------------------------------------------                                            
   0  SELECT STATEMENT                          1     16      6   (0) 00:00:01                                             
   1   SORT AGGREGATE                           1     16                                                                   
*  2    TABLE ACCESS FULL TEST_CARDINALITY      1     16      6   (0) 00:00:01                                             
---------------------------------------------------------------------------------------                                            
                                                                                                                                   
Predicate Information (identified by operation id):                                                                                
---------------------------------------------------                                                                                
                                                                                                                                   
   2 - filter("MONTH_OF_BIRTH"='JANUARY' AND                                                                                       
              TO_CHAR(INTERNAL_FUNCTION("DATE_OF_BIRTH"),'MON')='JAN')                                                             

REM  Once again, the usage of a function throws off the Optimizer !

select count(*), round(1536*(1/12)*(10/12)) Expected_Cardinality from test_cardinality where city_of_birth = 'DELHI' and country_of_birth='IN';

  COUNT(*)  ExpCrd                                                                                                                 
---------- -------                                                                                                                 
       128     107                                                                                                                 

Execution Plan
----------------------------------------------------------                                                                         
Plan hash value: 3936405985                                                                                                        
                                                                                                                                   
---------------------------------------------------------------------------------------                                            
 Id   Operation           Name              Rows   Bytes  Cost (%CPU) Time                                                 
---------------------------------------------------------------------------------------                                            
   0  SELECT STATEMENT                          1     11      6   (0) 00:00:01                                             
   1   SORT AGGREGATE                           1     11                                                                   
*  2    TABLE ACCESS FULL TEST_CARDINALITY    107   1177      6   (0) 00:00:01                                             
---------------------------------------------------------------------------------------                                            
                                                                                                                                   
Predicate Information (identified by operation id):                                                                                
---------------------------------------------------                                                                                
                                                                                                                                   
   2 - filter("CITY_OF_BIRTH"='DELHI' AND "COUNTRY_OF_BIRTH"='IN')                                                                 

REM  The Histogram for 'IN' is a frequency of 10/12  *not*  the 4/12 that was assumed with uniform distribution. Yet, this is multiplied with 1/12 for 'DELHI'

select count(*), round(1536*(1/12)*(1/12)) Expected_Cardinality from test_cardinality where city_of_birth = 'DELHI' and country_of_birth='UK';

  COUNT(*)  ExpCrd                                                                                                                 
---------- -------                                                                                                                 
         0      11                                                                                                                 

Execution Plan
----------------------------------------------------------                                                                         
Plan hash value: 3936405985                                                                                                        
                                                                                                                                   
---------------------------------------------------------------------------------------                                            
 Id   Operation           Name              Rows   Bytes  Cost (%CPU) Time                                                 
---------------------------------------------------------------------------------------                                            
   0  SELECT STATEMENT                          1     11      6   (0) 00:00:01                                             
   1   SORT AGGREGATE                           1     11                                                                   
*  2    TABLE ACCESS FULL TEST_CARDINALITY     11    121      6   (0) 00:00:01                                             
---------------------------------------------------------------------------------------                                            
                                                                                                                                   
Predicate Information (identified by operation id):                                                                                
---------------------------------------------------                                                                                
                                                                                                                                   
   2 - filter("COUNTRY_OF_BIRTH"='UK' AND "CITY_OF_BIRTH"='DELHI')                                                                 

REM  The expected frequencies for 'DELHI' and 'UK' are 1/12 and 1/12 respectively. No knowledge of the impossibility of the two data sets intersecting.

select count(*), round(1536*(2/12)*(1/12)) Expected_Cardinality  from test_cardinality where city_of_birth in ('DELHI','LONDON') and country_of_birth='FR';

  COUNT(*)  ExpCrd                                                                                                                 
---------- -------                                                                                                                 
         0      21                                                                                                                 

Execution Plan
----------------------------------------------------------                                                                         
Plan hash value: 3936405985                                                                                                        
                                                                                                                                   
---------------------------------------------------------------------------------------                                            
 Id   Operation           Name              Rows   Bytes  Cost (%CPU) Time                                                 
---------------------------------------------------------------------------------------                                            
   0  SELECT STATEMENT                          1     11      6   (0) 00:00:01                                             
   1   SORT AGGREGATE                           1     11                                                                   
*  2    TABLE ACCESS FULL TEST_CARDINALITY     21    231      6   (0) 00:00:01                                             
---------------------------------------------------------------------------------------                                            
                                                                                                                                   
Predicate Information (identified by operation id):                                                                                
---------------------------------------------------                                                                                
                                                                                                                                   
   2 - filter("COUNTRY_OF_BIRTH"='FR' AND ("CITY_OF_BIRTH"='DELHI' OR                                                              
              "CITY_OF_BIRTH"='LONDON'))                                                                                           

REM  With two countries, the lack of knowledge makes the cardinality estimate worse !

select count(*), round(1536*(1/12)*(1/12)*(1/12)*(1/12)*(10/12)) Expected_Cardinality from test_cardinality where date_of_birth='01-JAN-60' and month_of_birth='JANUARY' and month_number = 1 and city_of_birth='BOMBAY' and country_of_birth='IN';

  COUNT(*)  ExpCrd                                                                                                                 
---------- -------                                                                                                                 
       128       0                                                                                                                 

Execution Plan
----------------------------------------------------------                                                                         
Plan hash value: 3936405985                                                                                                        
                                                                                                                                   
---------------------------------------------------------------------------------------                                            
 Id   Operation           Name              Rows   Bytes  Cost (%CPU) Time                                                 
---------------------------------------------------------------------------------------                                            
   0  SELECT STATEMENT                          1     28      6   (0) 00:00:01                                             
   1   SORT AGGREGATE                           1     28                                                                   
*  2    TABLE ACCESS FULL TEST_CARDINALITY      1     28      6   (0) 00:00:01                                             
---------------------------------------------------------------------------------------                                            
                                                                                                                                   
Predicate Information (identified by operation id):                                                                                
---------------------------------------------------                                                                                
                                                                                                                                   
   2 - filter("MONTH_OF_BIRTH"='JANUARY' AND "MONTH_NUMBER"=1 AND                                                                  
              "CITY_OF_BIRTH"='BOMBAY' AND "COUNTRY_OF_BIRTH"='IN' AND                                                             
              "DATE_OF_BIRTH"='01-JAN-60')                                                                                         

REM  Perfect knowledge about individual columns when multiplied together returns absolutely imperfect knowledge of all the columns !

select count(*), round(1536*(0/12)) Expected_Cardinality from test_cardinality where country_of_birth='SG';

  COUNT(*)  ExpCrd                                                                                                                 
---------- -------                                                                                                                 
         0       0                                                                                                                 

Execution Plan
----------------------------------------------------------                                                                         
Plan hash value: 3936405985                                                                                                        
                                                                                                                                   
---------------------------------------------------------------------------------------                                            
 Id   Operation           Name              Rows   Bytes  Cost (%CPU) Time                                                 
---------------------------------------------------------------------------------------                                            
   0  SELECT STATEMENT                          1      3      6   (0) 00:00:01                                             
   1   SORT AGGREGATE                           1      3                                                                   
*  2    TABLE ACCESS FULL TEST_CARDINALITY     64    192      6   (0) 00:00:01                                             
---------------------------------------------------------------------------------------                                            
                                                                                                                                   
Predicate Information (identified by operation id):                                                                                
---------------------------------------------------                                                                                
                                                                                                                                   
   2 - filter("COUNTRY_OF_BIRTH"='SG')                                                                                             

REM  Although the Histogram on country_of_birth shows that there are *NO* rows for 'SG', Oracle cannot assume that there really are no rows.
REM  What if I had inserted an 'SG' row after the gather_table_stats ?
REM  So the optimizer must compute a Cardinality, how does it come up with 64 ? Any guesses ?

set autotrace off
Thus, you can see, in the simplest of cases, that the Cardinality
estimates can be very wrong. Imagine that this table is scaled up to 1,000 times its size, but the selectivity remains, generally, the same. Then, take this table in a complex multi-table query with join conditions. The incorrect Cardinality estimate could very well result in a poor Execution Plan. Thus, for cases like the "where city_of_birth in ('DELHI','LONDON') and country_of_birth='FR'", although we know that there would be 0 (zero) rows (irrespective of how large the table is), the Optimizer might well expect very many rows, depending on the table size. Where we would prefer that Oracle first filter on this table in a multi-table query, the Optimizer might prefer to execute this filter much later in the query, after needlessly wading through many blocks and rows in other tables in the same query!
2 comments:

Ravishankar said...: Hi Hemant,
Thanks for the post.
This is now addressed in 10.2.0.4 where we can use extended statistics.I did not try this 10.2.0.4 .11gR1 has this fixed. http://www.oracle.com/technology/obe/11gr1_db/perform/multistats/multicolstats.htm; 30 June, 2008
Hemant K Chitale said...: Yes, I haven't tried this in 10.2.0.4 yet.
(although the results I published are in 10.2.0.4 , without defining a column group and extended statistics).; 30 June, 2008
Hemant's Oracle DBA Blog

28 June, 2008

Cardinality Estimates : Dependent Columns

2 comments:

Aggregated by orafaq.com

OCP 12c