Search My Oracle Blog

Custom Search

13 November, 2009

Sample Sizes : Table level and Column level

There appears to be some confusion about Sample Size that the "Auto" behaviour in DBMS_STATS determines.

The AUTO_SAMPLE_SIZE for the "ESTIMATE_PERCENT" parameter doesn't mean the size as "SIZE AUTO" for the "METHOD_OPT" parameter.
The former is for Table level statistics.
The latter is for Column Histograms. NULLs are not included in Histograms.

Here is a demonstration :

SQL> drop table Test_Stats_Size purge;

Table dropped.

SQL>
SQL> create table Test_Stats_Size
2 as select object_id col_1, owner col_2, object_name col_3, created col_4
3 from dba_objects
4 where 1=2
5 /

Table created.

SQL> alter table test_stats_size add (padding_col varchar2(80));

Table altered.

SQL> alter table test_stats_size nologging;

Table altered.

SQL>
SQL> insert /*+ APPEND */ into test_stats_size
2 select mod(object_id,50), owner, object_name, created, dbms_random.string('X',60) from dba_objects
3 union all
4 select mod(object_id,50), owner, object_name, created, dbms_random.string('X',60) from dba_objects
5 union all
6 select mod(object_id,50), owner, object_name, created, dbms_random.string('X',60) from dba_objects
7 union all
8 select mod(object_id,50), owner, object_name, created, dbms_random.string('X',60) from dba_objects
9 union all
10 select mod(object_id,50), owner, object_name, created, dbms_random.string('X',60) from dba_objects
11 union all
12 select mod(object_id,50), owner, object_name, created, dbms_random.string('X',60) from dba_objects
13 union all
14 select mod(object_id,50), owner, object_name, created, dbms_random.string('X',60) from dba_objects
15 /

354501 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> select count(*) from test_stats_size ;

COUNT(*)
----------
354501

SQL> select count(*) from test_stats_size where col_2 = 'HEMANT';

COUNT(*)
----------
245

SQL>
SQL>
SQL> -- Gather Stats with 100% sampling but Auto Size for columns
SQL> exec dbms_stats.gather_table_stats('','TEST_STATS_SIZE',estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE AUTO');

PL/SQL procedure successfully completed.

SQL>
SQL> select table_name, num_rows, sample_size, sample_size*100/num_rows Sample_Perc
2 from user_tables
3 where table_name = 'TEST_STATS_SIZE'
4 /

TABLE_NAME NUM_ROWS SAMPLE_SIZE SAMPLE_PERC
------------------------------ ---------- ----------- -----------
TEST_STATS_SIZE 354501 354501 100

SQL> REM We see above that Table Level Sampling is 100%
SQL>
SQL> select column_name, num_distinct, sample_size, histogram , num_buckets
2 from user_tab_columns
3 where table_name = 'TEST_STATS_SIZE'
4 order by column_name
5 /

COLUMN_NAME NUM_DISTINCT SAMPLE_SIZE HISTOGRAM NUM_BUCKETS
------------------------------ ------------ ----------- --------------- -----------
COL_1 50 354494 NONE 1
COL_2 28 354501 FREQUENCY 28
COL_3 30311 354501 NONE 1
COL_4 1411 354501 NONE 1
PADDING_COL 354501 354501 NONE 1

SQL>
SQL> REM Was Column Level Sampling 100% ?
SQL> REM What type of Histograms and how many buckets did Oracle create ?
SQL> REM We see above that not necessarily all columns have the same number of rows sampled
SQL> REM Also, not necessarily all columns have a Histogram !
SQL>
SQL> REM Let's see what happens after a couple of queries on col_1
SQL>
SQL> select count(*) from test_stats_size where col_1 is null;

COUNT(*)
----------
7

SQL> select count(distinct(col_1)) from test_stats_size;

COUNT(DISTINCT(COL_1))
----------------------
50

SQL> select count(*) from test_stats_size where col_1 between 10 and 25;

COUNT(*)
----------
113092

SQL>
SQL> -- Gather Stats with 100% sampling but Auto Size for columns
SQL> exec dbms_stats.gather_table_stats('','TEST_STATS_SIZE',estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE AUTO');

PL/SQL procedure successfully completed.

SQL>
SQL> select table_name, num_rows, sample_size, sample_size*100/num_rows Sample_Perc
2 from user_tables
3 where table_name = 'TEST_STATS_SIZE'
4 /

TABLE_NAME NUM_ROWS SAMPLE_SIZE SAMPLE_PERC
------------------------------ ---------- ----------- -----------
TEST_STATS_SIZE 354501 354501 100

SQL> REM We see above that Table Level Sampling is 100%
SQL>
SQL> select column_name, num_distinct, sample_size, histogram , num_buckets
2 from user_tab_columns
3 where table_name = 'TEST_STATS_SIZE'
4 order by column_name
5 /

COLUMN_NAME NUM_DISTINCT SAMPLE_SIZE HISTOGRAM NUM_BUCKETS
------------------------------ ------------ ----------- --------------- -----------
COL_1 50 354494 FREQUENCY 50
COL_2 28 354501 FREQUENCY 28
COL_3 30311 354501 NONE 1
COL_4 1411 354501 NONE 1
PADDING_COL 354501 354501 NONE 1

SQL>
SQL> REM However, Column Level Sampling and Histogram collection for col_1 is now different !
SQL> REM Also notice that the 7 NULLs in col_1 are excluded !
SQL>
SQL>

And why did Oracle now gather statistics on col_1 ? See my earlier postings on COL_USAGE influencing SIZE AUTO.
.
.
.

3 comments:

Santosh Kumar said...

Nice explanation. Thanks Hemant.

Anonymous said...

Hemant, thanks a lot for the knowledge, i did know that column level stats are done based on what queries are already run for that column.

Thanks
Deen

Hemant K Chitale said...

Anonymous,

"column level stats are done based on what queries are already run for that column" yes, when using "SIZE AUTO" (which is the default if method_opt is not specified).

Hemant K CHitale

Aggregated by orafaq.com

Aggregated by orafaq.com
This blog is being aggregated by orafaq.com

Top 50 Oracle SQL Blogs 2016

Top 50 Oracle SQL Blogs 2016
Top 50 Oracle SQL Blogs 2016