Hemant's Oracle DBA Blog: November 2009

23 November, 2009

MOS Survey Results

Daniel Fink has published the results of a survey on MOS -- aka "My Oracle Support".

21 November, 2009

SIZE specification for Column Histograms

When you use the method_opt parameter in DBMS_STATS.GATHER_TABLE_STATS specifying 'FOR [ALL] COLUMNS [column names] SIZE n' remember to lookup the syntax again. I made the mistake of not verifying the syntax for specifying SIZE when I name columns in the FOR ... SIZE clause.

Simply put, if you specify FOR ALL COLUMNS, the SIZE specification of 'n' does apply to every column. But if you specify individual column names and the SIZE is specified last, Oracle applies the SIZE specification only to the last named column while using it's own limit of 75 buckets where you do not specify a value for SIZE (and haven't specified SIZE AUTO). This means that if you have more than 75 (but less than 250) distinct values, Oracle creates a Height Balanced Histogram instead of the Frequency Histogram that you had desired. (Of course, if you have less than 75 distinct values, Oracle does create the right Frequency Histogram with the corresponding number of buckets to capture each distinct value).

Thus, in this demo, the first GATHER_TABLE_STATS run creates the proper histograms for all 5 columns because I specify "FOR ALL COLUMNS SIZE 250".
However, in the second run, when I name each individual column but do not attach a SIZE specification, Oracle trims columns C2_100D_NUMBERS and C4_100D_VARCHARS down to Height Balanced Histograms with 75 buckets only.
In the third run, I attache a SIZE specification to each column name and achieve the desired Histograms.


SQL> col column_name format a30
SQL>
SQL> Drop Table TEST_SIZE_SPEC purge;

Table dropped.

SQL>
SQL> Create Table TEST_SIZE_SPEC (
2    c1_10d_numbers    number not null,
3    c2_100d_numbers    number not null,
4    c3_10d_varchars    varchar2(5) not null,
5    c4_100d_varchars    varchar2(5) not null,
6    c5_dummy    number not null)
7  /

Table created.

SQL>
SQL> declare
2  i  number;
3  begin
4  for i in 1..10000
5  loop
6    insert into TEST_SIZE_SPEC values (
7      mod(i,10),  mod(i,100), 'C3_'||mod(i,10),'C4_'||mod(i,100), i
8                                      );
9  end loop;
10  end;
11  /

PL/SQL procedure successfully completed.

SQL>
SQL> select count(distinct(c1_10d_numbers)) from TEST_SIZE_SPEC;

COUNT(DISTINCT(C1_10D_NUMBERS))
-------------------------------
                           10

SQL> select count(distinct(c2_100d_numbers)) from TEST_SIZE_SPEC;

COUNT(DISTINCT(C2_100D_NUMBERS))
--------------------------------
                           100

SQL> select count(distinct(c3_10d_varchars)) from TEST_SIZE_SPEC;

COUNT(DISTINCT(C3_10D_VARCHARS))
--------------------------------
                            10

SQL> select count(distinct(c4_100d_varchars)) from TEST_SIZE_SPEC;

COUNT(DISTINCT(C4_100D_VARCHARS))
---------------------------------
                            100

SQL> select count(distinct(c5_dummy)) from TEST_SIZE_SPEC;

COUNT(DISTINCT(C5_DUMMY))
-------------------------
                  10000

SQL>
SQL>
SQL> REM =========================================================================================================
SQL> REM =========================================================================================================
SQL> -- Gather stats with 100% sampling but SIZE 250
SQL> exec dbms_stats.gather_table_stats('','TEST_SIZE_SPEC',estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE 250');

PL/SQL procedure successfully completed.

SQL>
SQL> -- Review column statistics, histograms
SQL> select column_name, sample_size, num_distinct, histogram, num_buckets
2  from user_tab_columns
3  where table_name = 'TEST_SIZE_SPEC'
4  order by column_name
5  /

COLUMN_NAME                    SAMPLE_SIZE NUM_DISTINCT HISTOGRAM       NUM_BUCKETS
------------------------------ ----------- ------------ --------------- -----------
C1_10D_NUMBERS                       10000           10 FREQUENCY                10
C2_100D_NUMBERS                      10000          100 FREQUENCY               100
C3_10D_VARCHARS                      10000           10 FREQUENCY                10
C4_100D_VARCHARS                     10000          100 FREQUENCY               100
C5_DUMMY                             10000        10000 HEIGHT BALANCED         250

SQL>
SQL> select column_name, count(*)
2  from user_tab_histograms
3  where table_name = 'TEST_SIZE_SPEC'
4  group by column_name
5  order by column_name
6  /

COLUMN_NAME                      COUNT(*)
------------------------------ ----------
C1_10D_NUMBERS                         10
C2_100D_NUMBERS                       100
C3_10D_VARCHARS                        10
C4_100D_VARCHARS                      100
C5_DUMMY                              251

SQL>
SQL> REM =========================================================================================================
SQL> REM =========================================================================================================
SQL> -- Now, reGather statistics but specifying column names
SQL> exec dbms_stats.gather_table_stats('','TEST_SIZE_SPEC',estimate_percent=>100,-
> method_opt=>'FOR COLUMNS C1_10D_NUMBERS, C2_100D_NUMBERS, C3_10D_VARCHARS, C4_100D_VARCHARS, C5_DUMMY SIZE 250');

PL/SQL procedure successfully completed.

SQL>
SQL> -- Review column statistics, histograms
SQL> select column_name, sample_size, num_distinct, histogram, num_buckets
2  from user_tab_columns
3  where table_name = 'TEST_SIZE_SPEC'
4  order by column_name
5  /

COLUMN_NAME                    SAMPLE_SIZE NUM_DISTINCT HISTOGRAM       NUM_BUCKETS
------------------------------ ----------- ------------ --------------- -----------
C1_10D_NUMBERS                       10000           10 FREQUENCY                10
C2_100D_NUMBERS                      10000          100 HEIGHT BALANCED          75
C3_10D_VARCHARS                      10000           10 FREQUENCY                10
C4_100D_VARCHARS                     10000          100 HEIGHT BALANCED          75
C5_DUMMY                             10000        10000 HEIGHT BALANCED         250

SQL>
SQL> select column_name, count(*)
2  from user_tab_histograms
3  where table_name = 'TEST_SIZE_SPEC'
4  group by column_name
5  order by column_name
6  /

COLUMN_NAME                      COUNT(*)
------------------------------ ----------
C1_10D_NUMBERS                         10
C2_100D_NUMBERS                        76
C3_10D_VARCHARS                        10
C4_100D_VARCHARS                       76
C5_DUMMY                              251

SQL>
SQL>
SQL> REM =========================================================================================================
SQL> REM =========================================================================================================
SQL> -- Now, reGather statistics but specifying column names and size each
SQL> exec dbms_stats.gather_table_stats('','TEST_SIZE_SPEC',estimate_percent=>100,-
> method_opt=>'FOR COLUMNS C1_10D_NUMBERS SIZE 250, C2_100D_NUMBERS SIZE 250, C3_10D_VARCHARS SIZE 250, C4_100D_VARCHARS SIZE 250, C5_DUMMY SIZE 250');

PL/SQL procedure successfully completed.

SQL>
SQL> -- Review column statistics, histograms
SQL> select column_name, sample_size, num_distinct, histogram, num_buckets
2  from user_tab_columns
3  where table_name = 'TEST_SIZE_SPEC'
4  order by column_name
5  /

COLUMN_NAME                    SAMPLE_SIZE NUM_DISTINCT HISTOGRAM       NUM_BUCKETS
------------------------------ ----------- ------------ --------------- -----------
C1_10D_NUMBERS                       10000           10 FREQUENCY                10
C2_100D_NUMBERS                      10000          100 FREQUENCY               100
C3_10D_VARCHARS                      10000           10 FREQUENCY                10
C4_100D_VARCHARS                     10000          100 FREQUENCY               100
C5_DUMMY                             10000        10000 HEIGHT BALANCED         250

SQL>
SQL> select column_name, count(*)
2  from user_tab_histograms
3  where table_name = 'TEST_SIZE_SPEC'
4  group by column_name
5  order by column_name
6  /

COLUMN_NAME                      COUNT(*)
------------------------------ ----------
C1_10D_NUMBERS                         10
C2_100D_NUMBERS                       100
C3_10D_VARCHARS                        10
C4_100D_VARCHARS                      100
C5_DUMMY                              251

SQL>

Therefore, be careful when you name individual columns in the method_opt parameter 'FOR COLUMNS ...'. Ensure that you specify the desired SIZE for each column separately (even if the SIZE is to be the same -- i.e. 250 in my case).

UPDATE : As pointed out by Randolf, an alternative command is

"FOR COLUMNS SIZE 250 C1_10D_NUMBERS, C2_100D_NUMBERS, C3_10D_VARCHARS, C4_100D_VARCHARS, C5_DUMMY"

so that the SIZE n specification does not have to be repeated after every column name.
.
.
.

13 November, 2009

Sample Sizes : Table level and Column level

There appears to be some confusion about Sample Size that the "Auto" behaviour in DBMS_STATS determines.

The AUTO_SAMPLE_SIZE for the "ESTIMATE_PERCENT" parameter doesn't mean the size as "SIZE AUTO" for the "METHOD_OPT" parameter.
The former is for Table level statistics.
The latter is for Column Histograms. NULLs are not included in Histograms.

Here is a demonstration :


SQL> drop table Test_Stats_Size purge;

Table dropped.

SQL>
SQL> create table Test_Stats_Size
 2  as select object_id col_1, owner  col_2, object_name col_3, created col_4
 3  from dba_objects
 4  where 1=2
 5  /

Table created.

SQL> alter table test_stats_size add (padding_col varchar2(80));

Table altered.

SQL> alter table test_stats_size nologging;

Table altered.

SQL>
SQL> insert /*+ APPEND */ into test_stats_size
 2  select mod(object_id,50), owner, object_name, created, dbms_random.string('X',60) from dba_objects
 3  union all
 4  select mod(object_id,50), owner, object_name, created, dbms_random.string('X',60) from dba_objects
 5  union all
 6  select mod(object_id,50), owner, object_name, created, dbms_random.string('X',60) from dba_objects
 7  union all
 8  select mod(object_id,50), owner, object_name, created, dbms_random.string('X',60) from dba_objects
 9  union all
10  select mod(object_id,50), owner, object_name, created, dbms_random.string('X',60) from dba_objects
11  union all
12  select mod(object_id,50), owner, object_name, created, dbms_random.string('X',60) from dba_objects
13  union all
14  select mod(object_id,50), owner, object_name, created, dbms_random.string('X',60) from dba_objects
15  /

354501 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> select count(*) from test_stats_size ;

 COUNT(*)
----------
   354501

SQL> select count(*) from test_stats_size where col_2 = 'HEMANT';

 COUNT(*)
----------
      245

SQL>
SQL>
SQL> -- Gather Stats with 100%  sampling but Auto Size for columns
SQL> exec dbms_stats.gather_table_stats('','TEST_STATS_SIZE',estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE AUTO');

PL/SQL procedure successfully completed.

SQL>
SQL> select table_name, num_rows, sample_size, sample_size*100/num_rows Sample_Perc
 2  from user_tables
 3  where table_name = 'TEST_STATS_SIZE'
 4  /

TABLE_NAME                       NUM_ROWS SAMPLE_SIZE SAMPLE_PERC
------------------------------ ---------- ----------- -----------
TEST_STATS_SIZE                    354501      354501         100

SQL> REM We see above that Table Level Sampling is 100%
SQL>
SQL> select column_name, num_distinct, sample_size, histogram , num_buckets
 2  from user_tab_columns
 3  where table_name = 'TEST_STATS_SIZE'
 4  order by column_name
 5  /

COLUMN_NAME                    NUM_DISTINCT SAMPLE_SIZE HISTOGRAM       NUM_BUCKETS
------------------------------ ------------ ----------- --------------- -----------
COL_1                                    50      354494 NONE                      1
COL_2                                    28      354501 FREQUENCY                28
COL_3                                 30311      354501 NONE                      1
COL_4                                  1411      354501 NONE                      1
PADDING_COL                          354501      354501 NONE                      1

SQL>
SQL> REM Was Column Level Sampling 100% ?
SQL> REM What type of Histograms and how many buckets did Oracle create ?
SQL> REM  We see above that not necessarily all columns have the same number of rows sampled
SQL> REM  Also, not necessarily all columns have a Histogram !
SQL>
SQL> REM Let's see what happens after a couple of queries on col_1
SQL>
SQL> select count(*) from test_stats_size where col_1 is null;

 COUNT(*)
----------
        7

SQL> select count(distinct(col_1)) from test_stats_size;

COUNT(DISTINCT(COL_1))
----------------------
                   50

SQL> select count(*) from test_stats_size where col_1 between 10 and 25;

 COUNT(*)
----------
   113092

SQL>
SQL> -- Gather Stats with 100%  sampling but Auto Size for columns
SQL> exec dbms_stats.gather_table_stats('','TEST_STATS_SIZE',estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE AUTO');

PL/SQL procedure successfully completed.

SQL>
SQL> select table_name, num_rows, sample_size, sample_size*100/num_rows Sample_Perc
 2  from user_tables
 3  where table_name = 'TEST_STATS_SIZE'
 4  /

TABLE_NAME                       NUM_ROWS SAMPLE_SIZE SAMPLE_PERC
------------------------------ ---------- ----------- -----------
TEST_STATS_SIZE                    354501      354501         100

SQL> REM We see above that Table Level Sampling is 100%
SQL>
SQL> select column_name, num_distinct, sample_size, histogram , num_buckets
 2  from user_tab_columns
 3  where table_name = 'TEST_STATS_SIZE'
 4  order by column_name
 5  /

COLUMN_NAME                    NUM_DISTINCT SAMPLE_SIZE HISTOGRAM       NUM_BUCKETS
------------------------------ ------------ ----------- --------------- -----------
COL_1                                    50      354494 FREQUENCY                50
COL_2                                    28      354501 FREQUENCY                28
COL_3                                 30311      354501 NONE                      1
COL_4                                  1411      354501 NONE                      1
PADDING_COL                          354501      354501 NONE                      1

SQL>
SQL> REM  However, Column Level Sampling and Histogram collection for col_1 is now different !
SQL> REM  Also notice that the 7 NULLs in col_1 are excluded !
SQL>
SQL>

And why did Oracle now gather statistics on col_1 ? See my earlier postings on COL_USAGE influencing SIZE AUTO.
.
.
.

Hemant's Oracle DBA Blog