Hemant's Oracle DBA Blog: SIZE specification for Column Histograms

21 November, 2009

SIZE specification for Column Histograms

When you use the method_opt parameter in DBMS_STATS.GATHER_TABLE_STATS specifying 'FOR [ALL] COLUMNS [column names] SIZE n' remember to lookup the syntax again. I made the mistake of not verifying the syntax for specifying SIZE when I name columns in the FOR ... SIZE clause.

Simply put, if you specify FOR ALL COLUMNS, the SIZE specification of 'n' does apply to every column. But if you specify individual column names and the SIZE is specified last, Oracle applies the SIZE specification only to the last named column while using it's own limit of 75 buckets where you do not specify a value for SIZE (and haven't specified SIZE AUTO). This means that if you have more than 75 (but less than 250) distinct values, Oracle creates a Height Balanced Histogram instead of the Frequency Histogram that you had desired. (Of course, if you have less than 75 distinct values, Oracle does create the right Frequency Histogram with the corresponding number of buckets to capture each distinct value).

Thus, in this demo, the first GATHER_TABLE_STATS run creates the proper histograms for all 5 columns because I specify "FOR ALL COLUMNS SIZE 250".
However, in the second run, when I name each individual column but do not attach a SIZE specification, Oracle trims columns C2_100D_NUMBERS and C4_100D_VARCHARS down to Height Balanced Histograms with 75 buckets only.
In the third run, I attache a SIZE specification to each column name and achieve the desired Histograms.


SQL> col column_name format a30
SQL>
SQL> Drop Table TEST_SIZE_SPEC purge;

Table dropped.

SQL>
SQL> Create Table TEST_SIZE_SPEC (
2    c1_10d_numbers    number not null,
3    c2_100d_numbers    number not null,
4    c3_10d_varchars    varchar2(5) not null,
5    c4_100d_varchars    varchar2(5) not null,
6    c5_dummy    number not null)
7  /

Table created.

SQL>
SQL> declare
2  i  number;
3  begin
4  for i in 1..10000
5  loop
6    insert into TEST_SIZE_SPEC values (
7      mod(i,10),  mod(i,100), 'C3_'||mod(i,10),'C4_'||mod(i,100), i
8                                      );
9  end loop;
10  end;
11  /

PL/SQL procedure successfully completed.

SQL>
SQL> select count(distinct(c1_10d_numbers)) from TEST_SIZE_SPEC;

COUNT(DISTINCT(C1_10D_NUMBERS))
-------------------------------
                           10

SQL> select count(distinct(c2_100d_numbers)) from TEST_SIZE_SPEC;

COUNT(DISTINCT(C2_100D_NUMBERS))
--------------------------------
                           100

SQL> select count(distinct(c3_10d_varchars)) from TEST_SIZE_SPEC;

COUNT(DISTINCT(C3_10D_VARCHARS))
--------------------------------
                            10

SQL> select count(distinct(c4_100d_varchars)) from TEST_SIZE_SPEC;

COUNT(DISTINCT(C4_100D_VARCHARS))
---------------------------------
                            100

SQL> select count(distinct(c5_dummy)) from TEST_SIZE_SPEC;

COUNT(DISTINCT(C5_DUMMY))
-------------------------
                  10000

SQL>
SQL>
SQL> REM =========================================================================================================
SQL> REM =========================================================================================================
SQL> -- Gather stats with 100% sampling but SIZE 250
SQL> exec dbms_stats.gather_table_stats('','TEST_SIZE_SPEC',estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE 250');

PL/SQL procedure successfully completed.

SQL>
SQL> -- Review column statistics, histograms
SQL> select column_name, sample_size, num_distinct, histogram, num_buckets
2  from user_tab_columns
3  where table_name = 'TEST_SIZE_SPEC'
4  order by column_name
5  /

COLUMN_NAME                    SAMPLE_SIZE NUM_DISTINCT HISTOGRAM       NUM_BUCKETS
------------------------------ ----------- ------------ --------------- -----------
C1_10D_NUMBERS                       10000           10 FREQUENCY                10
C2_100D_NUMBERS                      10000          100 FREQUENCY               100
C3_10D_VARCHARS                      10000           10 FREQUENCY                10
C4_100D_VARCHARS                     10000          100 FREQUENCY               100
C5_DUMMY                             10000        10000 HEIGHT BALANCED         250

SQL>
SQL> select column_name, count(*)
2  from user_tab_histograms
3  where table_name = 'TEST_SIZE_SPEC'
4  group by column_name
5  order by column_name
6  /

COLUMN_NAME                      COUNT(*)
------------------------------ ----------
C1_10D_NUMBERS                         10
C2_100D_NUMBERS                       100
C3_10D_VARCHARS                        10
C4_100D_VARCHARS                      100
C5_DUMMY                              251

SQL>
SQL> REM =========================================================================================================
SQL> REM =========================================================================================================
SQL> -- Now, reGather statistics but specifying column names
SQL> exec dbms_stats.gather_table_stats('','TEST_SIZE_SPEC',estimate_percent=>100,-
> method_opt=>'FOR COLUMNS C1_10D_NUMBERS, C2_100D_NUMBERS, C3_10D_VARCHARS, C4_100D_VARCHARS, C5_DUMMY SIZE 250');

PL/SQL procedure successfully completed.

SQL>
SQL> -- Review column statistics, histograms
SQL> select column_name, sample_size, num_distinct, histogram, num_buckets
2  from user_tab_columns
3  where table_name = 'TEST_SIZE_SPEC'
4  order by column_name
5  /

COLUMN_NAME                    SAMPLE_SIZE NUM_DISTINCT HISTOGRAM       NUM_BUCKETS
------------------------------ ----------- ------------ --------------- -----------
C1_10D_NUMBERS                       10000           10 FREQUENCY                10
C2_100D_NUMBERS                      10000          100 HEIGHT BALANCED          75
C3_10D_VARCHARS                      10000           10 FREQUENCY                10
C4_100D_VARCHARS                     10000          100 HEIGHT BALANCED          75
C5_DUMMY                             10000        10000 HEIGHT BALANCED         250

SQL>
SQL> select column_name, count(*)
2  from user_tab_histograms
3  where table_name = 'TEST_SIZE_SPEC'
4  group by column_name
5  order by column_name
6  /

COLUMN_NAME                      COUNT(*)
------------------------------ ----------
C1_10D_NUMBERS                         10
C2_100D_NUMBERS                        76
C3_10D_VARCHARS                        10
C4_100D_VARCHARS                       76
C5_DUMMY                              251

SQL>
SQL>
SQL> REM =========================================================================================================
SQL> REM =========================================================================================================
SQL> -- Now, reGather statistics but specifying column names and size each
SQL> exec dbms_stats.gather_table_stats('','TEST_SIZE_SPEC',estimate_percent=>100,-
> method_opt=>'FOR COLUMNS C1_10D_NUMBERS SIZE 250, C2_100D_NUMBERS SIZE 250, C3_10D_VARCHARS SIZE 250, C4_100D_VARCHARS SIZE 250, C5_DUMMY SIZE 250');

PL/SQL procedure successfully completed.

SQL>
SQL> -- Review column statistics, histograms
SQL> select column_name, sample_size, num_distinct, histogram, num_buckets
2  from user_tab_columns
3  where table_name = 'TEST_SIZE_SPEC'
4  order by column_name
5  /

COLUMN_NAME                    SAMPLE_SIZE NUM_DISTINCT HISTOGRAM       NUM_BUCKETS
------------------------------ ----------- ------------ --------------- -----------
C1_10D_NUMBERS                       10000           10 FREQUENCY                10
C2_100D_NUMBERS                      10000          100 FREQUENCY               100
C3_10D_VARCHARS                      10000           10 FREQUENCY                10
C4_100D_VARCHARS                     10000          100 FREQUENCY               100
C5_DUMMY                             10000        10000 HEIGHT BALANCED         250

SQL>
SQL> select column_name, count(*)
2  from user_tab_histograms
3  where table_name = 'TEST_SIZE_SPEC'
4  group by column_name
5  order by column_name
6  /

COLUMN_NAME                      COUNT(*)
------------------------------ ----------
C1_10D_NUMBERS                         10
C2_100D_NUMBERS                       100
C3_10D_VARCHARS                        10
C4_100D_VARCHARS                      100
C5_DUMMY                              251

SQL>

Therefore, be careful when you name individual columns in the method_opt parameter 'FOR COLUMNS ...'. Ensure that you specify the desired SIZE for each column separately (even if the SIZE is to be the same -- i.e. 250 in my case).

UPDATE : As pointed out by Randolf, an alternative command is

"FOR COLUMNS SIZE 250 C1_10D_NUMBERS, C2_100D_NUMBERS, C3_10D_VARCHARS, C4_100D_VARCHARS, C5_DUMMY"

so that the SIZE n specification does not have to be repeated after every column name.
.
.
.

6 comments:

Randolf said...: Hemant,

instead of explicitly mentioning the SIZE for every column, there is an alternative of the syntax that might be useful in this case:

FOR COLUMNS SIZE 250 C1_10D_NUMBERS, C2_100D_NUMBERS, C3_10D_VARCHARS, C4_100D_VARCHARS, C5_DUMMY

So you can effectively apply different bucket sizes to different lists of columns.

Regards,
Randolf

Oracle related stuff blog:
http://oracle-randolf.blogspot.com/

Co-author of the upcoming "OakTable Expert Oracle Practices" book:
http://www.apress.com/book/view/1430226684
http://www.amazon.de/Expert-Oracle-Practices-Database-Administration/dp/1430226684; 22 November, 2009
Hemant K Chitale said...: Randolf,
Yes, that is a good alternative as well -- to provide the SIZE n value upfront.
Thanks.
Hemant; 22 November, 2009
Home Audio Accessories said...: thanks for the syntax, it help me alot.; 23 November, 2009
Anonymous said...: From Wolfgang Breitling's paper:
"Statistics Gathering and Histograms DOs ✔ and DON’Ts"

He wrote:
Do use “for all columns size 1”
then
✔ “for columns size 254 col1[,col2,...]”
✘ not “for columns col1[,col2,...] size 254” *
for columns requiring histograms.

and

How To Quickly Add/Remove Column Statistics (Histograms) For A Column [ID 390249.1]

•gathering statistics for multiple columns with different bucket sizes :SQL> exec dbms_stats.gather_table_stats(null, '', cascade=>false, method_opt => 'for columns size N , for columns size M ,');
•gathering statistics for multiple columns with different bucket sizes plus minimal information for the remaining columns:SQL> exec dbms_stats.gather_table_stats(null, '', cascade=>false, method_opt => 'for all columns size 1 for columns size N , for columns size M ,');; 19 February, 2013
ganez said...: Hi...

Could you tell me what is the difference between FOR COLUMNS SIZE 250 and FOR COLUMNS SIZE 250 C1_10D_NUMBERS, C2_100D_NUMBERS, C3_10D_VARCHARS, C4_100D_VARCHARS, C5_DUMMY?

I meant instead of giving size in each columns, Shall we give simply like FOR COLUMNS SIZE 250. Am i right? please clarify me...

Thanks a lot...ganesh; 10 June, 2015
Hemant K Chitale said...: Ganesh,

When not using FOR ALL COLUMNS, you could specify individual columns (e.g. 3 of the 5 columns here) with the FOR COLUMNS clause.; 10 June, 2015

Hemant's Oracle DBA Blog

21 November, 2009

SIZE specification for Column Histograms

6 comments:

Aggregated by orafaq.com

OCP 12c