Search My Oracle Blog

Custom Search

02 August, 2008

Testing Gather Stats behaviour based on COL_USAGE

Here I am posting some results on tests with COL_USAGE$ and GATHER_STATS behaviours of 'FOR ALL COLUMNS SIZE AUTOSKEWONLY250' with different ESTIMATE_PERCENTs.

I've put in some, what I hope are, explanatory notes in the output.

These tests were on 10.2.0.4 where a few bugs are fixed. I had slightly different results in 10.2.0.2 (a different database so different rows counts). I plan to post the same test results on other 10.2.0.1/10.2.0.3 databases, when possible. The bugs relate to a higher degree of inaccuracy in colum histograms / histogram types. 10.2.0.4 seems to be more reliable, but not 100%



23:26:08 SQL> select table_name, num_rows, last_analyzed from user_tables;

TABLE_NAME NUM_ROWS LAST_ANALYZED
------------------------------ ---------- ---------------
SOURCE_TABLE

23:26:19 SQL> select * from sys.col_usage$ where obj# in (select object_id from user_objects);

no rows selected

23:26:38 SQL> create table test_data (owner_col varchar2(30), name_col varchar2(128), id_col number, type_col varchar2(19), created_col date);

Table created.

23:28:06 SQL> insert into test_data select owner, object_name, object_id, object_type, created from dba_objects;

50605 rows created.

23:28:28 SQL> commit;

Commit complete.

23:28:33 SQL> select object_id, data_object_id from user_objects where object_name ='TEST_DATA';

OBJECT_ID DATA_OBJECT_ID
---------- --------------
53392 53392

23:28:57 SQL> insert into test_data select * from test_data union all select * from test_data;

101210 rows created.

23:29:20 SQL> commit;

Commit complete.

23:29:24 SQL> select count(*) from test_data;

COUNT(*)
----------
151815

23:29:47 SQL> select count(*) from test_data where owner_col = 'HEMANT';

COUNT(*)
----------
9

23:30:09 SQL> exec dbms_stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

23:30:26 SQL> select count(*) from test_data where owner_col = 'HEMANT';

COUNT(*)
----------
9

23:30:50 SQL> exec dbms_stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

23:31:26 SQL> select * from sys.col_usage$ where obj# in (select object_id from user_objects);

OBJ# INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------------
53392 1 1 0 0 0 0 0 01-AUG 23:30:26

23:31:43 SQL> select object_name, object_id from user_objects;

OBJECT_NAME
--------------------------------------------------------------------------------------------------------------------------------
OBJECT_ID
----------
SOURCE_TABLE
53391

TEST_DATA
53392

LEV_SEQ
53322


23:32:00 SQL> REM So, for OBJ 53392 (table TEST_DATA), COL_USAGE$ show that Col#1 (OWNER_COL has had 1 query on Equality Predicate)
23:38:44 SQL>
23:38:44 SQL> select count(*) from test_data where id_col > 1000;

COUNT(*)
----------
148953

23:39:04 SQL> select * from sys.col_usage$ where obj# in (select object_id from user_objects);

OBJ# INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------------
53392 1 1 0 0 0 0 0 01-AUG 23:30:26

REM The query on iD_COL (which is INTCOL#3 in the table) isn't reflected in COL_USAGE$ yet.
REM We'll flush database monitoring to ensure that the data is reflected
REM note : A Gather_Statistics call would be expected to flush monitoring as well

23:39:13 SQL> exec dbms_stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

23:39:23 SQL> select * from sys.col_usage$ where obj# in (select object_id from user_objects);

OBJ# INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------------
53392 1 1 0 0 0 0 0 01-AUG 23:30:26
53392 3 0 0 0 1 0 0 01-AUG 23:39:23

23:39:31 SQL> REM The entry for Col# (ID_COL) now shows that a RangeScan was executed
23:40:09 SQL>
23:40:09 SQL> exec dbms_stats.gather_schema_stats(user);

PL/SQL procedure successfully completed.

23:40:20 SQL> select table_name, num_rows, sample_size, last_analyzed from user_tables order by 1;

TABLE_NAME NUM_ROWS SAMPLE_SIZE LAST_ANALYZED
------------------------------ ---------- ----------- ---------------
SOURCE_TABLE 303108 60119 01-AUG 23:40:20
TEST_DATA 152116 38029 01-AUG 23:40:20

23:40:36 SQL> REM Oracle used an Auto Size to sample Table statistics.
23:40:58 SQL> REM what about columns ?

23:41:07 SQL> select column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
23:41:28 2 from user_tab_col_statistics
23:41:32 3 where table_name = 'TEST_DATA';

COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
OWNER_COL 22 3.3324E-06 22 FREQUENCY 5832 01-AUG 23:40:20
NAME_COL 28387 .000035227 1 NONE 38029 01-AUG 23:40:20
ID_COL 50487 .000019807 1 NONE 38029 01-AUG 23:40:20
TYPE_COL 31 .032258065 1 NONE 5832 01-AUG 23:40:20
CREATED_COL 935 .001069519 1 NONE 5832 01-AUG 23:40:20

23:41:49 SQL> REM Oracle gathered a Frequency Histogram on OWNER_COL.

23:42:44 SQL> exec dbms_stats.gather_schema_stats(user,estimate_percent=>100);

PL/SQL procedure successfully completed.

23:43:06 SQL> select table_name, num_rows, sample_size, last_analyzed from user_tables order by 1;

TABLE_NAME NUM_ROWS SAMPLE_SIZE LAST_ANALYZED
------------------------------ ---------- ----------- ---------------
SOURCE_TABLE 303666 303666 01-AUG 23:43:04
TEST_DATA 151815 151815 01-AUG 23:43:06

23:43:14 SQL> select column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
23:43:22 2 from user_tab_col_statistics
23:43:28 3 where table_name = 'TEST_DATA';

COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
OWNER_COL 26 3.2935E-06 26 FREQUENCY 151815 01-AUG 23:43:06
NAME_COL 30275 .000033031 1 NONE 151815 01-AUG 23:43:06
ID_COL 50605 .000019761 1 NONE 151815 01-AUG 23:43:06
TYPE_COL 41 .024390244 1 NONE 151815 01-AUG 23:43:06
CREATED_COL 1380 .000724638 1 NONE 151815 01-AUG 23:43:06

23:43:33 SQL> REM Sampling is 100% but no new histograms are gathered
23:44:24 SQL>
23:44:44 SQL> select count(*) from test_data where type_col like 'P%';

COUNT(*)
----------
5268

23:44:53 SQL> select * from sys.col_usage$ where obj# in (select object_id from user_objects);

OBJ# INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------------
53392 1 1 0 0 0 0 0 01-AUG 23:30:26
53392 3 0 0 0 1 0 0 01-AUG 23:39:23

23:45:06 SQL> REM Col_Usage still shows queries on OWNER_COL and ID_COL
23:45:21 SQL> exec dbms_stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

23:45:31 SQL> select * from sys.col_usage$ where obj# in (select object_id from user_objects);

OBJ# INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------------
53392 1 1 0 0 0 0 0 01-AUG 23:30:26
53392 3 0 0 0 1 0 0 01-AUG 23:39:23
53392 4 0 0 0 0 1 0 01-AUG 23:45:31

23:45:38 SQL> REM Now we see the LIKE_PREDs having been used for Col#4 (TYPE_COL)
23:46:10 SQL> exec dbms_stats.gather_schema_stats(user);

PL/SQL procedure successfully completed.

23:46:27 SQL> select table_name, num_rows, sample_size, last_analyzed from user_tables order by 1;

TABLE_NAME NUM_ROWS SAMPLE_SIZE LAST_ANALYZED
------------------------------ ---------- ----------- ---------------
SOURCE_TABLE 303256 75814 01-AUG 23:46:27
TEST_DATA 152828 38207 01-AUG 23:46:27

23:46:40 SQL> select column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
23:47:10 2 from user_tab_col_statistics
23:47:23 3 where table_name = 'TEST_DATA';

COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
OWNER_COL 22 .000003283 22 FREQUENCY 6332 01-AUG 23:46:27
NAME_COL 28368 .000035251 1 NONE 38207 01-AUG 23:46:27
ID_COL 50662 .000019739 1 NONE 38207 01-AUG 23:46:27
TYPE_COL 30 .000003283 30 FREQUENCY 6332 01-AUG 23:46:27
CREATED_COL 950 .001052632 1 NONE 6332 01-AUG 23:46:27

23:47:36 SQL> REM Now, Table level statistics are updated and a new Histogram on TYPE_COL has been gathered
23:48:44 SQL> REM The Presence of TYPE_COL (INTCOL#=4) in SYS.COL_USAGE$ drove the 'FOR ALL COLUMN SIZE AUTO
23:49:16 SQL> REM to generate a histogram on this column
23:49:34 SQL> REM Earlier, there was no Histogram on this column
23:50:04 SQL> REM Therefore, the first one or few queries on this column would have been executed without the
23:50:23 SQL> REM advantage of Histogram !
23:50:37 SQL>
23:50:47 SQL> create table other_test_data as select * from test_data;

Table created.

23:51:01 SQL> exec dbms_Stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

23:51:11 SQL> select table_name, num_rows, sample_size, last_analyzed from user_tables order by 1;

TABLE_NAME NUM_ROWS SAMPLE_SIZE LAST_ANALYZED
------------------------------ ---------- ----------- ---------------
OTHER_TEST_DATA
SOURCE_TABLE 303256 75814 01-AUG 23:46:27
TEST_DATA 152828 38207 01-AUG 23:46:27

23:51:20 SQL> exec dbms_stats.gather_schema_stats(user);

PL/SQL procedure successfully completed.

23:51:57 SQL> select table_name, num_rows, sample_size, last_analyzed from user_tables order by 1;

TABLE_NAME NUM_ROWS SAMPLE_SIZE LAST_ANALYZED
------------------------------ ---------- ----------- ---------------
OTHER_TEST_DATA 151376 37844 01-AUG 23:51:56
SOURCE_TABLE 301861 51707 01-AUG 23:51:57
TEST_DATA 151992 37998 01-AUG 23:51:57

23:52:05 SQL> REM OK, we now have table statistics on OTHER_TEST_DATA
23:52:14 SQL> REM ... what about columns ?
23:52:20 SQL> select column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
23:52:32 2 from user_tab_col_statistics
23:52:40 3 wehre table_name = 'OTHER_TEST_DATA'
23:52:47 SQL> l3
3* wehre table_name = 'OTHER_TEST_DATA'
23:52:49 SQL> c/wehre/where
3* where table_name = 'OTHER_TEST_DATA'
23:52:55 SQL> l
1 select column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
2 from user_tab_col_statistics
3* where table_name = 'OTHER_TEST_DATA'
23:52:55 SQL> /

COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
OWNER_COL 25 .04 1 NONE 5781 01-AUG 23:51:56
NAME_COL 28382 .000035234 1 NONE 37844 01-AUG 23:51:56
ID_COL 50681 .000019731 1 NONE 37844 01-AUG 23:51:56
TYPE_COL 30 .033333333 1 NONE 5781 01-AUG 23:51:56
CREATED_COL 929 .001076426 1 NONE 5781 01-AUG 23:51:56

23:52:56 SQL> REM This table has exactly the same data as table "TEST_DATA"
23:53:10 SQL> REM Yet, Oracle's method_opt=>'FOR ALL COLUMNS SIZE AUTO' behaviour did not gather any histograms
23:53:34 SQL> REM The reason ? That sys.col_usage$ does not indicate any queries as having been executed yet !
23:53:55 SQL> REM slightly counter-intuitive ? Until queries are executed, Oracle will not decide which
23:54:18 SQL> REM histograms to create.
23:54:23 SQL> REM but the first few queries will, probably, "suffer" for lack of histograms
23:54:46 SQL>
23:54:46 SQL> exec dbms_stats.gather_schema_stats(user, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

23:55:41 SQL> select table_name, num_rows, sample_size, last_analyzed from user_tables order by 1;

TABLE_NAME NUM_ROWS SAMPLE_SIZE LAST_ANALYZED
------------------------------ ---------- ----------- ---------------
OTHER_TEST_DATA 151724 37931 01-AUG 23:55:40
SOURCE_TABLE 302095 58785 01-AUG 23:55:41
TEST_DATA 152784 38196 01-AUG 23:55:41

23:55:54 SQL> col table_name format a18
23:56:15 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
23:56:23 2 from user_tab_col_statistics
23:56:27 3 where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2,1;

TABLE_NAME COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
OTHER_TEST_DATA CREATED_COL 903 .00110742 1 NONE 4915 01-AUG 23:55:40
TEST_DATA CREATED_COL 899 .001112347 1 NONE 5021 01-AUG 23:55:41
OTHER_TEST_DATA ID_COL 50377 .00001985 1 NONE 37931 01-AUG 23:55:40
TEST_DATA ID_COL 50762 .0000197 1 NONE 38196 01-AUG 23:55:41
OTHER_TEST_DATA NAME_COL 28466 .00003513 1 NONE 37931 01-AUG 23:55:40
TEST_DATA NAME_COL 28556 .000035019 1 NONE 38196 01-AUG 23:55:41
OTHER_TEST_DATA OWNER_COL 23 .043478261 1 NONE 4915 01-AUG 23:55:40
TEST_DATA OWNER_COL 20 .05 1 NONE 5021 01-AUG 23:55:41
OTHER_TEST_DATA TYPE_COL 32 .03125 1 NONE 4915 01-AUG 23:55:40
TEST_DATA TYPE_COL 31 .032258065 1 NONE 5021 01-AUG 23:55:41

10 rows selected.

23:57:05 SQL> REM All the columns are without Histograms
23:57:16 SQL> REM The "method_opt=>'FOR ALL COLUMNS SIZE 1'" has deleted the Histograms
23:57:45 SQL>
23:57:55 SQL> exec dbms_stats.gather_schema_stats(user,method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY');

PL/SQL procedure successfully completed.

23:58:11 SQL> select table_name, num_rows, sample_size, last_analyzed from user_tables order by 1;

TABLE_NAME NUM_ROWS SAMPLE_SIZE LAST_ANALYZED
------------------ ---------- ----------- ---------------
OTHER_TEST_DATA 151456 37864 01-AUG 23:58:07
SOURCE_TABLE 303666 303666 01-AUG 23:58:10
TEST_DATA 152804 38201 01-AUG 23:58:11

23:58:19 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
23:58:28 2 from user_tab_col_statistics
23:58:35 3 where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
TEST_DATA CREATED_COL 957 .002409639 254 HEIGHT BALANCED 6456 01-AUG 23:58:11
OTHER_TEST_DATA CREATED_COL 886 .00243309 254 HEIGHT BALANCED 5061 01-AUG 23:58:07
TEST_DATA ID_COL 50294 .000019883 1 NONE 38201 01-AUG 23:58:11
OTHER_TEST_DATA ID_COL 50622 .000019754 1 NONE 37864 01-AUG 23:58:07
TEST_DATA NAME_COL 28410 .000052648 254 HEIGHT BALANCED 38201 01-AUG 23:58:11
OTHER_TEST_DATA NAME_COL 28447 .000052918 254 HEIGHT BALANCED 37864 01-AUG 23:58:07
TEST_DATA OWNER_COL 23 3.3141E-06 23 FREQUENCY 6456 01-AUG 23:58:11
OTHER_TEST_DATA OWNER_COL 20 3.2683E-06 20 FREQUENCY 5061 01-AUG 23:58:07
TEST_DATA TYPE_COL 31 3.3141E-06 31 FREQUENCY 6456 01-AUG 23:58:11
OTHER_TEST_DATA TYPE_COL 31 3.2683E-06 31 FREQUENCY 5061 01-AUG 23:58:07

10 rows selected.

23:58:47 SQL> REM Aah ! SIZE SKEWONLY caused Histograms to be gathered on most columns
23:59:13 SQL> REM but do you notice that sample sizes are different between the two tables ?
23:59:26 SQL> REM and that estmated "num_distinct" and "num_buckets" aren't the same ! (even though the data is the same !)
23:59:51 SQL> REM sampling errors have creeped in !
REM I'd like to test this back in 10.2.0.x where x is less than 4 !!
00:00:00 SQL>
00:00:01 SQL> exec dbms_stats.gather_schema_stats(user,estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY');

PL/SQL procedure successfully completed.

00:00:41 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:00:49 2 from user_tab_col_statistics
00:00:55 3 where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
TEST_DATA CREATED_COL 1380 .002350099 254 HEIGHT BALANCED 151815 02-AUG 00:00:41
OTHER_TEST_DATA CREATED_COL 1380 .002350099 254 HEIGHT BALANCED 151815 02-AUG 00:00:30
TEST_DATA ID_COL 50605 .000019761 1 NONE 151815 02-AUG 00:00:41
OTHER_TEST_DATA ID_COL 50605 .000019761 1 NONE 151815 02-AUG 00:00:30
TEST_DATA NAME_COL 30275 .0000387 254 HEIGHT BALANCED 151815 02-AUG 00:00:41
OTHER_TEST_DATA NAME_COL 30275 .0000387 254 HEIGHT BALANCED 151815 02-AUG 00:00:30
TEST_DATA OWNER_COL 26 3.2935E-06 26 FREQUENCY 151815 02-AUG 00:00:41
OTHER_TEST_DATA OWNER_COL 26 3.2935E-06 26 FREQUENCY 151815 02-AUG 00:00:30
TEST_DATA TYPE_COL 41 3.2935E-06 41 FREQUENCY 151815 02-AUG 00:00:41
OTHER_TEST_DATA TYPE_COL 41 3.2935E-06 41 FREQUENCY 151815 02-AUG 00:00:30

10 rows selected.

00:01:03 SQL> REM Now, with 100% sampling and SIZE SKEWONLY, we have "better" histograms
00:02:18 SQL> REM (see, e.g. how OWNER_COL's NUM_DISTINCT is up from 23 and 20 to 26 and 26 for the two tables)
00:02:45 SQL> REM (or, TYPE_COL NUM_DISTINCT is up from 31 to 41)
REM I'd like to test this back in 10.2.0.x where x is less than 4 !!
00:03:01 SQL>
00:09:42 SQL> exec dbms_stats.gather_schema_stats(user,estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE AUTO');

PL/SQL procedure successfully completed.

00:11:02 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:11:09 2 from user_tab_col_statistics
00:11:15 3 where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
TEST_DATA CREATED_COL 1380 .000724638 1 NONE 151815 02-AUG 00:11:02
OTHER_TEST_DATA CREATED_COL 1380 .000724638 1 NONE 151815 02-AUG 00:10:54
TEST_DATA ID_COL 50605 .000019761 1 NONE 151815 02-AUG 00:11:02
OTHER_TEST_DATA ID_COL 50605 .000019761 1 NONE 151815 02-AUG 00:10:54
TEST_DATA NAME_COL 30275 .000033031 1 NONE 151815 02-AUG 00:11:02
OTHER_TEST_DATA NAME_COL 30275 .000033031 1 NONE 151815 02-AUG 00:10:54
TEST_DATA OWNER_COL 26 3.2935E-06 26 FREQUENCY 151815 02-AUG 00:11:02
OTHER_TEST_DATA OWNER_COL 26 .038461538 1 NONE 151815 02-AUG 00:10:54
TEST_DATA TYPE_COL 41 3.2935E-06 41 FREQUENCY 151815 02-AUG 00:11:02
OTHER_TEST_DATA TYPE_COL 41 .024390244 1 NONE 151815 02-AUG 00:10:54

10 rows selected.

00:11:21 SQL> REM Note how the "SIZE AUTO" reverted to Histograms on OWNER_COL and TYPE_COL only in TEST_DATA !
00:12:29 SQL> REM It isn't intelligent enough to figure out that OTHER_TEST_DATA's columns are also candidates
00:13:03 SQL>
00:13:20 SQL> select count(*) from other_test_data where type_col like 'T%';

COUNT(*)
----------
12291

00:13:34 SQL> exec dbms_Stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

00:14:00 SQL> exec dbms_stats.gather_schema_stats(user,estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE AUTO');

PL/SQL procedure successfully completed.

00:14:27 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:14:28 2 from user_tab_col_statistics
00:14:34 3 where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
TEST_DATA CREATED_COL 1380 .000724638 1 NONE 151815 02-AUG 00:14:27
OTHER_TEST_DATA CREATED_COL 1380 .000724638 1 NONE 151815 02-AUG 00:14:19
TEST_DATA ID_COL 50605 .000019761 1 NONE 151815 02-AUG 00:14:27
OTHER_TEST_DATA ID_COL 50605 .000019761 1 NONE 151815 02-AUG 00:14:19
TEST_DATA NAME_COL 30275 .000033031 1 NONE 151815 02-AUG 00:14:27
OTHER_TEST_DATA NAME_COL 30275 .000033031 1 NONE 151815 02-AUG 00:14:19
TEST_DATA OWNER_COL 26 3.2935E-06 26 FREQUENCY 151815 02-AUG 00:14:27
OTHER_TEST_DATA OWNER_COL 26 .038461538 1 NONE 151815 02-AUG 00:14:19
TEST_DATA TYPE_COL 41 3.2935E-06 41 FREQUENCY 151815 02-AUG 00:14:27
OTHER_TEST_DATA TYPE_COL 41 3.2935E-06 41 FREQUENCY 151815 02-AUG 00:14:19

10 rows selected.

00:14:41 SQL> REM Now it has gathered a Histogram on OTHER_TEST_DATA.TYPE_COL because it is now reflected in SYS.COL_USAGE$
00:15:23 SQL> select t.table_name, cu.* from user_objects t, sys.col_usage$ cu
00:15:55 2 where t.object_id=cu.obj#
00:16:04 SQL> l1
1* select t.table_name, cu.* from user_objects t, sys.col_usage$ cu
00:16:07 SQL> c/table/object
1* select t.object_name, cu.* from user_objects t, sys.col_usage$ cu
00:16:10 SQL> l
1 select t.object_name, cu.* from user_objects t, sys.col_usage$ cu
2* where t.object_id=cu.obj#
00:16:11 SQL> /

OBJECT_NAME
--------------------------------------------------------------------------------------------------------------------------------
OBJ# INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------------
TEST_DATA
53392 4 0 0 0 0 1 0 01-AUG 23:45:31

TEST_DATA
53392 3 0 0 0 1 0 0 01-AUG 23:39:23

TEST_DATA
53392 1 1 0 0 0 0 0 01-AUG 23:30:26

OTHER_TEST_DATA
53395 4 0 0 0 0 1 0 02-AUG 00:14:00


00:16:12 SQL> REM Col#4 in both tables is TYPE_COL
00:16:45 SQL> REM In both tables, I have run a LIKE query on this column
00:17:02 SQL> REM The TIMESTAMP shows that there has been no query on TEST_DATA recently, but there has been 1 query on OTHER_TEST_DATA
00:17:32 SQL>
00:17:32 SQL> exec dbms_stats.gather_schema_stats(user,estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

00:18:43 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:18:50 2 from user_tab_col_statistics
00:19:04 3 where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
TEST_DATA CREATED_COL 1380 .000724638 1 NONE 151815 02-AUG 00:18:43
OTHER_TEST_DATA CREATED_COL 1380 .000724638 1 NONE 151815 02-AUG 00:18:36
TEST_DATA ID_COL 50605 .000019761 1 NONE 151815 02-AUG 00:18:43
OTHER_TEST_DATA ID_COL 50605 .000019761 1 NONE 151815 02-AUG 00:18:36
TEST_DATA NAME_COL 30275 .000033031 1 NONE 151815 02-AUG 00:18:43
OTHER_TEST_DATA NAME_COL 30275 .000033031 1 NONE 151815 02-AUG 00:18:36
TEST_DATA OWNER_COL 26 .038461538 1 NONE 151815 02-AUG 00:18:43
OTHER_TEST_DATA OWNER_COL 26 .038461538 1 NONE 151815 02-AUG 00:18:36
TEST_DATA TYPE_COL 41 .024390244 1 NONE 151815 02-AUG 00:18:43
OTHER_TEST_DATA TYPE_COL 41 .024390244 1 NONE 151815 02-AUG 00:18:36

10 rows selected.

00:19:08 SQL> REM Now, the "SIZE 1" has deleted all Histograms
00:19:21 SQL>
00:19:21 SQL> exec dbms_stats.gather_schema_stats(user,estimate_percent=>100,method_opt=>'FOR ALL COLUMNS SIZE 250');

PL/SQL procedure successfully completed.

00:19:51 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:19:53 2
00:20:00 SQL>
00:20:01 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:20:05 2 from user_tab_col_statistics
00:20:11 3 where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
TEST_DATA CREATED_COL 1380 .002372155 250 HEIGHT BALANCED 151815 02-AUG 00:19:51
OTHER_TEST_DATA CREATED_COL 1380 .002372155 250 HEIGHT BALANCED 151815 02-AUG 00:19:41
TEST_DATA ID_COL 50605 .000019761 250 HEIGHT BALANCED 151815 02-AUG 00:19:51
OTHER_TEST_DATA ID_COL 50605 .000019761 250 HEIGHT BALANCED 151815 02-AUG 00:19:41
TEST_DATA NAME_COL 30275 .0000387 250 HEIGHT BALANCED 151815 02-AUG 00:19:51
OTHER_TEST_DATA NAME_COL 30275 .0000387 250 HEIGHT BALANCED 151815 02-AUG 00:19:41
TEST_DATA OWNER_COL 26 3.2935E-06 26 FREQUENCY 151815 02-AUG 00:19:51
OTHER_TEST_DATA OWNER_COL 26 3.2935E-06 26 FREQUENCY 151815 02-AUG 00:19:41
TEST_DATA TYPE_COL 41 3.2935E-06 41 FREQUENCY 151815 02-AUG 00:19:51
OTHER_TEST_DATA TYPE_COL 41 3.2935E-06 41 FREQUENCY 151815 02-AUG 00:19:41

10 rows selected.

00:20:21 SQL> REM Conclusions :
00:21:30 SQL> REM 1. SIZE AUTO relies on prior queries being reflected on SYS.COL_USAGE$
00:21:48 SQL> REM If this is a new table or the column hasn't been queried, Oracle will not gather Histograms even if it is a candidate
00:22:15 SQL> REM 2. SIZE SKEWONLY gathers histograms irrespective of whether the table and/or column has been queried before
00:22:48 SQL> REM 3. The Sample Size is a signficant determinant of accuracy (NUM_DISTINCT and NUM_BUCKETS)
REM Another thing to test in 10.2.0.[1-3]
00:23:23 SQL>
00:23:24 SQL>
00:23:24 SQL>
REM Causing a very high degree of skew
00:24:18 SQL> update test_data set owner_col = 'HEMANT_X' where mod(id_col,100) != 0;

150273 rows updated.

00:24:43 SQL> commit;

Commit complete.

00:24:45 SQL> exec dbms_stats.gather_schema_stats(user);

PL/SQL procedure successfully completed.

00:25:15 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:25:26 2 from user_tab_col_statistics
00:25:32 3 where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
TEST_DATA CREATED_COL 934 .001070664 1 NONE 5950 02-AUG 00:25:15
OTHER_TEST_DATA CREATED_COL 941 .001062699 1 NONE 5730 02-AUG 00:25:13
TEST_DATA ID_COL 50279 .000019889 1 NONE 37863 02-AUG 00:25:15
OTHER_TEST_DATA ID_COL 50729 .000019713 1 NONE 37813 02-AUG 00:25:13
TEST_DATA NAME_COL 28255 .000035392 1 NONE 37863 02-AUG 00:25:15
OTHER_TEST_DATA NAME_COL 28643 .000034913 1 NONE 37813 02-AUG 00:25:13
TEST_DATA OWNER_COL 10 3.2925E-06 10 FREQUENCY 5950 02-AUG 00:25:15
OTHER_TEST_DATA OWNER_COL 24 .041666667 1 NONE 5730 02-AUG 00:25:13
TEST_DATA TYPE_COL 35 3.2925E-06 35 FREQUENCY 5950 02-AUG 00:25:15
OTHER_TEST_DATA TYPE_COL 30 3.2623E-06 30 FREQUENCY 5730 02-AUG 00:25:13

10 rows selected.

00:25:42 SQL> select owner_col, count(*) from test_data group by owner_col;

OWNER_COL COUNT(*)
------------------------------ ----------
MDSYS 27
DMSYS 3
PUBLIC 987
OLAPSYS 27
CTXSYS 12
SYSTEM 9
EXFSYS 6
SH 9
ORDSYS 48
ORDPLUGINS 3
SYSMAN 39
OE 6
XDB 27
IX 3
BI 3
HEMANT_X 150273
SYS 321
WMSYS 12

18 rows selected.

00:26:11 SQL> REM With very high skew (HEMANT_X accounting for 99% of the rows)
00:26:28 SQL> REM the AUTO_SAMPLE_SIZE of 5,950 generated a Frequency Histogram of 10 buckets for 10 distinct values
00:27:17 SQL> REM However, the column actually has 18 distint values !!
REM Another thing to test in 10.2.0.[1-3]

00:27:33 SQL> exec dbms_stats.gather_schema_stats(user,estimate_percent=>20);

PL/SQL procedure successfully completed.

00:27:58 SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram, sample_size, last_analyzed
00:28:13 2 from user_tab_col_statistics
00:28:24 3 where table_name in ('TEST_DATA','OTHER_TEST_DATA') order by 2, 1 desc;

TABLE_NAME COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM SAMPLE_SIZE LAST_ANALYZED
------------------ ------------------------------ ------------ ---------- ----------- --------------- ----------- ---------------
TEST_DATA CREATED_COL 1269 .000788022 1 NONE 30211 02-AUG 00:27:58
OTHER_TEST_DATA CREATED_COL 1262 .000792393 1 NONE 30390 02-AUG 00:27:54
TEST_DATA ID_COL 50492 .000019805 1 NONE 30211 02-AUG 00:27:58
OTHER_TEST_DATA ID_COL 50341 .000019865 1 NONE 30390 02-AUG 00:27:54
TEST_DATA NAME_COL 28226 .000035428 1 NONE 30211 02-AUG 00:27:58
OTHER_TEST_DATA NAME_COL 28169 .0000355 1 NONE 30390 02-AUG 00:27:54
TEST_DATA OWNER_COL 17 3.3101E-06 17 FREQUENCY 30211 02-AUG 00:27:58
OTHER_TEST_DATA OWNER_COL 26 .038461538 1 NONE 30390 02-AUG 00:27:54
TEST_DATA TYPE_COL 38 3.3101E-06 38 FREQUENCY 30211 02-AUG 00:27:58
OTHER_TEST_DATA TYPE_COL 36 3.2906E-06 36 FREQUENCY 30390 02-AUG 00:27:54

10 rows selected.

00:28:33 SQL> REM With a 20% sample, Oracle came up with an estimate of 17 distinct values.
00:29:36 SQL>
00:29:37 SQL> spool off




I hope that you can follow what might seem haphazard testing.
I hadn't prepared a "test script" with all the queries in advance but was building the queries as I went along, based on results of preceding queries.

No comments:

Aggregated by orafaq.com

Aggregated by orafaq.com
This blog is being aggregated by orafaq.com

Top 50 Oracle SQL Blogs 2016

Top 50 Oracle SQL Blogs 2016
Top 50 Oracle SQL Blogs 2016