UPDATE :
This post is relevant for 10.2. AUTO_SAMPLE_SIZE has been significantly improved in 11.2, and I recommend using AUTO_SAMPLE_SIZE in 11.2 and above.
Since 10g introduced some 'auto'matic behaviour as defaults (AUTO_SAMPLE_SIZE for estimate_percent and FOR ALL COLUMNS SIZE AUTO for method_opt), I decided to trace how these work.
In a production database, I had noticed Oracle gathering statistics on a single column for 20 minutes, after which it began the same task again, but with a sample size 10 times as large.
MetaLink Note #343849.1 provides some explanation of this behaviour.
Since I'd already created a large table of 83 million rows (see my previous postings), I decided to see how the 'auto'mation worked.
(So far, in my test runs I had always specified an estimate_percent and method_opt, not leaving them to the 10g defaults.)
So I ran two DBMS_STATS.GATHER_TABLE_STATS executions on table TEST_APPEND, which has 83 million rows. The first execution was with the defaults and the second was with estimate_percent=>1.
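For reference, the two executions were of this general form -- a sketch only, with the schema and table names taken from the traces below and the standard DBMS_STATS parameter names:

```sql
-- Run 1 : all defaults. In 10g this means
--   estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE
--   method_opt       => 'FOR ALL COLUMNS SIZE AUTO'
exec DBMS_STATS.GATHER_TABLE_STATS(ownname => 'HEMANT', tabname => 'TEST_APPEND');

-- Run 2 : an explicit 1% sample
exec DBMS_STATS.GATHER_TABLE_STATS(ownname => 'HEMANT', tabname => 'TEST_APPEND', -
  estimate_percent => 1);
```

Both runs were traced with event 10046 so that the recursive SQL issued by DBMS_STATS would be captured.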
For the 'auto' sample size, a 10046 trace showed Oracle gathering samples *twice*. On the first run, it gathered statistics on *all* columns, using a sample of 0.0058412006 % (yes, 6 thousandths of 1 percent !). Oracle seemed to have been dissatisfied with what it gathered on a few of the columns, so it re-ran the gather for a subset of columns, but using a sample of 0.0584120060 % (that is, 6 hundredths of 1 percent !).
With such small sample sizes, you would think that gathering statistics was fast.
This is what it did :
select /*+ no_parallel(t) no_parallel_index(t) dbms_stats
cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
*/ count(*),count("OWNER"),count(distinct "OWNER"),
sum(sys_op_opnsize("OWNER")),substrb(dump(min("OWNER"),16,0,32),1,120),
substrb(dump(max("OWNER"),16,0,32),1,120),count("OBJECT_NAME"),
count(distinct "OBJECT_NAME"),sum(sys_op_opnsize("OBJECT_NAME")),
substrb(dump(min(substrb("OBJECT_NAME",1,32)),16,0,32),1,120),
substrb(dump(max(substrb("OBJECT_NAME",1,32)),16,0,32),1,120),
count("SUBOBJECT_NAME"),count(distinct "SUBOBJECT_NAME"),
sum(sys_op_opnsize("SUBOBJECT_NAME")),substrb(dump(min("SUBOBJECT_NAME"),16,
0,32),1,120),substrb(dump(max("SUBOBJECT_NAME"),16,0,32),1,120),
count("OBJECT_ID"),count(distinct "OBJECT_ID"),
sum(sys_op_opnsize("OBJECT_ID")),substrb(dump(min("OBJECT_ID"),16,0,32),1,
120),substrb(dump(max("OBJECT_ID"),16,0,32),1,120),count("DATA_OBJECT_ID"),
count(distinct "DATA_OBJECT_ID"),sum(sys_op_opnsize("DATA_OBJECT_ID")),
substrb(dump(min("DATA_OBJECT_ID"),16,0,32),1,120),
substrb(dump(max("DATA_OBJECT_ID"),16,0,32),1,120),count("OBJECT_TYPE"),
count(distinct "OBJECT_TYPE"),sum(sys_op_opnsize("OBJECT_TYPE")),
substrb(dump(min("OBJECT_TYPE"),16,0,32),1,120),
substrb(dump(max("OBJECT_TYPE"),16,0,32),1,120),count("CREATED"),
count(distinct "CREATED"),substrb(dump(min("CREATED"),16,0,32),1,120),
substrb(dump(max("CREATED"),16,0,32),1,120),count("LAST_DDL_TIME"),
count(distinct "LAST_DDL_TIME"),substrb(dump(min("LAST_DDL_TIME"),16,0,32),
1,120),substrb(dump(max("LAST_DDL_TIME"),16,0,32),1,120),count("TIMESTAMP"),
count(distinct "TIMESTAMP"),sum(sys_op_opnsize("TIMESTAMP")),
substrb(dump(min("TIMESTAMP"),16,0,32),1,120),substrb(dump(max("TIMESTAMP"),
16,0,32),1,120),count("STATUS"),count(distinct "STATUS"),
sum(sys_op_opnsize("STATUS")),substrb(dump(min("STATUS"),16,0,32),1,120),
substrb(dump(max("STATUS"),16,0,32),1,120),count("TEMPORARY"),
count(distinct "TEMPORARY"),sum(sys_op_opnsize("TEMPORARY")),
substrb(dump(min("TEMPORARY"),16,0,32),1,120),substrb(dump(max("TEMPORARY"),
16,0,32),1,120),count("GENERATED"),count(distinct "GENERATED"),
sum(sys_op_opnsize("GENERATED")),substrb(dump(min("GENERATED"),16,0,32),1,
120),substrb(dump(max("GENERATED"),16,0,32),1,120),count("SECONDARY"),
count(distinct "SECONDARY"),sum(sys_op_opnsize("SECONDARY")),
substrb(dump(min("SECONDARY"),16,0,32),1,120),substrb(dump(max("SECONDARY"),
16,0,32),1,120)
from
"HEMANT"."TEST_APPEND" sample ( .0058412006) t
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.02 0.04 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 63.66 150.85 585686 8845 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 63.69 150.89 585686 8845 0 1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61 (recursive depth: 1)
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT GROUP BY (cr=8845 pr=585686 pw=0 time=150851071 us)
4798 TABLE ACCESS SAMPLE TEST_APPEND (cr=8845 pr=585686 pw=0 time=149699664 us)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file scattered read 4587 0.30 134.90
db file sequential read 170 0.06 1.01
latch: shared pool 5 0.01 0.03
latch: cache buffers lru chain 2 0.00 0.00
********************************************************************************
and
select /*+ no_parallel(t) no_parallel_index(t) dbms_stats
cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
*/ count(*),count("OBJECT_NAME"),count(distinct "OBJECT_NAME"),
count("SUBOBJECT_NAME"),count(distinct "SUBOBJECT_NAME"),
sum(sys_op_opnsize("SUBOBJECT_NAME")),substrb(dump(min("SUBOBJECT_NAME"),16,
0,32),1,120),substrb(dump(max("SUBOBJECT_NAME"),16,0,32),1,120),
count("OBJECT_ID"),count(distinct "OBJECT_ID"),count("DATA_OBJECT_ID"),
count(distinct "DATA_OBJECT_ID"),sum(sys_op_opnsize("DATA_OBJECT_ID")),
substrb(dump(min("DATA_OBJECT_ID"),16,0,32),1,120),
substrb(dump(max("DATA_OBJECT_ID"),16,0,32),1,120)
from
"HEMANT"."TEST_APPEND" sample ( .0584120060) t
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 95.72 176.11 1136088 85933 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 95.73 176.12 1136088 85933 0 1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61 (recursive depth: 1)
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT GROUP BY (cr=85933 pr=1136088 pw=0 time=176113365 us)
49017 TABLE ACCESS SAMPLE TEST_APPEND (cr=85933 pr=1136088 pw=0 time=153397014 us)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file scattered read 8994 0.27 152.61
db file sequential read 23 0.00 0.01
latch: cache buffers lru chain 1 0.00 0.00
********************************************************************************
Oracle actually read 585,686 blocks and used 8,845 blocks to get a
sample of 4,798 rows in 150.89 seconds in its first 'auto' run. In the second 'auto' run (for columns OBJECT_NAME, SUBOBJECT_NAME, OBJECT_ID and DATA_OBJECT_ID only -- presumably being satisfied with the statistics it had on columns OWNER, OBJECT_TYPE etc. !), it read 1,136,088 blocks and used 85,933 of them to read a sample of 49,017 rows in 176.12 seconds. Those 1,136,088 blocks meant that it had actually read the whole table !
Thus, AUTO_SAMPLE_SIZE resulted in reading 1.5 times the table size and discarding 96.87% of what it had read. The number of rows sampled was less than 0.07% of the table.
The gathering of statistics took 326.97 seconds.
Could I do better even with a 1% sample size (note : Oracle had actually read only 0.07% of the rows) ?
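That 0.07% figure can be checked with a quick bit of arithmetic, using the row counts from the two 'auto' traces (4,798 and 49,017) against the 83 million rows in the table:

```sql
-- rows sampled across both 'auto' runs, as a percentage of 83million rows
select round( (4798 + 49017) / 83000000 * 100, 4 ) as pct_rows_sampled
from dual;
-- approximately 0.0648, i.e. less than 0.07% of the table
```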
So the estimate_percent=>1 was my second execution.
select /*+ no_parallel(t) no_parallel_index(t) dbms_stats
cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
*/ count(*),count("OWNER"),count(distinct "OWNER"),
sum(sys_op_opnsize("OWNER")),substrb(dump(min("OWNER"),16,0,32),1,120),
substrb(dump(max("OWNER"),16,0,32),1,120),count("OBJECT_NAME"),
count(distinct "OBJECT_NAME"),sum(sys_op_opnsize("OBJECT_NAME")),
substrb(dump(min(substrb("OBJECT_NAME",1,32)),16,0,32),1,120),
substrb(dump(max(substrb("OBJECT_NAME",1,32)),16,0,32),1,120),
count("SUBOBJECT_NAME"),count(distinct "SUBOBJECT_NAME"),
sum(sys_op_opnsize("SUBOBJECT_NAME")),substrb(dump(min("SUBOBJECT_NAME"),16,
0,32),1,120),substrb(dump(max("SUBOBJECT_NAME"),16,0,32),1,120),
count("OBJECT_ID"),count(distinct "OBJECT_ID"),
sum(sys_op_opnsize("OBJECT_ID")),substrb(dump(min("OBJECT_ID"),16,0,32),1,
120),substrb(dump(max("OBJECT_ID"),16,0,32),1,120),count("DATA_OBJECT_ID"),
count(distinct "DATA_OBJECT_ID"),sum(sys_op_opnsize("DATA_OBJECT_ID")),
substrb(dump(min("DATA_OBJECT_ID"),16,0,32),1,120),
substrb(dump(max("DATA_OBJECT_ID"),16,0,32),1,120),count("OBJECT_TYPE"),
count(distinct "OBJECT_TYPE"),sum(sys_op_opnsize("OBJECT_TYPE")),
substrb(dump(min("OBJECT_TYPE"),16,0,32),1,120),
substrb(dump(max("OBJECT_TYPE"),16,0,32),1,120),count("CREATED"),
count(distinct "CREATED"),substrb(dump(min("CREATED"),16,0,32),1,120),
substrb(dump(max("CREATED"),16,0,32),1,120),count("LAST_DDL_TIME"),
count(distinct "LAST_DDL_TIME"),substrb(dump(min("LAST_DDL_TIME"),16,0,32),
1,120),substrb(dump(max("LAST_DDL_TIME"),16,0,32),1,120),count("TIMESTAMP"),
count(distinct "TIMESTAMP"),sum(sys_op_opnsize("TIMESTAMP")),
substrb(dump(min("TIMESTAMP"),16,0,32),1,120),substrb(dump(max("TIMESTAMP"),
16,0,32),1,120),count("STATUS"),count(distinct "STATUS"),
sum(sys_op_opnsize("STATUS")),substrb(dump(min("STATUS"),16,0,32),1,120),
substrb(dump(max("STATUS"),16,0,32),1,120),count("TEMPORARY"),
count(distinct "TEMPORARY"),sum(sys_op_opnsize("TEMPORARY")),
substrb(dump(min("TEMPORARY"),16,0,32),1,120),substrb(dump(max("TEMPORARY"),
16,0,32),1,120),count("GENERATED"),count(distinct "GENERATED"),
sum(sys_op_opnsize("GENERATED")),substrb(dump(min("GENERATED"),16,0,32),1,
120),substrb(dump(max("GENERATED"),16,0,32),1,120),count("SECONDARY"),
count(distinct "SECONDARY"),sum(sys_op_opnsize("SECONDARY")),
substrb(dump(min("SECONDARY"),16,0,32),1,120),substrb(dump(max("SECONDARY"),
16,0,32),1,120)
from
"HEMANT"."TEST_APPEND" sample ( 1.0000000000) t
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.03 0.03 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 118.46 205.36 1253727 895705 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 118.49 205.40 1253727 895705 0 1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61 (recursive depth: 1)
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT GROUP BY (cr=895705 pr=1253727 pw=0 time=205366708 us)
836541 TABLE ACCESS SAMPLE TEST_APPEND (cr=895705 pr=1253727 pw=0 time=184883528 us)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file scattered read 9855 0.31 151.24
db file sequential read 1 0.00 0.00
********************************************************************************
Aah ! A 1% sample executed in 205.40 seconds, much faster than the two 'auto' runs combined,
going through 1,253,727 blocks and using 895,705 of them to sample 836,541 rows.
The estimate_percent=>1 was *faster* and sampled many *more* rows than the 'auto_sample_size' !
Here's a summary :

Gather Stats run      Sample %age    Disk Blocks Actually Read   Rows Sampled   Time taken (sec)
------------------   -------------   -------------------------   ------------   ----------------
Run 1 (all cols)      .0058412006                      585,686          4,798             150.85
Run 2 (some cols)     .0584120060                    1,136,088         49,017             176.12
Auto : total               --                        1,721,774         53,815             326.97
Estimate percent 1   1.0000000000                    1,253,727        836,541             205.40
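If, on the strength of your own testing, you prefer a fixed sample size to the 10.2 AUTO_SAMPLE_SIZE default, the database-wide default for estimate_percent can be changed with DBMS_STATS.SET_PARAM (the '1' here is just my test value, not a recommendation for every database):

```sql
-- change the default estimate_percent used when none is specified
exec DBMS_STATS.SET_PARAM('ESTIMATE_PERCENT', '1');

-- verify the current default
select DBMS_STATS.GET_PARAM('ESTIMATE_PERCENT') from dual;
```

This affects GATHER_TABLE_STATS / GATHER_SCHEMA_STATS calls that do not explicitly pass estimate_percent, including the automatic 10g maintenance-window statistics job.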