Hemant's Oracle DBA Blog: CLUSTERING

I had written a post about CLUSTERING_FACTOR more than a year ago.

I am re-visiting the issue here with a Test Case. I've written the SQL script with REM comments inline. Therefore, the output below is self-explanatory :

SQL> 
SQL> set arraysize 5000
SQL> set pages600
SQL> set linesize132
SQL> 
SQL> set feedback off
SQL> 
SQL> set SQLPrompt ''
set SQLContinue ''

REM  I wil be creating my test table from this table of 1.368million rows
REM  This is a copy of DBA_OBJECTS multiplied many times
select num_rows, sample_size, blocks from user_tables where table_name = 'SOURCE_TABLE';

  NUM_ROWS SAMPLE_SIZE     BLOCKS                                                                                                   
---------- ----------- ----------                                                                                                   
   1368063     1368063      18967                                                                                                   


REM  This is my Test Table to review CLUSTERING_FACTOR
drop table MY_TEST_TABLE purge ;
create table MY_TEST_TABLE  as select * from SOURCE_TABLE where 1=2;
alter table MY_TEST_TABLE modify (OWNER not null, OBJECT_NAME not null, OBJECT_ID not null);
alter table MY_TEST_TABLE nologging;

create index MY_TEST_TABLE_NDX_1 on MY_TEST_TABLE (OWNER, OBJECT_NAME) ;


REM REM REM ################### Setup Table with first dataset

insert into MY_TEST_TABLE select * from SOURCE_TABLE ;
commit;
-- rem need to do a FullTableScan to effect delayed block cleanout
select count(*) from MY_TEST_TABLE dc;

  COUNT(*)                                                                                                                          
----------                                                                                                                          
   1368063                                                                                                                          

exec dbms_stats.gather_table_stats(user,'MY_TEST_TABLE',estimate_percent=>100,method_opt=>'FOR COLUMNS OWNER, OBJECT_NAME SIZE 250',cascade=>TRUE,no_invalidate=>FALSE);

select table_name, num_rows, blocks from user_tables order by 1;

TABLE_NAME                       NUM_ROWS     BLOCKS                                                                                
------------------------------ ---------- ----------                                                                                
MY_TEST_TABLE                     1368063      19277                                                                                
SOURCE_TABLE                      1368063      18967                                                                                
select index_name, num_rows, distinct_keys, leaf_blocks, clustering_factor from user_indexes order by 1;

INDEX_NAME                       NUM_ROWS DISTINCT_KEYS LEAF_BLOCKS CLUSTERING_FACTOR                                               
------------------------------ ---------- ------------- ----------- -----------------                                               
MY_TEST_TABLE_NDX_1               1368063         49232        9778           1343029                                               
REM  Notice the very high CLUSTERING_FACTOR -- almost as high as the number of rows !

select  count(*) from source_table where owner = 'SYSTEM';

  COUNT(*)                                                                                                                          
----------                                                                                                                          
     12258                                                                                                                          
select  count(*) from my_test_table where owner = 'SYSTEM';

  COUNT(*)                                                                                                                          
----------                                                                                                                          
     12258                                                                                                                          
REM  The number of rows with "owner='SYSTEM'"  is actually less than 1% of the total row count !
REM  ie, 12,258 out of 1,368,063


REM  Let's see the Execution Plan that Oracle generates for the query
set autotrace traceonly
select owner, object_name, object_type, created from MY_TEST_TABLE where owner = 'SYSTEM';

Execution Plan
----------------------------------------------------------                                                                          
Plan hash value: 965484217                                                                                                          
                                                                                                                                    
-----------------------------------------------------------------------------------                                                 
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |                                                 
-----------------------------------------------------------------------------------                                                 
|   0 | SELECT STATEMENT  |               | 12258 |   610K|  3527   (2)| 00:00:43 |                                                 
|*  1 |  TABLE ACCESS FULL| MY_TEST_TABLE | 12258 |   610K|  3527   (2)| 00:00:43 |                                                 
-----------------------------------------------------------------------------------                                                 
                                                                                                                                    
Predicate Information (identified by operation id):                                                                                 
---------------------------------------------------                                                                                 
                                                                                                                                    
   1 - filter("OWNER"='SYSTEM')                                                                                                     


Statistics
----------------------------------------------------------                                                                          
          1  recursive calls                                                                                                        
          0  db block gets                                                                                                          
      18817  consistent gets                                                                                                        
       5373  physical reads                                                                                                         
          0  redo size                                                                                                              
     362774  bytes sent via SQL*Net to client                                                                                       
        514  bytes received via SQL*Net from client                                                                                 
          4  SQL*Net roundtrips to/from client                                                                                      
          0  sorts (memory)                                                                                                         
          0  sorts (disk)                                                                                                           
      12258  rows processed                                                                                                         

set autotrace off

REM That was 18,817 consistent gets for 12,258 rows, using a FullTableScan
REM  some of the consistent gets were for the parse
REM   (and some of the 'consistent gets' were re-reads of the same blocks because of the
REM    arraysize of 5,000 being less than the number of rows, resulting in 3 additional SQL*Net round trips)
REM Surprised that Oracle prefers a FullTableScan ?
REM That is preferred because of the high CLUSTERING_FACTOR
REM which indicates to Oracle that it will have to keep skipping to a different Table Block
REM for each next ROWID fetched from the index.
REM This means that Oracle estimates it will have to make very many SingleBlockReadCalls
REM to read the rows from the Table

REM  Let's see the results if I force the Index
set autotrace traceonly
select /*+ INDEX (M MY_TEST_TABLE_NDX_1) */ owner, object_name, object_type, created from MY_TEST_TABLE M where owner = 'SYSTEM';

Execution Plan
----------------------------------------------------------                                                                          
Plan hash value: 519844067                                                                                                          
                                                                                                                                    
---------------------------------------------------------------------------------------------------                                 
| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |                                 
---------------------------------------------------------------------------------------------------                                 
|   0 | SELECT STATEMENT            |                     | 12258 |   610K| 12138   (1)| 00:02:26 |                                 
|   1 |  TABLE ACCESS BY INDEX ROWID| MY_TEST_TABLE       | 12258 |   610K| 12138   (1)| 00:02:26 |                                 
|*  2 |   INDEX RANGE SCAN          | MY_TEST_TABLE_NDX_1 | 12258 |       |    90   (0)| 00:00:02 |                                 
---------------------------------------------------------------------------------------------------                                 
                                                                                                                                    
Predicate Information (identified by operation id):                                                                                 
---------------------------------------------------                                                                                 
                                                                                                                                    
   2 - access("OWNER"='SYSTEM')                                                                                                     


Statistics
----------------------------------------------------------                                                                          
          1  recursive calls                                                                                                        
          0  db block gets                                                                                                          
      10638  consistent gets                                                                                                        
          0  physical reads                                                                                                         
          0  redo size                                                                                                              
     109228  bytes sent via SQL*Net to client                                                                                       
        514  bytes received via SQL*Net from client                                                                                 
          4  SQL*Net roundtrips to/from client                                                                                      
          0  sorts (memory)                                                                                                         
          0  sorts (disk)                                                                                                           
      12258  rows processed                                                                                                         

set autotrace off

REM That was 10,638 consistent gets for 12,258 rows, using an IndexRangeScan
REM Oracle had preferred the FullTableScan (18,817 consistent gets) because
REM that would be achieved using MultiBlockReadCalls
REM While the retrieval using the Index would require SingleBlockReadCalls


REM REM REM ############### DELETE approximately 20% (282,528) of the rows in the table

connect hemant/hemant
Connected.
delete MY_TEST_TABLE where object_id < 11000;
commit;
-- rem need to do a FullTableScan to effect delayed block cleanout
select /*+ FULL (dc) */ count(*) from MY_TEST_TABLE dc;

  COUNT(*)                                                                                                                          
----------                                                                                                                          
   1085535                                                                                                                          

exec dbms_stats.gather_table_stats(user,'MY_TEST_TABLE',estimate_percent=>100,method_opt=>'FOR COLUMNS OWNER, OBJECT_NAME SIZE 250',cascade=>TRUE,no_invalidate=>FALSE);

select table_name, num_rows, blocks from user_tables order by 1;

TABLE_NAME                       NUM_ROWS     BLOCKS                                                                                
------------------------------ ---------- ----------                                                                                
MY_TEST_TABLE                     1085535      19277                                                                                
SOURCE_TABLE                      1368063      18967                                                                                
select index_name, num_rows, distinct_keys, leaf_blocks, clustering_factor from user_indexes order by 1;

INDEX_NAME                       NUM_ROWS DISTINCT_KEYS LEAF_BLOCKS CLUSTERING_FACTOR                                               
------------------------------ ---------- ------------- ----------- -----------------                                               
MY_TEST_TABLE_NDX_1               1085535         39460        8372           1072489                                               
REM  I still have a very high CLUSTERING_FACTOR

select  count(*) from source_table where owner = 'SYSTEM';

  COUNT(*)                                                                                                                          
----------                                                                                                                          
     12258                                                                                                                          
select  count(*) from my_test_table where owner = 'SYSTEM';

  COUNT(*)                                                                                                                          
----------                                                                                                                          
       135                                                                                                                          
REM  The number of rows with "owner='SYSTEM'"  is much lesser now !
REM  ie, 135 out of 1,085,535

REM  Let's see the Execution Plan that Oracle generates for the query
set autotrace traceonly
select owner, object_name, object_type, created from MY_TEST_TABLE where owner = 'SYSTEM';

Execution Plan
----------------------------------------------------------                                                                          
Plan hash value: 519844067                                                                                                          
                                                                                                                                    
---------------------------------------------------------------------------------------------------                                 
| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |                                 
---------------------------------------------------------------------------------------------------                                 
|   0 | SELECT STATEMENT            |                     |   135 |  7020 |   138   (0)| 00:00:02 |                                 
|   1 |  TABLE ACCESS BY INDEX ROWID| MY_TEST_TABLE       |   135 |  7020 |   138   (0)| 00:00:02 |                                 
|*  2 |   INDEX RANGE SCAN          | MY_TEST_TABLE_NDX_1 |   135 |       |     4   (0)| 00:00:01 |                                 
---------------------------------------------------------------------------------------------------                                 
                                                                                                                                    
Predicate Information (identified by operation id):                                                                                 
---------------------------------------------------                                                                                 
                                                                                                                                    
   2 - access("OWNER"='SYSTEM')                                                                                                     


Statistics
----------------------------------------------------------                                                                          
          1  recursive calls                                                                                                        
          0  db block gets                                                                                                          
        214  consistent gets                                                                                                        
          0  physical reads                                                                                                         
          0  redo size                                                                                                              
       1559  bytes sent via SQL*Net to client                                                                                       
        492  bytes received via SQL*Net from client                                                                                 
          2  SQL*Net roundtrips to/from client                                                                                      
          0  sorts (memory)                                                                                                         
          0  sorts (disk)                                                                                                           
        135  rows processed                                                                                                         

set autotrace off

REM That was 214 consistent gets for 135 rows, using an IndexRangeScan
REM Aah !  Now, Oracle prefers the IndexRangeScan because it estimates
REM much lesser SingleBlockReadCalls to read the only 135 rows from the table.
REM This inspite of the high CLUSTERING_FACTOR
REM Oracle accounts for both expected Cardinality and Clustering when selecting an execution plan.


REM REM REM ############### TRUNCATE and Re-INSERT rows

connect hemant/hemant
Connected.
truncate table MY_TEST_TABLE reuse storage;
insert into MY_TEST_TABLE select * from SOURCE_TABLE ;
commit;
-- rem need to do a FullTableScan to effect delayed block cleanout
select /*+ FULL (dc) */ count(*) from MY_TEST_TABLE dc;

  COUNT(*)                                                                                                                          
----------                                                                                                                          
   1368063                                                                                                                          

exec dbms_stats.gather_table_stats(user,'MY_TEST_TABLE',estimate_percent=>100,method_opt=>'FOR COLUMNS OWNER, OBJECT_NAME SIZE 250',cascade=>TRUE,no_invalidate=>FALSE);

select table_name, num_rows, blocks from user_tables order by 1;

TABLE_NAME                       NUM_ROWS     BLOCKS                                                                                
------------------------------ ---------- ----------                                                                                
MY_TEST_TABLE                     1368063      18765                                                                                
SOURCE_TABLE                      1368063      18967                                                                                
select index_name, num_rows, distinct_keys, leaf_blocks, clustering_factor from user_indexes order by 1;

INDEX_NAME                       NUM_ROWS DISTINCT_KEYS LEAF_BLOCKS CLUSTERING_FACTOR                                               
------------------------------ ---------- ------------- ----------- -----------------                                               
MY_TEST_TABLE_NDX_1               1368063         49232        9778           1343029                                               

select  count(*) from source_table where owner = 'SYSTEM';

  COUNT(*)                                                                                                                          
----------                                                                                                                          
     12258                                                                                                                          

set autotrace traceonly
select owner, object_name, object_type, created from MY_TEST_TABLE where owner = 'SYSTEM';

Execution Plan
----------------------------------------------------------                                                                          
Plan hash value: 965484217                                                                                                          
                                                                                                                                    
-----------------------------------------------------------------------------------                                                 
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |                                                 
-----------------------------------------------------------------------------------                                                 
|   0 | SELECT STATEMENT  |               | 12258 |   610K|  3434   (2)| 00:00:42 |                                                 
|*  1 |  TABLE ACCESS FULL| MY_TEST_TABLE | 12258 |   610K|  3434   (2)| 00:00:42 |                                                 
-----------------------------------------------------------------------------------                                                 
                                                                                                                                    
Predicate Information (identified by operation id):                                                                                 
---------------------------------------------------                                                                                 
                                                                                                                                    
   1 - filter("OWNER"='SYSTEM')                                                                                                     


Statistics
----------------------------------------------------------                                                                          
          1  recursive calls                                                                                                        
          0  db block gets                                                                                                          
      18779  consistent gets                                                                                                        
       3797  physical reads                                                                                                         
          0  redo size                                                                                                              
     362796  bytes sent via SQL*Net to client                                                                                       
        514  bytes received via SQL*Net from client                                                                                 
          4  SQL*Net roundtrips to/from client                                                                                      
          0  sorts (memory)                                                                                                         
          0  sorts (disk)                                                                                                           
      12258  rows processed                                                                                                         

set autotrace off


REM REM REM ##########################
REM What can I do to improve the CLUSTERING_FACTOR ??
REM REM REM ############### Reorder the rows !!!

connect hemant/hemant
Connected.
drop table TEMP_TBL;
drop table TEMP_TBL
           *
ERROR at line 1:
ORA-00942: table or view does not exist 


create table TEMP_TBL  as select * from MY_TEST_TABLE where 1=2;
alter table TEMP_TBL nologging;
insert /*+ APPEND */ into TEMP_TBL select * from MY_TEST_TABLE  order by OWNER, OBJECT_NAME;
commit;
truncate table MY_TEST_TABLE;
insert into MY_TEST_TABLE select * from TEMP_TBL  order by OWNER, OBJECT_NAME;
commit;
-- rem need to do a FullTableScan to effect delayed block cleanout
select /*+ FULL (dc) */ count(*) from MY_TEST_TABLE dc;

  COUNT(*)                                                                                                                          
----------                                                                                                                          
   1368063                                                                                                                          

exec dbms_stats.gather_table_stats(user,'MY_TEST_TABLE',estimate_percent=>100,method_opt=>'FOR COLUMNS OWNER, OBJECT_NAME SIZE 250',cascade=>TRUE,no_invalidate=>FALSE);

select table_name, num_rows, blocks from user_tables order by 1;

TABLE_NAME                       NUM_ROWS     BLOCKS                                                                                
------------------------------ ---------- ----------                                                                                
MY_TEST_TABLE                     1368063      19277                                                                                
SOURCE_TABLE                      1368063      18967                                                                                
TEMP_TBL                                                                                                                            
select index_name, num_rows, distinct_keys, leaf_blocks, clustering_factor from user_indexes order by 1;

INDEX_NAME                       NUM_ROWS DISTINCT_KEYS LEAF_BLOCKS CLUSTERING_FACTOR                                               
------------------------------ ---------- ------------- ----------- -----------------                                               
MY_TEST_TABLE_NDX_1               1368063         49232        7186             22498                                               
REM Note how the CLUSTERING_FACTOR is now down to 22,498 from the 1,343,029 on the first run !

select  count(*) from source_table where owner = 'SYSTEM';

  COUNT(*)                                                                                                                          
----------                                                                                                                          
     12258                                                                                                                          
select  count(*) from MY_TEST_TABLE where owner = 'SYSTEM';

  COUNT(*)                                                                                                                          
----------                                                                                                                          
     12258                                                                                                                          
REM  The number of rows with "owner='SYSTEM'"  is the same in as in the first test.
REM  ie, 12,258 out of 1,368,063

set autotrace traceonly
select owner, object_name, object_type, created from MY_TEST_TABLE where owner = 'SYSTEM';

Execution Plan
----------------------------------------------------------                                                                          
Plan hash value: 519844067                                                                                                          
                                                                                                                                    
---------------------------------------------------------------------------------------------------                                 
| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |                                 
---------------------------------------------------------------------------------------------------                                 
|   0 | SELECT STATEMENT            |                     | 12258 |   610K|   270   (1)| 00:00:04 |                                 
|   1 |  TABLE ACCESS BY INDEX ROWID| MY_TEST_TABLE       | 12258 |   610K|   270   (1)| 00:00:04 |                                 
|*  2 |   INDEX RANGE SCAN          | MY_TEST_TABLE_NDX_1 | 12258 |       |    67   (0)| 00:00:01 |                                 
---------------------------------------------------------------------------------------------------                                 
                                                                                                                                    
Predicate Information (identified by operation id):                                                                                 
---------------------------------------------------                                                                                 
                                                                                                                                    
   2 - access("OWNER"='SYSTEM')                                                                                                     


Statistics
----------------------------------------------------------                                                                          
          1  recursive calls                                                                                                        
          0  db block gets                                                                                                          
        273  consistent gets                                                                                                        
          0  physical reads                                                                                                         
          0  redo size                                                                                                              
      99453  bytes sent via SQL*Net to client                                                                                       
        514  bytes received via SQL*Net from client                                                                                 
          4  SQL*Net roundtrips to/from client                                                                                      
          0  sorts (memory)                                                                                                         
          0  sorts (disk)                                                                                                           
      12258  rows processed                                                                                                         

set autotrace off

REM That was only 273 consistent gets for 12,235 rows, using an IndexRangeScan
REM With the table's rows *physically re-ordered*
REM Oracle now had to do only 273 block gets
REM when the forced IndexRangeScan in the badly clustered table resulted in 10,638 block gets !

REM The *same* data, after being re-ordered
REM is down from 106,38 block gets to 273 block gets
REM with the *same* Execution Plan !!

drop table TEMP_TBL;

spool off
3 comments:

Unknown said...: Hi Hemant, I read your post and the interesting proposed tests. Some time ago
I examined the subject, I believe that clustering factor is a fundamental importance
for access type range scan. If I remember correctly, this factor influences
heavily Cost formula when the use of the index case
mode range scan. So I think that if you investigate as a table
is accessed and it can verify that most of the accesses
are using a certain index range scan, then
the sorting table data is a factor to be taken seriously.
I wrote not only for this reason, but mainly to ask you a question that has nothing
to do with clustering factor, but with the testing so far.
If I remember correctly I read somewhere that when you use the package
dbms_stats on a table that also performs cleanout of the blocks,
is it correct?
so in your example the following SQL statement
- Rem FullTableScan need to do to effect a delayed block cleanout
select count (*) from MY_TEST_TABLE dc;

followed

dbms_stats.gather_table_stats exec (user, 'MY_TEST_TABLE', estimate_percent => 100, method_opt => 'FOR COLUMNS OWNER, OBJECT_NAME SIZE 250', cascade => TRUE, no_invalidate => FALSE);

it can be replaced with the only run of the package dbms_stats

dbms_stats.gather_table_stats exec (user, 'MY_TEST_TABLE', estimate_percent => 100, method_opt => 'FOR COLUMNS OWNER, OBJECT_NAME SIZE 250', cascade => TRUE, no_invalidate => FALSE);

is it correct?

Thanks a lot

Donatello Settembrino; 12 December, 2009
Hemant K Chitale said...: Reordering the data to improve the clustering factor makes sense only if
a. There is only 1 index on the table
OR
b. All the important queries rely on the one particular index and you can afford degradation in the performance of other queries relying on other indexes.

I explicitly put in a count(*) so as to avoid delayed block clenaout on large tables in my tests.
The gather_stats does the same thing as well, as you've pointed out.
See http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:44798632736844; 03 January, 2010
Unknown said...: Hemant,

"Reordering the data to improve the clustering factor makes sense only if
a. There is only 1 index on the table"

I think his statement is correct but not complete
I think it is more correct to say

a.There is only 1 index on the table, or if the first fields
that make up the key, of two or more indices coincide. Can improve
clustering factor even when it has been defined indexes
a table whose data were included in a full order
different from the keys of the index, where then the clustering factor
is high for all indexes.

b. I think it is identical to
I said in my previous posting

"So I think that if you investigate as a table
is accessed and it can verify that most of the accesses
are using a certain index range scan, then
the sorting table data is a factor to be taken seriously"

which would imply that if such access more (eg 95% of cases)
are all attributable to the use of a single index I
can accept the performance degradation of the second index (5% of cases)
... but that obviously are chosen :)

best regards

Donatello Settembrino; 05 January, 2010
Hemant's Oracle DBA Blog

28 February, 2009

CLUSTERING_FACTOR

3 comments:

Aggregated by orafaq.com

OCP 12c