23 May, 2009

Index Block Splits : 50-50

UPDATE 20-Feb-11 : I have also covered the case of a REVERSE KEY Index for a unique index on a monotonously increasing sequence.

To follow up on my previous posting on "90-10" Index Block Splits which happen when for indexes on monotonically increasing values, here is a demonstration of "50-50" Index Block Splits.
In this case, new Key Values are to be inserted "within" the existing Key Values (as the COUNTRY_NAME, DEPARTMENT_NAME etc are within the same "universe" of values.
(For an explanation of how Indexes are "logically ordered" structures and why the Key Value for a new row can only go into one specific block and, furthermore, how and why this can cause Leaf Block Splits, please see my previous posting).

Rather than walking through this demo, I'll just post the SQLs and results. You can see that, with 50-50 Block Splits, the Index Leaf Blocks may not be as tightly packed as expected. (Rebuilding the Index would help !)


SQL> drop table demo_ibs_50_50 purge;

Table dropped.

SQL>
SQL> -- REM create the demo table
SQL> create table demo_ibs_50_50 (
2 employee_id number not null,
3 country_name varchar2(10) not null,
4 dept_name varchar2(18) not null,
5 employee_name varchar2(128) not null,
6 join_date date)
7 /

Table created.

SQL>
SQL> -- create a non unique index
SQL> create index demo_ibs_50_50_n1 on demo_ibs_50_50 (country_name,dept_name,employee_name) pctfree 1;

Index created.

SQL>
SQL> delete source_table;

50656 rows deleted.

SQL> insert into source_table select * from dba_objects where object_id is not null;

50656 rows created.

SQL> select max(object_id) from source_table;

MAX(OBJECT_ID)
--------------
53253

SQL>
SQL>
SQL> -- REM create a new session and run an insert
SQL> -- then check the statistics for the insert
SQL> connect hemant/hemant
Connected.
SQL> insert into demo_ibs_50_50
2 select object_id, substr(owner,1,10),substr(object_type,1,18),rpad(object_name,20,dbms_random.string('X',3)),created
3 from source_table
4 where object_id is not null
5 order by object_id
6 /

50656 rows created.

SQL> commit;

Commit complete.

SQL>
SQL> select sn.name, ms.value
2 from v$statname sn, v$mystat ms
3 where sn.statistic#=ms.statistic#
4 and sn.name like '%leaf node %'
5 order by 1
6 /

NAME VALUE
---------------------------------------------------------------- ----------
leaf node 90-10 splits 0
leaf node splits 432

SQL>
SQL> exec dbms_stats.gather_table_stats('','DEMO_IBS_50_50',estimate_percent=>100,cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL> select num_rows, distinct_keys, blevel, leaf_blocks, distinct_keys/leaf_blocks
2 from user_indexes where index_name = 'DEMO_IBS_50_50_N1'
3 /

NUM_ROWS DISTINCT_KEYS BLEVEL LEAF_BLOCKS DISTINCT_KEYS/LEAF_BLOCKS
---------- ------------- ---------- ----------- -------------------------
50656 44873 2 433 103.632794

SQL> analyze index demo_ibs_50_50_n1 validate structure;

Index analyzed.

SQL> select lf_rows, lf_blks, br_blks, del_lf_rows, pct_used from index_stats;

LF_ROWS LF_BLKS BR_BLKS DEL_LF_ROWS PCT_USED
---------- ---------- ---------- ----------- ----------
50656 433 3 0 68

SQL>
SQL> REM #################
SQL> REM OBSERVATION !
SQL> REM The first set of block creations are all from Block Splits.
SQL> REM #################
SQL>
SQL>
SQL>
SQL> -- REM run another set of inserts
SQL> -- REM we now simulate a Row-By-Row Insert that would happen for creating new Employees
SQL> connect hemant/hemant
Connected.
SQL>
SQL> declare
2 i number;
3
4 begin
5 for i in 1..1000
6 loop
7 insert into demo_ibs_50_50
8 select object_id+100000+i,substr(owner,1,10),substr(object_type,1,18),rpad(object_name,20,dbms_random.string('X',3)),created+vsize(object_name)
9 from source_table
10 where object_id is not null
11 and object_id = 1000+i;
12 commit;
13 end loop;
14 end;
15 /

PL/SQL procedure successfully completed.

SQL>
SQL> select sn.name, ms.value
2 from v$statname sn, v$mystat ms
3 where sn.statistic#=ms.statistic#
4 and sn.name like '%leaf node %'
5 order by 1
6 /

NAME VALUE
---------------------------------------------------------------- ----------
leaf node 90-10 splits 0
leaf node splits 5

SQL>
SQL> exec dbms_stats.gather_table_stats('','DEMO_IBS_50_50',estimate_percent=>100,cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL> select num_rows, distinct_keys, blevel, leaf_blocks, distinct_keys/leaf_blocks
2 from user_indexes where index_name = 'DEMO_IBS_50_50_N1'
3 /

NUM_ROWS DISTINCT_KEYS BLEVEL LEAF_BLOCKS DISTINCT_KEYS/LEAF_BLOCKS
---------- ------------- ---------- ----------- -------------------------
51656 45578 2 438 104.059361

SQL> analyze index demo_ibs_50_50_n1 validate structure;

Index analyzed.

SQL> select lf_rows, lf_blks, br_blks, del_lf_rows, pct_used from index_stats;

LF_ROWS LF_BLKS BR_BLKS DEL_LF_ROWS PCT_USED
---------- ---------- ---------- ----------- ----------
51656 438 3 0 69

SQL>
SQL>
SQL>
SQL> -- REM run another set of inserts
SQL> -- REM we now run a bulk insert again !
SQL> connect hemant/hemant
Connected.
SQL>
SQL> insert into demo_ibs_50_50
2 select object_id+200000,substr(owner,1,10),substr(object_type,1,18),rpad(object_name,20,dbms_random.string('X',3)),created+vsize(object_name)
3 from source_table
4 where object_id is not null
5 and object_id between 1000 and 2000
6 /

1001 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> select sn.name, ms.value
2 from v$statname sn, v$mystat ms
3 where sn.statistic#=ms.statistic#
4 and sn.name like '%leaf node %'
5 order by 1
6 /

NAME VALUE
---------------------------------------------------------------- ----------
leaf node 90-10 splits 0
leaf node splits 9

SQL>
SQL> exec dbms_stats.gather_table_stats('','DEMO_IBS_50_50',estimate_percent=>100,cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL> select num_rows, distinct_keys, blevel, leaf_blocks, distinct_keys/leaf_blocks
2 from user_indexes where index_name = 'DEMO_IBS_50_50_N1'
3 /

NUM_ROWS DISTINCT_KEYS BLEVEL LEAF_BLOCKS DISTINCT_KEYS/LEAF_BLOCKS
---------- ------------- ---------- ----------- -------------------------
52657 46282 2 447 103.53915

SQL> analyze index demo_ibs_50_50_n1 validate structure;

Index analyzed.

SQL> select lf_rows, lf_blks, br_blks, del_lf_rows, pct_used from index_stats;

LF_ROWS LF_BLKS BR_BLKS DEL_LF_ROWS PCT_USED
---------- ---------- ---------- ----------- ----------
52657 447 3 0 68

SQL>


Thus, new Index Leaf Blocks are allocated through Block Splits from existing blocks.

.
UPDATE 20-Feb-11 : I have also covered the case of a REVERSE KEY Index for a unique index on a monotonously increasing sequence.
.
.

3 comments:

Asif Momen said...

Good one.

Unknown said...

Thanks Hemant.
During my research on 50-50 split, i somewhere read about an intresting behavior (probably Richard foote mentioned that on his blog)...Did u observe anyting similar to this :

"However, there are a number of scenarios where an index can become fragmented, despite the lack of associated delete or update operations.
One example is where a non-unique index stores large number of duplicate values, indeed so many duplicates of a specific value that it actually requires many index leaf
blocks to store all occurrences of each specific value.
What happens when a block is filled and a 50-50 block split eventuates ? 1/2 the entries go into one block and 1/2 the entries go into the newly allocated block and
the new index entry goes where ? Well, if the block is full of nothing but the same index entry because we have lots of duplicate values and the new index entry is
associated with the maximum rowid for the specific index value, it goes into the newly allocated leaf block because remember,
all index entries must maintain their logical sorted order.

In fact, if most/all of the subsequent inserts of the duplicate value are also associated with the maximum rowid currently assigned to the index entry,
all the subsequent inserts now go into the newly allocated block. The 50% of free space remaining in the previous leaf block doesn’t get used.

Once the newly allocated block also gets filled, again with the same duplicate index values, a 50-50 block split occurs and again, most if not all
the subsequent inserts go into the newly allocated leaf block. We now have two, 1/2 empty leaf blocks that has space that can not effectively not be
reused because they are filled the same index value but with rowid values less than those being currently allocated.
And so the cycle continues with these duplicate index entries and any other duplicate index values that span multiple leaf blocks …
"

Unknown said...

Thanks Hemant.
During my research on 50-50 split, i somewhere read about an intresting behavior (probably Richard foote mentioned that on his blog)...Did u observe anyting similar to this :

"However, there are a number of scenarios where an index can become fragmented, despite the lack of associated delete or update operations.
One example is where a non-unique index stores large number of duplicate values, indeed so many duplicates of a specific value that it actually requires many index leaf
blocks to store all occurrences of each specific value.
What happens when a block is filled and a 50-50 block split eventuates ? 1/2 the entries go into one block and 1/2 the entries go into the newly allocated block and
the new index entry goes where ? Well, if the block is full of nothing but the same index entry because we have lots of duplicate values and the new index entry is
associated with the maximum rowid for the specific index value, it goes into the newly allocated leaf block because remember,
all index entries must maintain their logical sorted order.

In fact, if most/all of the subsequent inserts of the duplicate value are also associated with the maximum rowid currently assigned to the index entry,
all the subsequent inserts now go into the newly allocated block. The 50% of free space remaining in the previous leaf block doesn’t get used.

Once the newly allocated block also gets filled, again with the same duplicate index values, a 50-50 block split occurs and again, most if not all
the subsequent inserts go into the newly allocated leaf block. We now have two, 1/2 empty leaf blocks that has space that can not effectively not be
reused because they are filled the same index value but with rowid values less than those being currently allocated.
And so the cycle continues with these duplicate index entries and any other duplicate index values that span multiple leaf blocks …
"