26 May, 2009

Index Block Splits and REBUILD

In the course of my previous post on 50-50 Index Block Splits, I added the lines "You can see that, with 50-50 Block Splits, the Index Leaf Blocks may not be as tightly packed as expected. (Rebuilding the Index would help !)"

In the "modern" world of Oracle Experts where it has become fashionable to say "Indexes don't need to be rebuilt" or "Indexes rarely need to be rebuilt", I tend to disagree. I find that it makes sense to rebuild indexes when there is sufficient reason to do so ("circular reasoning" ?).

I take exactly the same case as in my previous example on 50-50 Index Block Splits, but add an ALTER INDEX REBUILD at the end.

Here goes .... (my reputation with the "Experts" ?)


SQL> drop table demo_ibs_50_50 purge;

Table dropped.

SQL>
SQL> -- REM create the demo table
SQL> create table demo_ibs_50_50 (
2 employee_id number not null,
3 country_name varchar2(10) not null,
4 dept_name varchar2(18) not null,
5 employee_name varchar2(128) not null,
6 join_date date)
7 /

Table created.

SQL>
SQL> -- create a non unique index
SQL> create index demo_ibs_50_50_n1 on demo_ibs_50_50 (country_name,dept_name,employee_name) pctfree 1;

Index created.

SQL>
SQL> delete source_table;

50659 rows deleted.

SQL> insert into source_table select * from dba_objects where object_id is not null;

50659 rows created.

SQL> select max(object_id) from source_table;

MAX(OBJECT_ID)
--------------
53270

SQL>
SQL>
SQL> -- REM create a new session and run an insert
SQL> -- then check the statistics for the insert
SQL> connect hemant/hemant
Connected.
SQL> insert into demo_ibs_50_50
2 select object_id, substr(owner,1,10),substr(object_type,1,18),rpad(object_name,20,dbms_random.string('X',3)),created
3 from source_table
4 where object_id is not null
5 order by object_id
6 /

50659 rows created.

SQL> commit;

Commit complete.

SQL>
SQL>
SQL>
SQL> -- REM run another set of inserts
SQL> -- REM we now simulate a Row-By-Row Insert that would happen for creating new Employees
SQL> connect hemant/hemant
Connected.
SQL>
SQL> declare
2 i number;
3
4 begin
5 for i in 1..1000
6 loop
7 insert into demo_ibs_50_50
8 select object_id+100000+i,substr(owner,1,10),substr(object_type,1,18),rpad(object_name,20,dbms_random.string('X',3)),created+vsize(object_name)
9 from source_table
10 where object_id is not null
11 and object_id = 1000+i;
12 commit;
13 end loop;
14 end;
15 /

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL>
SQL> -- REM run another set of inserts
SQL> -- REM we now run a bulk insert again !
SQL> connect hemant/hemant
Connected.
SQL>
SQL> insert into demo_ibs_50_50
2 select object_id+200000,substr(owner,1,10),substr(object_type,1,18),rpad(object_name,20,dbms_random.string('X',3)),created+vsize(object_name)
3 from source_table
4 where object_id is not null
5 and object_id between 1000 and 2000
6 /

1001 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> exec dbms_stats.gather_table_stats('','DEMO_IBS_50_50',estimate_percent=>100,cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL> select num_rows, distinct_keys, blevel, leaf_blocks, distinct_keys/leaf_blocks
2 from user_indexes where index_name = 'DEMO_IBS_50_50_N1'
3 /

NUM_ROWS DISTINCT_KEYS BLEVEL LEAF_BLOCKS DISTINCT_KEYS/LEAF_BLOCKS
---------- ------------- ---------- ----------- -------------------------
52660 46285 2 446 103.778027

SQL> analyze index demo_ibs_50_50_n1 validate structure;

Index analyzed.

SQL> select lf_rows, lf_blks, br_blks, del_lf_rows, pct_used from index_stats;

LF_ROWS LF_BLKS BR_BLKS DEL_LF_ROWS PCT_USED
---------- ---------- ---------- ----------- ----------
52660 446 3 0 69

SQL>
SQL>
SQL> REM REM REM ###############################
SQL> REM Here's the additional step from the previous DEMO_IBS_50_50
SQL> REM What happens to the index size and pct_used if I rebuild it now ?
SQL>
SQL> alter index demo_ibs_50_50_n1 rebuild;

Index altered.

SQL>
SQL> exec dbms_stats.gather_table_stats('','DEMO_IBS_50_50',estimate_percent=>100,cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL> select num_rows, distinct_keys, blevel, leaf_blocks, distinct_keys/leaf_blocks
2 from user_indexes where index_name = 'DEMO_IBS_50_50_N1'
3 /

NUM_ROWS DISTINCT_KEYS BLEVEL LEAF_BLOCKS DISTINCT_KEYS/LEAF_BLOCKS
---------- ------------- ---------- ----------- -------------------------
52660 46285 2 309 149.789644

SQL> analyze index demo_ibs_50_50_n1 validate structure;

Index analyzed.

SQL> select lf_rows, lf_blks, br_blks, del_lf_rows, pct_used from index_stats;

LF_ROWS LF_BLKS BR_BLKS DEL_LF_ROWS PCT_USED
---------- ---------- ---------- ----------- ----------
52660 309 3 0 98

SQL>



The Index "shrunk" from 446 Leaf Blocks (69 % used) to 309 Leaf Blocks (98% used) ! Surely, I will "suffer" Index Block Splits for subsequent inserts.
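As an aside, the 69% PCT_USED before the rebuild is characteristic: under randomly ordered key inserts, 50-50 leaf block splits are known to settle at an average occupancy of about ln 2 ≈ 69%. The toy simulation below (a hypothetical model in Python, not Oracle's actual algorithm -- real Oracle also uses 90-10 splits for right-hand ascending inserts) models only leaf blocks with 50-50 splits and lands in the same neighbourhood:

```python
import bisect
import random

def simulate_50_50_splits(n_keys=50_000, capacity=200, seed=1):
    """Insert n_keys randomly ordered keys into model 'leaf blocks' that
    split 50-50 when full; return the average block occupancy."""
    random.seed(seed)
    keys = list(range(n_keys))
    random.shuffle(keys)

    leaves = [[]]             # each leaf holds a sorted run of keys
    firsts = [float("-inf")]  # low key of each leaf, used to route inserts

    for k in keys:
        i = bisect.bisect_right(firsts, k) - 1   # rightmost leaf with first <= k
        bisect.insort(leaves[i], k)
        if len(leaves[i]) > capacity:            # block overflows: 50-50 split
            mid = len(leaves[i]) // 2
            right = leaves[i][mid:]
            del leaves[i][mid:]
            leaves.insert(i + 1, right)
            firsts.insert(i + 1, right[0])

    return n_keys / (len(leaves) * capacity)

occupancy = simulate_50_50_splits()
print(f"average leaf occupancy: {occupancy:.0%}")  # expect roughly ln 2, i.e. ~69%
```

The capacity of 200 keys per block and the purely random insert order are assumptions of the model; the point is only that ~69% is what 50-50 splits naturally converge to, not a sign of anything "wrong" with the index.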

But stop to think. What if this had shrunk from 4 million Leaf Blocks to 3 million Leaf Blocks ? What if I had a frequently used / popular / expensive query that did Index Fast Full Scans on this Index ? Remember that I haven't shrunk the table, only the Index. Wouldn't an Index Fast Full Scan of 3 million Blocks be a "wee bit" faster than a Scan of 4 million Blocks ?

What if the index had shrunk from 40,000 Leaf Blocks to 30,000 Leaf Blocks ? What is the likelihood of 100% of the Index Blocks now fitting into my SGA ? Isn't it more likely now ?
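The before/after leaf block counts are consistent with a simple back-of-envelope: the same keys pack into fewer blocks in proportion to the packing density. This sketch is just that arithmetic (PCT_USED is reported as a rounded integer, so it lands a few blocks off the observed 309):

```python
import math

leaf_blocks_before = 446  # from the first INDEX_STATS run
pct_used_before = 69
pct_used_after = 98       # a fresh rebuild packs blocks up to the PCTFREE limit

# Same keys, denser packing: blocks scale by the ratio of densities.
estimate = math.ceil(leaf_blocks_before * pct_used_before / pct_used_after)
print(estimate)  # 315 -- within a few blocks of the observed 309
```

The same ~69/98 ratio is what drives both arguments above: an Index Fast Full Scan reads proportionally fewer blocks, and a proportionally smaller set of blocks is more likely to stay cached in the SGA.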

Also see "Rebuilding Indexes" and "Rebuilding Indexes - When and Why ?", my previous posts in March 2008.


2 comments:

Asif Momen said...

Good work!

Joel Garry said...

You aren't disagreeing with the experts. The general expert advice is "don't rebuild indices blindly," that is, use a demo like you've done on a test version of your real data to be able to evaluate whether rebuilding an index will do good. This general advice comes in response to old myths that say that in general, indexes need to be rebuilt, so rebuild them all even if it means spending your weekends. Some experts go a bit further and say in general when they've tried (at least testing response time to their usual usage) rebuilding indices, they've seen little difference.

The real answer is to know your data and how it is used. This kind of goes against this type of demo, which doesn't create data distributed as real data is distributed. That's not a bad thing - it can emphasize how a certain distribution or data density can affect response time, as well as clarify unusual limitations - but it is something to be aware of. It can also hide idiosyncrasies in the way actual data is added. I know it is way different than my oltp data, since I tend to work on enterprise software with concatenated keys, with different volumes of data dependent on parts of the key (like, division 10 has fewer sequentially increasing orders than division 20).

But even so, everything I've seen with my data overwhelmingly says I more frequently migrate to new hardware or app versions needing a dependent exp/imp than I need to rebuild any of my 1000's of indices.

Anyways, good to see good demos and explanations of such things. I do know people who waste their weekends because they listen to stupid myth-based software tools rather than try to figure out the reality.
