24 January, 2008

Is RAID 5 bad ? Always bad ?

Do DBAs always have to say "RAID 5 is bad" ? Should that be how we start talking of database performance ? Isn't that assertion slightly judgmental ?

So, here I am going to stick my neck out and possibly lose all credibility with the BAARF forum....

But just to preface this : I do accept the contention that in terms of safety and recoverability of data {and performance during recovery}, RAID-10 beats RAID-5.
[See Art S. Kagel's and Cary Millsap's papers at BAARF]. Yes, I agree.

What I disagree is the "jumping the gun" statements that RAID-5 is bad for database performance.


On a recent MetaLink forum {which started with the question "How RAC helps if there is I/O contention"}, I made a few off-the-cuff observations {not about RAC but about I/O and the assertion "RAID 5 is bad" } :

1. What I've seen from "real world" experiences is the "huge batch processing" jobs also do a lot of SELECTs (whether as part of INSERT .. SELECT or CTAS or pure data validation) which can be complex (transformations are attempted in the INSERT .. SELECT or CTAS !!) joins as well AND/OR use a numberof "temporary" tables [created as permanent tables] as staging tables repeatedly.
Another culprit can be the ETL tool or middleware which attempts Transformations on complex data sets using lots of CPU and "scratch disks", all of it outside of Oracle.

First validate if I/O is really the constraint.
Check what the I/O wait times are.
Check Oracle ResponseTime components {either in StatsPack OR much preferably in extended tracing on one run of the batch job}

and identify where the bottleneck is.

Just as it is too easy to say "I/O is slow" , so it is to say "RAID-5 is bad" without actually determining if Write I/O or QueryProcessing (huge numbers of blocks being read, and transformations being attempted).


2. What if the ResponseTime Profile was :
Reads : 35%
CPU : 50%
Writes : 15%
Moving from RAID-5 to RAID-10 may speed up your writes by 200%.
What is the total performance improvement ? 10%.

"RAID-5 is bad" is like "Microsoft is Evil".
Some people do choose to live with Microsoft.
Others find MS Windows much better than Linux.
Others don't have a choice.
To another group, it really doesn't make a difference.
Now, if you had specific code that doesn't work well on Microsoft, then you would say "Microsoft is Evil".

If your application is 90% CPU bound, speeding up your I/O does not get you a pat on the back from your users.
If your application does 1000s of loops between the Client and the Server,your I/O subsystem isn't the issue.
If your design uses an ETL tool which is outside of the database, your Database server I/O may not make a difference.


3. We're not very sure here what sort of I/O contention there is.
We do not have facts to go on for this case so we are all generalising.

In the same vein I have been generalising that if the jobs are actually spending more time on QueryProcessing and CPU or away from the DBServer--- anything other than WRITEs -- then RAID-5 wouldn't be an issue.

A database where Queries are very heavy and most of the response time is "I/O contention" but turns out to be on 'db file scattered read' and 'db file sequential read' wouldn't suffer if it was using RAID-5.

16 January, 2008

The Impact of the Clustering Factor

Update 21-Feb-08: I had used this blog posting as a reference in my comment on Richard Foote's blog posting on Clustering Factor in concatenated index. As Tom Kyte pointed out, my posting wasn't clear enough, so I've tried to explain it again in my comments at Richard Foote's blog posting Clustering Factor in concatenated indexes :

I've pasted those comments again at the bottom of this posting.

Recently, I was given a query that was taking 40minutes to execute.This query involved a range scan on table_1 , among many other operationson other tables with joins.


The greatest portion of the effort [reading the 10046 trace file] was :
STAT #1 id=15 cnt=22 pid=5 pos=2 obj=866159 op='TABLE ACCESS BY INDEX ROWID table_1 '
STAT #1 id=16 cnt=3354870 pid=15 pos=1 obj=14815979 op='INDEX RANGE SCAN index_a '

{an additional filter was also being applied at operation 15}
The whole Fetch phase of that query :

FETCH #1:c=95480000,e=2366647483,p=691119,cr=1353212,cu=0,mis=0,r=1,dep=0,og=4

Event waited on Times Max. Wait Total Waited
----------------------- Waited ---------- ------------
SQL*Net message to client 4 0.00 0.00
SQL*Net message from client 4 0.00 0.00
db file sequential read 691119 0.87 2293.16
log file sync 1 0.01 0.01

read 1.353million blocks, requiring 691thousand physical block reads and
took 2.3billion microseconds – ie, 2,367 seconds.
The actual total elapsed time that SQLPlus reported was 40min24seconds (2,424 seconds).
The 691thousand physical reads accounted for 2,293 seconds --- 94.6% of the query’s
response time. {BTW : Just for our knowledge : What is the Hit Ratio ? 48.9%}


On rewriting the query with a NO_INDEX hint, the next execution showed :

STAT #1 id=16 cnt=16 pid=5 pos=2 obj=866159 op='TABLE ACCESS BY INDEX ROWID table_1 '
STAT #1 id=17 cnt=3314718 pid=16 pos=1 obj=14815976 op='INDEX RANGE SCAN index_b '

What I now got was :

FETCH #1:c=10320000,e=257057504,p=78328,cr=159300,cu=0,mis=0,r=1,dep=0,og=4

Event waited on Times Max. Wait Total Waited
----------------------- Waited ---------- ------------
SQL*Net message to client 4 0.00 0.00
SQL*Net message from client 4 0.00 0.00
db file sequential read 78328 0.44 218.20
buffer busy waits 9715 0.40 30.53
latch free 26 0.02 0.41

Only 159,300 Buffer Gets and 257million microseconds execution time.
There were 78,328 single block Physical Read calls, taking 218sconds only.
SQLPlus reports the total query execution time was 4min24seconds.
{Thus, although Physical Reads still account for 82.6% or 84.8% of the total response time,
the absolute elapsed time is less than 5 minutes – much better than the 40minutes}.
Reducing the number of Physical Read calls was the solution.
{BTW : Just for our knowledge : What is the Hit Ratio ? 50.8% -- not much of an improvement from 48.9% ?}

Both indexes are multi-column indexes on the same table.

So the question was :
“In the first SQL RowSourceOperations there 3.354million rowids fetched from index_a and the total count of Buffer Gets for the whole query was 1.353million.
In the second SQL’s RowSourceOperations there 3.315million rowids fetched from index_b and yet the total count of Buffer Gets for the whole query was only 159,300 !
Why the difference ?”

The difference lay in the Clustering Factor.

SQL> select index_name, leaf_blocks, avg_leaf_blocks_per_key, avg_data_blocks_per_key, clustering_factor, distinct_keys
2 from dba_indexes
3 where owner = 'schema'
4 and index_name in ('index_b','index_a');
INDEX_NAME LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY
------------------------------ ----------- -----------------------
AVG_DATA_BLOCKS_PER_KEY CLUSTERING_FACTOR DISTINCT_KEYS
----------------------- ----------------- -------------
index_b 463706 1 13 3490380 261796
index_a 456250 1 4 21444164 4845082
SQL>

Index index_b has a much lower Clustering Factor. This means that the number of Table Blocks
to be visited for the ROWIDs returned from an Index Range Scan is much lower than the count of blocks that a session using ROWIDs from index_a would have to read.
As inserts into table table_1 are generally ordered by column_x, the co-locality of lookups by column_x would be higher. That is why rowids returned by column_x from index_b would most likely find Table rows in contiguous blocks, not having to make multiple read calls on the Table.

Update 21-Feb-08, Since the above posting might not have been clear enough, this is how I tried to re-explain it at in my comments on Richard Foote's blog posting Clustering Factor in concatenated indexes :
0. Yes it is a range scan.

1. There are two indexes on the table. Both are multicolumn indexes. “index_a” is the bad index which the optimizer was choosing to use initially.

2. “index_a” resulted in this RowSourceOperation entry :
STAT #1 id=16 cnt=3354870 pid=15 pos=1 obj=14815979 op=’INDEX RANGE SCAN index_a ‘ and this for the Fetch :
FETCH #1:c=95480000,e=2366647483,p=691119,cr=1353212,cu=0,mis=0,r=1,dep=0,og=4

Thus, Oracle using this index in the Range Scan was doing 1.35million consistent gets.

3. I needed to disable usage of this index. I knew that there was a better index — index_b. I had 4 choices :
a) Drop index_a
b) Apply a function to the predicate
c) Manipulate column and index statistics
d) Put an explicit NO_HINT index
Actually, I could have tried the 5th option
e) Put an explicit HINT (table_1 index_b).
But I didn’t do that. I just wanted to see what alternate execution plan (not just index_b but the whole plan, nested loops whatever) Oracle chose withouth index_a.

4. The NO_INDEX Hint therefore, wasn’t really intended as the solution. My blog posting doesn’t assert it that way. I just said that I used a NO_INDEX Hint — the reason only was I wanted to disable usage of that index.

5. Oracle then switched to index_b. This resulted in :
STAT #1 id=17 cnt=3314718 pid=16 pos=1 obj=14815976 op=’INDEX RANGE SCAN index_b ‘
and
FETCH #1:c=10320000,e=257057504,p=78328,cr=159300,cu=0,mis=0,r=1,dep=0,og=4

6. The question, then, was :
“In the first SQL RowSourceOperations there 3.354million rowids fetched from index_a and the total count of Buffer Gets for the whole query was 1.353million.
In the second SQL’s RowSourceOperations there 3.315million rowids fetched from index_b and yet the total count of Buffer Gets for the whole query was only 159,300 !
Why the difference ?”

7. The assertion “The difference lay in the Clustering Factor” was actually an answer to that exact question. I didn’t assert outright that the changed execution plan or better performance was a result of the Clustering Factor.

8. These were the statistics :
INDEX_NAME LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY
—————————— ———– ———————–
AVG_DATA_BLOCKS_PER_KEY CLUSTERING_FACTOR DISTINCT_KEYS
———————– —————– ————-
index_b 463706 1 13 3490380 261796
index_a 456250 1 4 21444164 4845082

9. What I tried to put forward was that the better Clustering Factor with index_b was that although Oracle did read just as many rowids in the index range scan, the number of block gets for the *table* was much lower — the total consistent gets from 1.35million to only 159,300 !

10. Obviously, the number of rows is still similar. The number of *blocks* is much lower. That is my assertion about the Clustering Factor.

11. index_b’s leading column is a date column and rows are inserted into the table by date — ascending order. Much better clustering of rows into the same block and contiguous blocks when I fetch them ordered by the date.

12. So, the Clustering Factor didn’t result in inspection of fewer index entries. It did result in a significant reduction in consistent gets. Unfortunately, the row source operation statistics don’t really show how many index blocks and how many table blocks were read but I’m pretty sure that, because it was a Range Scan, the reduction was in table blocks.


10 January, 2008

Examples Of Odd Extent Sizes In Tablespaces With AUTOALLOCATE

The "normal" AutoAllocate Extent sizes are 64K, 1M, 8M, 64M.
A table is allocated 16 extents of 64K until it reaches a size of 1MB. After that, extent sizes are 1MB each until a size of 64MB. That makes 79 extents.Then, extent sizes are of 8MB till a total size of 1GB.After reaching 1GB, the sizes are 64MB each.
These step functions are *NOT* documented by Oracle but have been observed by many DBAs.

Having said that, we sometimes we see Segments and Tablespaceswith extent sizes that are slightly different.
This can happen when Oracle finds a small "hole" (eg because other segments have been dropped/rebuilt or near the end of the datafile) and uses that "hole"precisely because AutoAllocate allows Oracle to decide ExtentSizes.
{For example : If the table had a static NEXT of 1M and there was no free extent of 1M in the datafile, only say 256K, Oracle would report an"unable to allocate extent of size 128 blocks". But with AutoAllocate,Oracle could actually create a 128K extent at that time, still leaving 128K free for another extent !}

Another case is when a table is created with a very large INITIAL, Oracle might set the Extent Sizes to 1MB from the beginning.

For other examples and discussion also see

http://www.freelists.org/archives/oracle-l/01-2005/msg00782.html

[reference to a posting on HJR's blog deleted as it is no longer available]

http://jonathanlewis.wordpress.com/2007/05/29/autoallocate-and-px/

(repeated parallel create/insert at high degrees of parallelismcan result in objects acquiring a very large number of small extents)

UPDATE 10-Apr-12 : Also see my post on Sizing OR Growing a Table in AUTOALLOCATE







Use such a query to get extent sizes at the tablespace level :

select tablespace_name, bytes/1024, count(*)
from dba_extents
where tablespace_name in
(select tablespace_name from dba_tablespaces where allocation_type = 'SYSTEM')
group by tablespace_name, bytes/1024
order by 1,2
/