24 January, 2008

Is RAID 5 bad ? Always bad ?

Do DBAs always have to say "RAID 5 is bad" ? Should that be how we start talking of database performance ? Isn't that assertion slightly judgmental ?

So, here I am going to stick my neck out and possibly lose all credibility with the BAARF forum....

But just to preface this : I do accept the contention that in terms of safety and recoverability of data {and performance during recovery}, RAID-10 beats RAID-5.
[See Art S. Kagel's and Cary Millsap's papers at BAARF]. Yes, I agree.

What I disagree is the "jumping the gun" statements that RAID-5 is bad for database performance.

On a recent MetaLink forum {which started with the question "How RAC helps if there is I/O contention"}, I made a few off-the-cuff observations {not about RAC but about I/O and the assertion "RAID 5 is bad" } :

1. What I've seen from "real world" experiences is the "huge batch processing" jobs also do a lot of SELECTs (whether as part of INSERT .. SELECT or CTAS or pure data validation) which can be complex (transformations are attempted in the INSERT .. SELECT or CTAS !!) joins as well AND/OR use a numberof "temporary" tables [created as permanent tables] as staging tables repeatedly.
Another culprit can be the ETL tool or middleware which attempts Transformations on complex data sets using lots of CPU and "scratch disks", all of it outside of Oracle.

First validate if I/O is really the constraint.
Check what the I/O wait times are.
Check Oracle ResponseTime components {either in StatsPack OR much preferably in extended tracing on one run of the batch job}

and identify where the bottleneck is.

Just as it is too easy to say "I/O is slow" , so it is to say "RAID-5 is bad" without actually determining if Write I/O or QueryProcessing (huge numbers of blocks being read, and transformations being attempted).

2. What if the ResponseTime Profile was :
Reads : 35%
CPU : 50%
Writes : 15%
Moving from RAID-5 to RAID-10 may speed up your writes by 200%.
What is the total performance improvement ? 10%.

"RAID-5 is bad" is like "Microsoft is Evil".
Some people do choose to live with Microsoft.
Others find MS Windows much better than Linux.
Others don't have a choice.
To another group, it really doesn't make a difference.
Now, if you had specific code that doesn't work well on Microsoft, then you would say "Microsoft is Evil".

If your application is 90% CPU bound, speeding up your I/O does not get you a pat on the back from your users.
If your application does 1000s of loops between the Client and the Server,your I/O subsystem isn't the issue.
If your design uses an ETL tool which is outside of the database, your Database server I/O may not make a difference.

3. We're not very sure here what sort of I/O contention there is.
We do not have facts to go on for this case so we are all generalising.

In the same vein I have been generalising that if the jobs are actually spending more time on QueryProcessing and CPU or away from the DBServer--- anything other than WRITEs -- then RAID-5 wouldn't be an issue.

A database where Queries are very heavy and most of the response time is "I/O contention" but turns out to be on 'db file scattered read' and 'db file sequential read' wouldn't suffer if it was using RAID-5.


Anonymous said...

I notice ckpt spending plenty of time on control file parallel writes ( 9 ms avg ), lgwr spending plenty of time on log file parallel write ( 3 ms avg ). We have lots of problems with control file enqueues and ora-00600 [2103], especially during times of peak redo rate and complexity. It's not unusual to see ckpt blocking other sessions on cf enqueue.

I wonder if a 200% speedup of checkpoints and log file writes and archiving would mitigate those problems?

Hemant K Chitale said...

You probably have a combination of a large controlfile, very many datafiles, small redo logs and a high rate of log switches.

I don't know how you will *cause* a "200% speedup of checkpoints and log file writes and archiving".
Are you thinking of SSDs ?
Do you really know if that will improve performance.

What is the root cause of the frequent CF enqueues ?
A checkpoint occuring at a log switching and taking very long because of very many data files. The log switch happening very frequently because the redo logs are small.

Anonymous said...

That's a pretty accurate assessment you made. I was wondering if some of the congestion is simply because we use RAID 5 - writes are slow? The price we pay for cheap storage? Not that I want to try to fix the problem with hardware. Anyway I do think our log files are too small, but my management likes 'em small. I'm building an argument, working on convincing them to use bigger log files to reduce the number of log file switches. I read some 11g documentation recently that suggests a) setting log_archive_max_processes higher for systems that undergo bulk data loads and b) minimizing checkpointing by tweaking fast_start_mttr target and some other related parameters. I figure it's good for 10g, too.

Hemant K Chitale said...

When you change to larger redo logs, the "benefit" is from the reduced frequency of checkpoints (as each log file switch calls a checkpoint). Therefore, it isn't the *size* of the file that matters but the frequency of checkpoints and the number of dirty buffers to be written to disk.
If you have a large portion of your data being loaded via direct path, then those would be bypassing the buffer cache for inserts into tables (but indexes would still be updated via the buffer cache). Then, theoretically, you have fewer "dirty" buffers to checkpoint.
Another factor is that the log file must be completely archived out before it can be overwritten. Thus, if the archiver isn't fast enough your log writer would be "waiting" for the file.
Adding archiver processes to work in parallel can help. So can increasing the number of redo log files.
fast_start_mttr_target would introduce some checkpoint so it is a trade off. It introduces the impact of "trickle writes" only so that you can speed up instance startup. Calculate how many times you'd need to startup the instance from a crash in a year but also factor in the cost of the additional seconds in startup time if this is not enabled.
On the whole, I'm not impresssed with this parameter.

Anonymous said...

Thanks for the analysis. Went to bigger log files in one instance - we use physical standby so we can get away with it - applied the big dml workloads to that instance. Then mistakenly set archive_lag_target to 1800 - while leaving fast_start_mttr_target set to 300. The result? Huge gc cr waits for any and all workloads - lgwr spent tons of time on CF enqueues - a performance nightmare. Changed archive_lag_target back to zero, instance performance increase - high dml jobs 6x faster,
don't have a metric yet for reports. So, beware archive_lag_target, or at least unintelligent application of same.