05 July, 2011

Multiple Channels in RMAN are not always balanced

Occasionally, we come across questions about multiple channels and parallelism in RMAN.
Although RMAN distributes the datafiles across the channels, that doesn't necessarily mean that each channel has the same I/O requirements and needs the same amount of time. One channel may be reading more data and writing a larger backup than another.


For example, take this database with 17 datafiles, where data is not equally distributed across the datafiles:

SQL> select file_id, sum(blocks) from dba_extents group by file_id order by 1;

   FILE_ID SUM(BLOCKS)
---------- -----------
         1      105632
         2      122320
         3      259880
         4       18864
         5        8952
        13         152
        14       31368
        15        2912
        16        2336
        17           8

10 rows selected.

SQL>


I then run an RMAN backup with two channels :

RMAN> show device type;

RMAN configuration parameters for database with db_unique_name ORCL are:
CONFIGURE DEVICE TYPE DISK PARALLELISM 2 BACKUP TYPE TO BACKUPSET;

RMAN>

RMAN> backup as compressed backupset database;

Starting backup at 05_JUL_23_02_29
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=39 device type=DISK
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00003 name=/home/oracle/app/oracle/oradata/orcl/undotbs01.dbf
input datafile file number=00014 name=/addtl/oracle/oradata/orcl/hemant01.dbf
input datafile file number=00016 name=/addtl/oracle/oradata/orcl/Uniform_64KB.dbf
input datafile file number=00015 name=/addtl/oracle/oradata/orcl/UNDO.dbf
input datafile file number=00017 name=/usr/tmp/X.dbf
input datafile file number=00002 name=/home/oracle/app/oracle/oradata/orcl/sysaux01.dbf
input datafile file number=00001 name=/home/oracle/app/oracle/oradata/orcl/system01.dbf
input datafile file number=00004 name=/home/oracle/app/oracle/oradata/orcl/users01.dbf
input datafile file number=00005 name=/home/oracle/app/oracle/oradata/orcl/example01.dbf
channel ORA_DISK_1: starting piece 1 at 05_JUL_23_02_32
channel ORA_DISK_2: starting compressed full datafile backup set
channel ORA_DISK_2: specifying datafile(s) in backup set
input datafile file number=00006 name=/home/oracle/app/oracle/oradata/orcl/FLOW_1046101119510758.dbf
input datafile file number=00007 name=/home/oracle/app/oracle/oradata/orcl/FLOW_1146416395631714.dbf
input datafile file number=00008 name=/home/oracle/app/oracle/oradata/orcl/FLOW_1170420963682633.dbf
input datafile file number=00009 name=/home/oracle/app/oracle/oradata/orcl/FLOW_1194425963955800.dbf
input datafile file number=00010 name=/home/oracle/app/oracle/oradata/orcl/FLOW_1218408858999342.dbf
input datafile file number=00011 name=/home/oracle/app/oracle/oradata/orcl/FLOW_1242310449730067.dbf
input datafile file number=00012 name=/home/oracle/app/oracle/oradata/orcl/FLOW_1266412439758696.dbf
input datafile file number=00013 name=/home/oracle/app/oracle/oradata/orcl/APEX_1295922881855015.dbf
channel ORA_DISK_2: starting piece 1 at 05_JUL_23_02_33
channel ORA_DISK_2: finished piece 1 at 05_JUL_23_02_40
piece handle=/addtl/oracle/flash_recovery_area/ORCL/backupset/2011_07_05/o1_mf_nnndf_TAG20110705T230230_7169w9t3_.bkp tag=TAG20110705T230230 comment=NONE
channel ORA_DISK_2: backup set complete, elapsed time: 00:00:07
channel ORA_DISK_1: finished piece 1 at 05_JUL_23_06_40
piece handle=/addtl/oracle/flash_recovery_area/ORCL/backupset/2011_07_05/o1_mf_nnndf_TAG20110705T230230_7169w8s7_.bkp tag=TAG20110705T230230 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:04:08
Finished backup at 05_JUL_23_06_40


So, Channel ORA_DISK_2 backed up 8 datafiles in 7 seconds, while Channel ORA_DISK_1 took 248 seconds to back up 9 datafiles!

The next time you wonder why one RMAN channel takes longer than the other, ask yourself whether the two channels have been given the same amount of "work" to do.

In my case, Channel ORA_DISK_2 had to back up data from only 152 blocks, but Channel ORA_DISK_1 had to back up data from 551,104 blocks (*assuming* that there are no "empty" blocks that were formerly part of used extents but are now free extents):

SQL> select file_id, sum(blocks) from dba_extents
2 where file_id in
3 (select file_id from dba_data_files
4 where (file_name like '%FLOW%' or file_name like '%APEX%')
5 )
6 group by file_id
7 order by file_id
8 /

   FILE_ID SUM(BLOCKS)
---------- -----------
        13         152
           -----------
sum                152

SQL> select file_id, sum(blocks) from dba_extents
2 where file_id not in
3 (select file_id from dba_data_files
4 where (file_name like '%FLOW%' or file_name like '%APEX%')
5 )
6 group by file_id
7 order by file_id
8 /

   FILE_ID SUM(BLOCKS)
---------- -----------
         1      105632
         2      121032
         3      259880
         4       18864
         5        8952
        14       31368
        15        3032
        16        2336
        17           8
           -----------
sum             551104

9 rows selected.

SQL>
SQL> select sum(space) from dba_recyclebin;

SUM(SPACE)
----------
0

SQL>



Question : What can you do to address this "imbalance" ?


11 comments:

Narendra said...

Is manual channel allocation the answer?
http://download.oracle.com/docs/cd/B19306_01/backup.102/b14191/rcmconc1.htm#i1015006
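For reference, a RUN block with manually allocated channels and explicit per-channel datafile lists might look something like this (a sketch only; the groupings are illustrative, taken from the listing above, and a hard-coded list must be kept up to date as datafiles are added):

```
RUN {
  ALLOCATE CHANNEL c1 DEVICE TYPE DISK;
  ALLOCATE CHANNEL c2 DEVICE TYPE DISK;
  -- Put the one "heavy" datafile (file 3) on its own channel
  -- and everything else on the other channel.
  BACKUP AS COMPRESSED BACKUPSET
    (DATAFILE 3 CHANNEL c1)
    (DATAFILE 1,2,4,5,6,7,8,9,10,11,12,13,14,15,16,17 CHANNEL c2);
}
```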

Nitin said...

Adding to Narendra's point, would specifying DURATION and MINIMIZE TIME do it?
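For the record, that syntax looks like this (the two-hour window here is an arbitrary example):

```
-- Allow the backup (at most) a 2-hour window; MINIMIZE TIME tells
-- RMAN to run at full speed rather than spreading the I/O load
-- evenly across the window (which is what MINIMIZE LOAD does).
BACKUP DURATION 2:00 MINIMIZE TIME DATABASE;
```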

Vishal Desai said...

If you are not hosting a database for multiple clients, you can probably move tables into a DATA tablespace and indexes into an INDX tablespace. I did the same on one of my warehouse databases (4TB) and the backup time went down from 26 hours to 6 hours.

You can use large tablespaces, and in 11g multiple channels can back up one large datafile, but I haven't tested that feature.
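The 11g feature referred to here is the multisection backup; a sketch, with the section size chosen arbitrarily:

```
-- 11g multisection backup: each 2GB section of a large datafile
-- becomes a separate unit of work, so multiple channels can
-- back up the same datafile in parallel.
BACKUP DATABASE SECTION SIZE 2G;
```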

Hemant K Chitale said...

Narendra, Nitin, Vishal,

I shall be updating this blog post (or creating a new one) with my responses in a few days.

Hemant

Hemant K Chitale said...

Narendra,
It makes no difference whether you manually ALLOCATE CHANNEL or use Automatic Channels.
In fact the examples at the URL you provide (http://download.oracle.com/docs/cd/B19306_01/backup) show 3 datafiles backed up via 3 channels.

Furthermore, the risk with specifying Datafiles in your backup script is that, with a hard-coded list, you leave yourself vulnerable to the possibility of new datafiles not being backed up (simply because the backup script was not updated) ! A very grave risk.
(And that is why BACKUP DATABASE and BACKUP TABLESPACE are "safe" commands because they always identify the current list of datafiles to backup).

Also, if you do distribute datafiles across Channels, over time, this distribution becomes unbalanced as usage of certain datafiles (i.e. allocation of extents in these datafiles) increases --- particularly with "new" datafiles.

To address these issues, you'd have to build an intelligent "wrapper" script that generates the ALLOCATE CHANNEL and BACKUP commands dynamically, based on some rules relating to datafile usage and highwatermarks.
Not something I'd be comfortable maintaining.
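As a rough illustration only (not a complete wrapper), such rules might start from a dictionary query like this one, which round-robins datafiles across two groups in descending order of used blocks; a real script would still have to turn the groups into ALLOCATE CHANNEL and BACKUP commands, and be re-run before every backup:

```
-- Assign each datafile to one of two groups, alternating in
-- descending order of used blocks (a crude balancing heuristic;
-- files with no extents get used_blocks = 0).
SELECT file_id, used_blocks,
       1 + MOD(ROW_NUMBER() OVER (ORDER BY used_blocks DESC), 2)
         AS channel_group
  FROM (SELECT f.file_id, NVL(SUM(e.blocks),0) AS used_blocks
          FROM dba_data_files f
          LEFT OUTER JOIN dba_extents e ON e.file_id = f.file_id
         GROUP BY f.file_id)
 ORDER BY used_blocks DESC;
```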

Nitin,
Using DURATION and MINIMIZE TIME will actually limit your ability to back up the database. You'd use this when hardware resources are inadequate to back up the whole database within the available window.

In fact Channel Allocation also follows from the same constraint -- hardware resources cannot backup the whole database via one channel within a given backup window.

Vishal,
Your gains most likely resulted from shrinking the usage of datafiles, reducing their highwatermarks, and creating new, tightly-packed datafiles, thus reducing the total volume of blocks that RMAN has to read during the backup.


Hemant

David Vega said...

You could use the FILESPERSET parameter. With it we can control how many datafiles each channel locks away from the others, because each channel locks the datafiles that it will use in its current backup set.

Very rudimentary, I think, but effective for distributing the work balance.

If you want to use 2 channels, I would split the total number of blocks in 2, i.e. 276,212. Then sum blocks, starting from the biggest datafiles, until you reach that number of blocks. The resulting count of summed datafiles, minus 1, is an appropriate value for FILESPERSET.

In this case, since the two biggest datafiles together exceed 276K blocks, I would set FILESPERSET = 1. That will allow one channel to back up the rest of the datafiles while the other is busy with a big one.
CONS: if I'm right, you will have one backup piece for each datafile.
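In command form, that suggestion would look like this (a sketch):

```
-- One datafile per backup set: each channel picks up the next
-- unprocessed datafile as soon as it finishes its current one,
-- at the cost of one backup set (and piece) per datafile.
BACKUP AS COMPRESSED BACKUPSET DATABASE FILESPERSET 1;
```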


Anyway, I want to know your solution, but I can't find it in your blog.

Regards

Hemant K Chitale said...

David,
The "cons" of 1 file per backupset isn't one to be comfortable with when you have a database with a hundred or a few hundred datafiles.
Yes, FILESPERSET is usable when I have only 20 or so datafiles.


Hemant K Chitale

Hemant K Chitale said...

David,
The "solution" if there is one, is to ensure that all datafiles are more-or-less of the same "standard size" -- e.g. every datafile is 2G or 4G or 16G, or factors or multiples thereof.

A small tablespace is limited to only one datafile while a large tablespace has many datafiles of the same size.

Obviously not a complete solution when you really have a mix of tablespace and datafile sizes.


Hemant K Chitale

Hemant K Chitale said...

However, where the datafiles are not equally loaded with data and with Unused Block Compression, it can still happen that one channel has fewer used blocks to backup than the other channel.

Balwanth said...

Is the final solution still the comment below?


#####################################################################################
To address these issues, you'd have to build an intelligent "wrapper" script that generates the ALLOCATE CHANNEL and BACKUP commands dynamically, based on some rules relating to datafile usage and highwatermarks.
#####################################################################################

My company requires me to stripe the backups equally across channels. I am not comfortable working at the datafile level, and even if I try it out, I should also test our restore scenario.


Hemant K Chitale said...

Balwanth,
Most databases, in my experience, have tablespaces and tablespace usage of different sizes. If the number of tablespaces and datafiles is not large, say a few dozen, you might still use the default behavior.
Else, I'd recommend some intelligent scripting.
Some sites simply choose to set FILESPERSET=1, which looks awful when you have hundreds of datafiles!