Most applications that I see use VARCHAR2 columns. However, there are some (including many built by "in house" database designers) that use CHAR columns. CHAR columns are for fixed-length character strings: any inserted value shorter than the defined length is right-padded with blanks. Comparison semantics also differ, in that VARCHAR2 uses non-padded comparison while CHAR uses blank-padded comparison.
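The two behaviours can be sketched outside the database. Here is a rough Python analogy (the helper names are mine, not Oracle APIs) of CHAR's right-padding on storage and of the two comparison rules:

```python
CHAR_LEN = 10

def char_store(value, length=CHAR_LEN):
    """CHAR semantics: stored values are right-padded with blanks to the fixed length."""
    return value.ljust(length)

def blank_padded_equal(a, b):
    """CHAR-style comparison: the shorter value is padded before comparing."""
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)

def nonpadded_equal(a, b):
    """VARCHAR2-style comparison: trailing blanks are significant."""
    return a == b

print(repr(char_store("ABC")))            # 'ABC       ' -- 10 bytes stored
print(blank_padded_equal("ABC", "ABC "))  # True  -- CHAR semantics
print(nonpadded_equal("ABC", "ABC "))     # False -- VARCHAR2 semantics
```

So with CHAR columns, 'ABC' and 'ABC ' compare as equal; with VARCHAR2 they do not.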
In this demonstration, I'd like to focus on compressibility of columns defined as VARCHAR2 and CHAR.
These scripts are executed in 19c on Linux.
First with VARCHAR2
SQL> create tablespace VC2_TBS datafile '/opt/oracle/oradata/HEMANT/VC2_TBS.dbf' size 2G;

Tablespace created.

SQL>
SQL> create table VC2_TABLE
  2  (id_col number(8), data_col_1 varchar2(50), data_col_2 varchar2(50), data_col_3 varchar2(50))
  3  pctfree 0
  4  tablespace VC2_TBS
  5  /

Table created.

SQL>
SQL> declare
  2  l_c number;
  3  begin
  4  for l_c in 1 .. 100
  5  loop
  6  insert into VC2_TABLE
  7  select rownum, dbms_random.string('X',25), dbms_random.string('X',15), dbms_random.string('X',10)
  8  from dual
  9  connect by level < 100001;
 10  commit;
 11  end loop;
 12  end;
 13  /

PL/SQL procedure successfully completed.

SQL>
SQL> exec dbms_stats.gather_table_stats('','VC2_TABLE',degree=>4);

PL/SQL procedure successfully completed.

SQL>
SQL> select compression, compress_for, num_rows, blocks, avg_row_len
  2  from user_tables
  3  where table_name = 'VC2_TABLE'
  4  /

COMPRESS COMPRESS_FOR                       NUM_ROWS       BLOCKS AVG_ROW_LEN
-------- ------------------------------ ------------ ------------ -----------
DISABLED                                  10,000,000       78,437          58

SQL>
SQL> select bytes/1048576
  2  from user_segments
  3  where segment_name = 'VC2_TABLE'
  4  /

BYTES/1048576
-------------
          616

SQL>
SQL> pause For RMAN BACKUP AS COMPRESSED BACKUPSET
For RMAN BACKUP AS COMPRESSED BACKUPSET

RMAN> backup as compressed backupset tablespace VC2_TBS;

Starting backup at 08-NOV-20
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=45 device type=DISK
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00004 name=/opt/oracle/oradata/HEMANT/VC2_TBS.dbf
channel ORA_DISK_1: starting piece 1 at 08-NOV-20
channel ORA_DISK_1: finished piece 1 at 08-NOV-20
piece handle=/opt/oracle/product/19c/dbhome_1/dbs/0svf1p8h_1_1 tag=TAG20201108T221753 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:25
Finished backup at 08-NOV-20

oracle19c>ls -l /opt/oracle/product/19c/dbhome_1/dbs/0svf1p8h_1_1
-rw-r-----. 1 oracle oinstall 386490368 Nov  8 22:18 /opt/oracle/product/19c/dbhome_1/dbs/0svf1p8h_1_1
oracle19c>
-- approx 369MB RMAN Compressed Backup

SQL>
SQL> alter table VC2_TABLE
  2  move compress
  3  /

Table altered.

SQL>
SQL> exec dbms_stats.gather_table_stats('','VC2_TABLE',degree=>4);

PL/SQL procedure successfully completed.

SQL>
SQL> select compression, compress_for, num_rows, blocks, avg_row_len
  2  from user_tables
  3  where table_name = 'VC2_TABLE'
  4  /

COMPRESS COMPRESS_FOR                       NUM_ROWS       BLOCKS AVG_ROW_LEN
-------- ------------------------------ ------------ ------------ -----------
ENABLED  BASIC                             10,000,000       79,027          58

SQL>
SQL> select bytes/1048576
  2  from user_segments
  3  where segment_name = 'VC2_TABLE'
  4  /

BYTES/1048576
-------------
          624

SQL>
So these are the facts for the table with 3 VARCHAR2(50) columns holding random 25-, 15- and 10-character strings.
Note: I deliberately chose dbms_random.string to ensure that I'd get very few (if any) repeated values that would be compressible themselves.
10 million rows with an average row length of 58 bytes
616MB segment (78,437 blocks equal approx 613MB)
RMAN compressed size of the datafile: 369MB (approx 60% of the segment size)
BASIC compression size of the table: 624MB (i.e. no compression achieved, because I used random strings)
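The ~60% RMAN result is close to what any general-purpose compressor can do against random uppercase-alphanumeric data: each character carries about log2(36) ≈ 5.17 bits of entropy, so roughly 65% of the original size is the floor. A small sketch using Python's zlib purely as a stand-in for a generic byte compressor (this is an analogy, not Oracle's actual backup code path):

```python
import random
import string
import zlib

random.seed(42)

# Rows shaped like the test table: 25-, 15- and 10-character random
# uppercase-alphanumeric strings, i.e. what dbms_random.string('X', n) returns.
alphabet = string.ascii_uppercase + string.digits

def rand_str(n):
    return "".join(random.choices(alphabet, k=n))

rows = [rand_str(25) + rand_str(15) + rand_str(10) for _ in range(100_000)]
raw = "".join(rows).encode()

ratio = len(zlib.compress(raw)) / len(raw)
print(f"compressed to {ratio:.0%} of original size")
```

On this kind of data the ratio lands in the mid-to-high 60s of percent, consistent with the ~60% RMAN achieved against the VC2_TBS datafile (which also benefits from block headers and free space).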
Next with CHAR
SQL> create tablespace CHAR_TBS datafile '/opt/oracle/oradata/HEMANT/CHAR_TBS.dbf' size 2G;  -- subsequently resized to 4000M

Tablespace created.

SQL>
SQL> create table CHAR_TABLE
  2  (id_col number(8), data_col_1 char(50), data_col_2 char(50), data_col_3 char(50))
  3  pctfree 0
  4  tablespace CHAR_TBS
  5  /

Table created.

SQL>
SQL> declare
  2  l_c number;
  3  begin
  4  for l_c in 1 .. 100
  5  loop
  6  insert into CHAR_TABLE
  7  select rownum, dbms_random.string('X',25), dbms_random.string('X',15), dbms_random.string('X',10)
  8  from dual
  9  connect by level < 100001;
 10  commit;
 11  end loop;
 12  end;
 13  /

PL/SQL procedure successfully completed.

SQL>
SQL> exec dbms_stats.gather_table_stats('','CHAR_TABLE',degree=>4);

PL/SQL procedure successfully completed.

SQL>
SQL> select compression, compress_for, num_rows, blocks, avg_row_len
  2  from user_tables
  3  where table_name = 'CHAR_TABLE'
  4  /

COMPRESS COMPRESS_FOR                       NUM_ROWS       BLOCKS AVG_ROW_LEN
-------- ------------------------------ ------------ ------------ -----------
DISABLED                                  10,000,000      204,116         158

SQL>
SQL> select bytes/1048576
  2  from user_segments
  3  where segment_name = 'CHAR_TABLE'
  4  /

BYTES/1048576
-------------
         1600

SQL>
SQL> pause For RMAN BACKUP AS COMPRESSED BACKUPSET
For RMAN BACKUP AS COMPRESSED BACKUPSET

RMAN> backup as compressed backupset tablespace CHAR_TBS;

Starting backup at 08-NOV-20
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=38 device type=DISK
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00004 name=/opt/oracle/oradata/HEMANT/CHAR_TBS.dbf
channel ORA_DISK_1: starting piece 1 at 08-NOV-20
channel ORA_DISK_1: finished piece 1 at 08-NOV-20
piece handle=/opt/oracle/product/19c/dbhome_1/dbs/0vvf1r7f_1_1 tag=TAG20201108T225127 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:35
Finished backup at 08-NOV-20

oracle19c>ls -l /opt/oracle/product/19c/dbhome_1/dbs/0vvf1r7f_1_1
-rw-r-----. 1 oracle oinstall 441876480 Nov  8 22:51 /opt/oracle/product/19c/dbhome_1/dbs/0vvf1r7f_1_1
oracle19c>
-- approx 421MB RMAN Compressed Backup

SQL>
SQL> alter table CHAR_TABLE
  2  move compress
  3  /

Table altered.

SQL>
SQL> exec dbms_stats.gather_table_stats('','CHAR_TABLE',degree=>4);

PL/SQL procedure successfully completed.

SQL>
SQL> select compression, compress_for, num_rows, blocks, avg_row_len
  2  from user_tables
  3  where table_name = 'CHAR_TABLE'
  4  /

COMPRESS COMPRESS_FOR                       NUM_ROWS       BLOCKS AVG_ROW_LEN
-------- ------------------------------ ------------ ------------ -----------
ENABLED  BASIC                             10,000,000      204,766         158

SQL>
SQL> select bytes/1048576
  2  from user_segments
  3  where segment_name = 'CHAR_TABLE'
  4  /

BYTES/1048576
-------------
         1600

SQL>
So these are the facts for the table with 3 CHAR(50) columns holding random 25-, 15- and 10-character strings.
Note: As with the VARCHAR2 test, I deliberately chose dbms_random.string to ensure that I'd get very few (if any) repeated values that would be compressible themselves.
10 million rows with an average row length of 158 bytes (up from 58 bytes for the VARCHAR2 columns)
1600MB segment (204,116 blocks equal approx 1595MB)
RMAN compressed size of the datafile: 421MB (approx 26% of the segment size)
BASIC compression size of the table: 1600MB (no compression achieved, because I used random strings -- but shouldn't the right-padded blanks have been compressed? Likely not: BASIC compression deduplicates repeated column values via a per-block symbol table, and since every value here is unique, the trailing blanks inside each value are never candidates for that value-level deduplication.)
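The RMAN result for the CHAR table does make sense at the byte level: two-thirds of every row is runs of blanks, which an LZ-style byte compressor collapses readily, even though value-level table compression gains nothing. A sketch, again with Python's zlib as a generic stand-in rather than Oracle's actual algorithm:

```python
import random
import string
import zlib

random.seed(42)
alphabet = string.ascii_uppercase + string.digits

def rand_str(n):
    return "".join(random.choices(alphabet, k=n))

# VARCHAR2-like rows: the raw 25-, 15- and 10-character random strings.
# CHAR-like rows: the same values, each right-padded with blanks to 50 characters.
varchar2_rows = [(rand_str(25), rand_str(15), rand_str(10)) for _ in range(50_000)]
char_rows = [tuple(v.ljust(50) for v in row) for row in varchar2_rows]

vc2_raw = "".join("".join(row) for row in varchar2_rows).encode()
char_raw = "".join("".join(row) for row in char_rows).encode()

vc2_ratio = len(zlib.compress(vc2_raw)) / len(vc2_raw)
char_ratio = len(zlib.compress(char_raw)) / len(char_raw)
print(f"VARCHAR2-like: {vc2_ratio:.0%}   CHAR-like: {char_ratio:.0%}")
```

The padded data compresses to roughly a quarter of its size, which lines up with the ~26% RMAN achieved against the CHAR_TBS datafile: the random payload still costs its full entropy, but the blank padding all but disappears.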
So, in this test comparing random strings in VARCHAR2 and CHAR columns, the table with CHAR columns quite obviously took up much more space.
BASIC compression achieved nothing (again, because I used random strings).
RMAN default compression ("CONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE 'DEFAULT' OPTIMIZE FOR LOAD TRUE") achieved a better compression ratio against the CHAR table.
Of course, I used random strings; your actual data may itself be compressible, and the lengths of the strings inserted into the CHAR columns of your production or test tables will differ, so you would see different levels of compression.
The point is that the compression achieved depends not just on the nature of the data but also on whether you use BASIC table compression in the tablespace or RMAN's default BASIC backup compression.
Note: Given RMAN's default behaviour of "unused block compression", do not expect RMAN to back up all the blocks in the entire 2GB or 4000MB datafile -- particularly because I created each tablespace just before the test and it holds no other objects (segments).