08 November, 2020

Compressing VARCHAR2 and CHAR column data

 Most applications that I see use VARCHAR2 columns.  However, there are some (including many built by "in house database designers" that use CHAR columns).  CHAR columns are for Fixed-Length character strings.  Any inserted values that are shorter than the defined length are right-padded with blanks.  Comparison semantics are also different in that VARCHAR2 uses non-padded comparison while CHAR uses blank-padding for comparison.

In this demonstration, I'd like to focus on compressibility of columns defined as VARCHAR2 and CHAR.

These scripts are executed in 19c on Linux.


First with VARCHAR2


SQL> create tablespace VC2_TBS datafile '/opt/oracle/oradata/HEMANT/VC2_TBS.dbf' size 2G;

Tablespace created.

SQL>
SQL> create table VC2_TABLE
  2  (id_col number(8), data_col_1 varchar2(50), data_col_2 varchar2(50), data_col_3 varchar2(50))
  3  pctfree 0
  4  tablespace VC2_TBS
  5  /

Table created.

SQL>
SQL> declare
  2  l_c number;
  3  begin
  4  for l_c in 1 .. 100
  5  loop
  6  insert     into VC2_TABLE
  7  select rownum, dbms_random.string('X',25), dbms_random.string('X',15), dbms_random.string('X',10)
  8  from dual
  9  connect by level "less than sign" 100001;  -- "less than sign" replaced by words to preserve HTML formatting
 10  commit;
 11  end loop;
 12  end;
 13  /

PL/SQL procedure successfully completed.

SQL>
SQL> exec dbms_stats.gather_table_stats('','VC2_TABLE',degree=>4);

PL/SQL procedure successfully completed.

SQL>
SQL> select compression, compress_for, num_rows, blocks, avg_row_len
  2  from user_tables
  3  where table_name = 'VC2_TABLE'
  4  /

COMPRESS COMPRESS_FOR                       NUM_ROWS       BLOCKS AVG_ROW_LEN
-------- ------------------------------ ------------ ------------ -----------
DISABLED                                  10,000,000       78,437          58

SQL>
SQL> select bytes/1048576
  2  from user_segments
  3  where segment_name = 'VC2_TABLE'
  4  /

BYTES/1048576
-------------
          616

SQL>
SQL> pause For RMAN BACKUP AS COMPRESSED BACKUPSET
For RMAN BACKUP AS COMPRESSED BACKUPSET

RMAN> backup as compressed backupset tablespace VC2_TBS;

Starting backup at 08-NOV-20
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=45 device type=DISK
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00004 name=/opt/oracle/oradata/HEMANT/VC2_TBS.dbf
channel ORA_DISK_1: starting piece 1 at 08-NOV-20
channel ORA_DISK_1: finished piece 1 at 08-NOV-20
piece handle=/opt/oracle/product/19c/dbhome_1/dbs/0svf1p8h_1_1 tag=TAG20201108T221753 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:25
Finished backup at 08-NOV-20


oracle19c>ls -l /opt/oracle/product/19c/dbhome_1/dbs/0svf1p8h_1_1
-rw-r-----. 1 oracle oinstall 386490368 Nov  8 22:18 /opt/oracle/product/19c/dbhome_1/dbs/0svf1p8h_1_1
oracle19c>
-- approx 369MB  RMAN Compressed Backup

SQL>
SQL> alter table VC2_TABLE
  2  move compress
  3  /

Table altered.

SQL>
SQL> exec dbms_stats.gather_table_stats('','VC2_TABLE',degree=>4);

PL/SQL procedure successfully completed.

SQL>
SQL> select compression, compress_for, num_rows, blocks, avg_row_len
  2  from user_tables
  3  where table_name = 'VC2_TABLE'
  4  /

COMPRESS COMPRESS_FOR                       NUM_ROWS       BLOCKS AVG_ROW_LEN
-------- ------------------------------ ------------ ------------ -----------
ENABLED  BASIC                            10,000,000       79,027          58

SQL>
SQL> select bytes/1048576
  2  from user_segments
  3  where segment_name = 'VC2_TABLE'
  4  /

BYTES/1048576
-------------
          624

SQL>


So these are the facts for the table with 3 VARCHAR2(50) columns with random 25, 15 and 10 character-strings. 
Note : I deliberately chose dbms_random.string to ensure that I'd get very few (if any) repeatable values that are compressible themselves.

10million rows of Average Row Length of 58bytes
616MB Segment (78,437 blocks equal 613MB approx).
RMAN Compressed Size of the Datafile 369MB (approx 60% of the Segment Size)
BASIC Compression Size of the Table :  624MB  (i.e. no compression achieved because I used random strings)



Next with CHAR


SQL> create tablespace CHAR_TBS datafile '/opt/oracle/oradata/HEMANT/CHAR_TBS.dbf' size 2G;  -- subsequently resized to 4000M

Tablespace created.

SQL>
SQL> create table CHAR_TABLE
  2  (id_col number(8), data_col_1 char(50), data_col_2 char(50), data_col_3 char(50))
  3  pctfree 0
  4  tablespace CHAR_TBS
  5  /

Table created.

SQL>
SQL> declare
  2  l_c number;
  3  begin
  4  for l_c in 1 .. 100
  5  loop
  6  insert     into CHAR_TABLE
  7  select rownum, dbms_random.string('X',25), dbms_random.string('X',15), dbms_random.string('X',10)
  8  from dual
  9  connect by level "less than" 100001;  -- "less than sign" replaced by words to preserve HTML formatting
 10  commit;
 11  end loop;
 12  end;
 13  /

PL/SQL procedure successfully completed.

SQL>
SQL> exec dbms_stats.gather_table_stats('','CHAR_TABLE',degree=>4);

PL/SQL procedure successfully completed.

SQL>
SQL> select compression, compress_for, num_rows, blocks, avg_row_len
  2  from user_tables
  3  where table_name = 'CHAR_TABLE'
  4  /

COMPRESS COMPRESS_FOR                       NUM_ROWS       BLOCKS AVG_ROW_LEN
-------- ------------------------------ ------------ ------------ -----------
DISABLED                                  10,000,000      204,116         158

SQL>
SQL> select bytes/1048576
  2  from user_segments
  3  where segment_name = 'CHAR_TABLE'
  4  /

BYTES/1048576
-------------
         1600

SQL>
SQL> pause For RMAN BACKUP AS COMPRESSED BACKUPSET
For RMAN BACKUP AS COMPRESSED BACKUPSET


RMAN> backup as compressed backupset tablespace CHAR_TBS;

Starting backup at 08-NOV-20
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=38 device type=DISK
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00004 name=/opt/oracle/oradata/HEMANT/CHAR_TBS.dbf
channel ORA_DISK_1: starting piece 1 at 08-NOV-20
channel ORA_DISK_1: finished piece 1 at 08-NOV-20
piece handle=/opt/oracle/product/19c/dbhome_1/dbs/0vvf1r7f_1_1 tag=TAG20201108T225127 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:35
Finished backup at 08-NOV-20


oracle19c>ls -l /opt/oracle/product/19c/dbhome_1/dbs/0vvf1r7f_1_1
-rw-r-----. 1 oracle oinstall 441876480 Nov  8 22:51 /opt/oracle/product/19c/dbhome_1/dbs/0vvf1r7f_1_1
oracle19c>
-- appox 421MB  RMAN Compressed Backup

SQL>
SQL> alter table CHAR_TABLE
  2  move compress
  3  /

Table altered.

SQL>
SQL> exec dbms_stats.gather_table_stats('','CHAR_TABLE',degree=>4);

PL/SQL procedure successfully completed.

SQL>
SQL> select compression, compress_for, num_rows, blocks, avg_row_len
  2  from user_tables
  3  where table_name = 'CHAR_TABLE'
  4  /

COMPRESS COMPRESS_FOR                       NUM_ROWS       BLOCKS AVG_ROW_LEN
-------- ------------------------------ ------------ ------------ -----------
ENABLED  BASIC                            10,000,000      204,766         158

SQL>
SQL> select bytes/1048576
  2  from user_segments
  3  where segment_name = 'CHAR_TABLE'
  4  /

BYTES/1048576
-------------
         1600

SQL>

So these are the facts for the table with 3 CHAR(50) columns with random 25, 15 and 10 character-strings. 
 Note : I deliberately chose dbms_random.string to ensure that I'd get very few (if any) repeatable values that are compressible themselves. 

10million rows of Average Row Length of 158bytes (up from 58 bytes for the VARCHAR2 columns) 
1600MB Segment (204,116 blocks equal 1595MB approx). 
RMAN Compressed Size of the Datafile 421MB  (approx 26% of the Segment Size)
BASIC Compression Size of the Table : 1600MB (i.e. no compression achieved, because I used random strings ? -- shouldn't the right-padded blanks be compressed ?

 So, in this test comparing random strings in VARCHAR2 and CHAR, quite obviously the table with CHAR columns took up much more space.
BASIC Compression didn't achieve anything (again : because I used random strings?)

RMAN default compression ("CONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE 'DEFAULT' OPTIMIZE FOR LOAD TRUE ") achieved better compression with CHAR

Of course, since I used random strings and you might have actual data that is compressible and the length of the actual strings inserted into the CHAR columns may be different in your production / test table, you would see different levels of compression achieved.

The point is that compression success depends not just on the type of data but can be different whether you use BASIC compression in the tablespace or the default BASIC in RMAN.

Note :  Given RMAN's default behaviour of "unused block compression" do not expect RMAN to have to backup all the blocks in entire 2GB or 4000MB datafile  -- particularly because I specifically create the tablespace just before the test and it doesn't have any other objects (segments).


No comments: