27 March, 2014

Storing Trailing NULLs in a table

Oracle has an optimization in that if, for a row, all trailing columns are NULL, it does not have to store the NULL in each column, but saves space by reducing the physical size of the row in the block (in effect "compressing the NULLs").  If, however, the NULLs appear in intermediate columns between the first column and the last column (with one or more non-NULL values intervening or at the end), it has to write one byte for each NULL, without any "compression".

Here is a simple demo with two tables holding 9 data columns each.  In 90% of the rows, 6 columns have NULL values.

In the first table, an intermediate column and the last column have non-NULL value, thus resulting in the columns having NULLs  interspersed with non-NULLs.

SQL> drop table intermediate_nulls;

Table dropped.

SQL> 
SQL> create table intermediate_nulls
  2  (
  3  id_column number,
  4  data_col_1 varchar2(25),
  5  data_col_2 varchar2(25),
  6  data_col_3 varchar2(25),
  7  data_col_4 varchar2(25),
  8  data_col_5 varchar2(25),
  9  data_col_6 varchar2(25),
 10  data_col_7 varchar2(25),
 11  data_col_8 varchar2(25),
 12  data_col_9 varchar2(25)
 13  )
 14  /

Table created.

SQL> 
SQL> REM  Insert 1 million rows
SQL> REM  9 in 10 rows have NULLs for cols 2 to 5, 7 to 8  but all have values in cols 1,6,9
SQL> insert into intermediate_nulls
  2  select rownum,
  3  'Col 1 Data',
  4  decode(mod(rownum,10),0,'Col 2 Data', null),
  5  decode(mod(rownum,10),0,'Col 3 Data', null),
  6  decode(mod(rownum,10),0,'Col 4 Data', null),
  7  decode(mod(rownum,10),0,'Col 5 Data', null),
  8  'Col 6 Data',
  9  decode(mod(rownum,10),0,'Col 7 Data', null),
 10  decode(mod(rownum,10),0,'Col 8 Data', null),
 11  'Col 9 data'
 12  from dual
 13  connect by level < 1000001
 14  /

1000000 rows created.

SQL> commit;

Commit complete.

SQL> 
SQL> exec dbms_stats.gather_table_stats('','INTERMEDIATE_NULLS');

PL/SQL procedure successfully completed.

SQL> 
SQL> select num_rows, avg_row_len, blocks
  2  from user_tables
  3  where table_name = 'INTERMEDIATE_NULLS'
  4  /

  NUM_ROWS AVG_ROW_LEN     BLOCKS
---------- ----------- ----------
   1000000          50       7678

SQL> 

In the second table, all 6 trailing columns have NULLs.

SQL> drop table trailing_nulls;

Table dropped.

SQL> 
SQL> create table trailing_nulls
  2  (
  3  id_column number,
  4  data_col_1 varchar2(25),
  5  data_col_2 varchar2(25),
  6  data_col_3 varchar2(25),
  7  data_col_4 varchar2(25),
  8  data_col_5 varchar2(25),
  9  data_col_6 varchar2(25),
 10  data_col_7 varchar2(25),
 11  data_col_8 varchar2(25),
 12  data_col_9 varchar2(25)
 13  )
 14  /

Table created.

SQL> 
SQL> 
SQL> REM  Insert 1 million rows
SQL> REM  9 in 10 rows have NULLs for cols 4 to 9 -- i.e. all trailing cols
SQL> insert into trailing_nulls
  2  select rownum,
  3  'Col 1 Data',
  4  'Col 2 Data',
  5  'Col 3 Data',
  6  decode(mod(rownum,10),0,'Col 4 Data', null),
  7  decode(mod(rownum,10),0,'Col 5 Data', null),
  8  decode(mod(rownum,10),0,'Col 6 Data', null),
  9  decode(mod(rownum,10),0,'Col 7 Data', null),
 10  decode(mod(rownum,10),0,'Col 8 Data', null),
 11  decode(mod(rownum,10),0,'Col 9 Data', null)
 12  from dual
 13  connect by level < 1000001
 14  /

1000000 rows created.

SQL> commit;

Commit complete.

SQL> 
SQL> exec dbms_stats.gather_table_stats('','TRAILING_NULLS');

PL/SQL procedure successfully completed.

SQL> 
SQL> select num_rows, avg_row_len, blocks
  2  from user_tables
  3  where table_name = 'TRAILING_NULLS'
  4  /

  NUM_ROWS AVG_ROW_LEN     BLOCKS
---------- ----------- ----------
   1000000          50       6922

SQL> 

We can see that the TRAILING_NULLS table consumes approximately 10% less disk space.
Note : The AVG_ROW_LEN is the small.  When the data is queried, NULLs (one byte each) are returned to the client.  It is in the data block where the "compression" is done.
.
.
.

No comments: