Search My Oracle Blog

Custom Search

27 March, 2014

Storing Trailing NULLs in a table

Oracle has an optimization in that if, for a row, all trailing columns are NULL, it does not have to store the NULL in each column, but saves space by reducing the physical size of the row in the block (in effect "compressing the NULLs").  If, however, the NULLs appear in intermediate columns between the first column and the last column (with one or more non-NULL values intervening or at the end), it has to write one byte for each NULL, without any "compression".

Here is a simple demo with two tables holding 9 data columns each.  In 90% of the rows, 6 columns have NULL values.

In the first table, an intermediate column and the last column have non-NULL value, thus resulting in the columns having NULLs  interspersed with non-NULLs.

SQL> drop table intermediate_nulls;

Table dropped.

SQL> 
SQL> create table intermediate_nulls
  2  (
  3  id_column number,
  4  data_col_1 varchar2(25),
  5  data_col_2 varchar2(25),
  6  data_col_3 varchar2(25),
  7  data_col_4 varchar2(25),
  8  data_col_5 varchar2(25),
  9  data_col_6 varchar2(25),
 10  data_col_7 varchar2(25),
 11  data_col_8 varchar2(25),
 12  data_col_9 varchar2(25)
 13  )
 14  /

Table created.

SQL> 
SQL> REM  Insert 1 million rows
SQL> REM  9 in 10 rows have NULLs for cols 2 to 5, 7 to 8  but all have values in cols 1,6,9
SQL> insert into intermediate_nulls
  2  select rownum,
  3  'Col 1 Data',
  4  decode(mod(rownum,10),0,'Col 2 Data', null),
  5  decode(mod(rownum,10),0,'Col 3 Data', null),
  6  decode(mod(rownum,10),0,'Col 4 Data', null),
  7  decode(mod(rownum,10),0,'Col 5 Data', null),
  8  'Col 6 Data',
  9  decode(mod(rownum,10),0,'Col 7 Data', null),
 10  decode(mod(rownum,10),0,'Col 8 Data', null),
 11  'Col 9 data'
 12  from dual
 13  connect by level < 1000001
 14  /

1000000 rows created.

SQL> commit;

Commit complete.

SQL> 
SQL> exec dbms_stats.gather_table_stats('','INTERMEDIATE_NULLS');

PL/SQL procedure successfully completed.

SQL> 
SQL> select num_rows, avg_row_len, blocks
  2  from user_tables
  3  where table_name = 'INTERMEDIATE_NULLS'
  4  /

  NUM_ROWS AVG_ROW_LEN     BLOCKS
---------- ----------- ----------
   1000000          50       7678

SQL> 

In the second table, all 6 trailing columns have NULLs.

SQL> drop table trailing_nulls;

Table dropped.

SQL> 
SQL> create table trailing_nulls
  2  (
  3  id_column number,
  4  data_col_1 varchar2(25),
  5  data_col_2 varchar2(25),
  6  data_col_3 varchar2(25),
  7  data_col_4 varchar2(25),
  8  data_col_5 varchar2(25),
  9  data_col_6 varchar2(25),
 10  data_col_7 varchar2(25),
 11  data_col_8 varchar2(25),
 12  data_col_9 varchar2(25)
 13  )
 14  /

Table created.

SQL> 
SQL> 
SQL> REM  Insert 1 million rows
SQL> REM  9 in 10 rows have NULLs for cols 4 to 9 -- i.e. all trailing cols
SQL> insert into trailing_nulls
  2  select rownum,
  3  'Col 1 Data',
  4  'Col 2 Data',
  5  'Col 3 Data',
  6  decode(mod(rownum,10),0,'Col 4 Data', null),
  7  decode(mod(rownum,10),0,'Col 5 Data', null),
  8  decode(mod(rownum,10),0,'Col 6 Data', null),
  9  decode(mod(rownum,10),0,'Col 7 Data', null),
 10  decode(mod(rownum,10),0,'Col 8 Data', null),
 11  decode(mod(rownum,10),0,'Col 9 Data', null)
 12  from dual
 13  connect by level < 1000001
 14  /

1000000 rows created.

SQL> commit;

Commit complete.

SQL> 
SQL> exec dbms_stats.gather_table_stats('','TRAILING_NULLS');

PL/SQL procedure successfully completed.

SQL> 
SQL> select num_rows, avg_row_len, blocks
  2  from user_tables
  3  where table_name = 'TRAILING_NULLS'
  4  /

  NUM_ROWS AVG_ROW_LEN     BLOCKS
---------- ----------- ----------
   1000000          50       6922

SQL> 

We can see that the TRAILING_NULLS table consumes approximately 10% less disk space.
Note : The AVG_ROW_LEN is the small.  When the data is queried, NULLs (one byte each) are returned to the client.  It is in the data block where the "compression" is done.
.
.
.

23 March, 2014

Plan HASH_VALUE remains the same for the same Execution Plan, even if ROWS and COST change

Here is a simple demo that shows that the Plan Hash_Value does not consider the ROWS and COST but only the Execution Plan.  Thus, even with more rows added into a table, if the Execution Plan remains the same for a query, it is independent of the number of ROWS and the COST changing.

SQL> -- create the table
SQL> create table branch_list
  2  (country_code varchar2(3), branch_code number, branch_city varchar2(50));

Table created.

SQL> 
SQL> -- create an index
SQL> create index branch_list_cntry_ndx
  2  on branch_list(country_code);

Index created.

SQL> 
SQL> 
SQL> 
SQL> -- populate it with 100 rows, one third being 'IN'
SQL> insert into branch_list
  2  select decode(mod(rownum,3),0,'IN',1,'US',2,'US'), rownum, dbms_random.string('X',32)
  3  from dual
  4  connect by level < 101
  5  /

100 rows created.

SQL> 
SQL> -- gather statistics
SQL> exec dbms_stats.gather_table_stats('','BRANCH_LIST');

PL/SQL procedure successfully completed.

SQL> 
SQL> -- get an execution plan
SQL> explain plan for
  2  select branch_code, branch_city
  3  from branch_list
  4  where country_code = 'IN'
  5  /

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 513528032

-----------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                       |    50 |  1950 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BRANCH_LIST           |    50 |  1950 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BRANCH_LIST_CNTRY_NDX |    50 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("COUNTRY_CODE"='IN')

14 rows selected.

SQL> 
SQL> 
SQL> -- add another 400 rows, none of them being 'IN'
SQL> insert into branch_list
  2  select decode(mod(rownum,6),0,'SG',1,'US',2,'US',3,'US',4,'AU',5,'UK'), rownum+100, dbms_random.string('X',32)
  3  from dual
  4  connect by level < 401
  5  /

400 rows created.

SQL> 
SQL> -- update statistics
SQL> exec dbms_stats.gather_table_stats('','BRANCH_LIST');

PL/SQL procedure successfully completed.

SQL> 
SQL> -- get the execution plan again
SQL> explain plan for
  2  select branch_code, branch_city
  3  from branch_list
  4  where country_code = 'IN'
  5  /

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 513528032

-----------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                       |    33 |  1320 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BRANCH_LIST           |    33 |  1320 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BRANCH_LIST_CNTRY_NDX |    33 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("COUNTRY_CODE"='IN')

14 rows selected.

SQL> 
SQL> select column_name, histogram
  2  from user_tab_columns
  3  where table_name = 'BRANCH_LIST';

COLUMN_NAME                    HISTOGRAM
------------------------------ ---------------
COUNTRY_CODE                   FREQUENCY
BRANCH_CODE                    NONE
BRANCH_CITY                    NONE

SQL> select count(*) from user_tab_histograms
  2  where table_name = 'BRANCH_LIST'
  3  and column_name = 'COUNTRY_CODE'
  4  /

  COUNT(*)
----------
         5

SQL> 


After the addition of 400 rows to a 100 row table, the distribution of rows has changed. At the second Gather_Table_Stats call, Oracle has properly omputed a Frequency Histogram on the COUNTRY_CODE column for the 5 countries ('IN','US','SG','AU','UK').  The estimate for the COUNTRY_CODE='IN' is now more accurate.

However, what I want to demonstrate here is that although "ROWS" (and "BYTES for that many ROWS) and "COST" have changed in the new Execution Plan, the PLAN HASH VALUE ("513528032") remains the same.  Thus, the PLAN HASH VALUE is independent of changes to the ROWS/BYTES and COST.  The Execution Plan, per se, hasn't changed.
.
.
.



22 March, 2014

BOGOF from Packt

Packt Publishing is celebrating the publishing of their 2000th book with a Buy One Get One Free offer.
.
.
.

Aggregated by orafaq.com

Aggregated by orafaq.com
This blog is being aggregated by orafaq.com