24 August, 2007

NULLs are not Indexed, Right ? NOT !

Update 09-Feb-08 :
See MetaLink Note#551754.1 "Query with ROWNUM Condition On a Table Which Has Index on Column "Constant" Returns No Rows" for a caveat. (Bug#6737251). The recommendation there is to NOT use a Constant in an Index.

We all know very well that NULLs are not indexed and a query for NULL values in a column cannot use the index on that column ? However, there is a way to index a column with NULL values.
I read about this method a few times on different blogs in the past week. So I decided to create my own test case.
What I do is to add a CONSTANT to the column being indexed. Since the Constant is NOT NULL but is the second column in the index, the NULLs of our actual column have to get included in the index.
This method fails on 10.2.0.1 if the constant is a "1". See the references in my example below :





SQL>
SQL> REM ===========================================================================
SQL> REM Discover the table TEST_TABLE and how many NULLs does the STATUS column have
SQL> REM -- FYI, this is a prebuilt copy of DBA_OBJECTS
SQL> REM where I have updated STATUS to NULL for 29 rows
SQL>
SQL> desc test_table;
Name Null? Type
----------------------------------------- -------- ----------------------------
OWNER VARCHAR2(30)
OBJECT_NAME VARCHAR2(128)
SUBOBJECT_NAME VARCHAR2(30)
OBJECT_ID NUMBER
DATA_OBJECT_ID NUMBER
OBJECT_TYPE VARCHAR2(19)
CREATED DATE
LAST_DDL_TIME DATE
TIMESTAMP VARCHAR2(19)
STATUS VARCHAR2(7)
TEMPORARY VARCHAR2(1)
GENERATED VARCHAR2(1)
SECONDARY VARCHAR2(1)
SQL>
SQL> column index_name format a30
SQL> column column_name format a30
SQL> column column_position format 9999 hea 'Posn'
SQL>
SQL> REM The index that is currently present is not usable in our test queries later
SQL> select index_name, column_name, column_position
2 from user_ind_columns
3 where table_name = 'TEST_TABLE'
4 order by index_name, column_position
5 /
INDEX_NAME COLUMN_NAME Posn
------------------------------ ------------------------------ -----
TEST_TBL_NDX LAST_DDL_TIME 1
Elapsed: 00:00:00.21
SQL>
SQL> select decode(status,'VALID','VALID',NULL,'Is Null'),
2 count(*)
3 from test_table
4 group by decode(status,'VALID','VALID',NULL,'Is Null')
5 /
DECODE( COUNT(*)
------- ----------
VALID 51528
Is Null 29
Elapsed: 00:00:00.17
SQL>
SQL> REM ===========================================================================
SQL> REM Get the query execution plan for queries where status is NULL
SQL> exec dbms_stats.gather_table_stats(ownname=>'HEMANT',tabname=>'TEST_TABLE',cascade=>TRUE);
Elapsed: 00:00:04.43
SQL>
SQL> set autotrace on
SQL>
SQL> select count(*) from test_table where status is NULL
2 /
COUNT(*)
----------
29
Elapsed: 00:00:00.31
Execution Plan
----------------------------------------------------------
Plan hash value: 711311523

--------------------------------------------------------------------------------
-

Id Operation Name Rows Bytes Cost (%CPU) Time


--------------------------------------------------------------------------------
-

0 SELECT STATEMENT 1 6 165 (3) 00:00:02


1 SORT AGGREGATE 1 6


* 2 TABLE ACCESS FULL TEST_TABLE 16 96 165 (3) 00:00:02


--------------------------------------------------------------------------------
-


Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("STATUS" IS NULL)

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
714 consistent gets
405 physical reads
0 redo size
411 bytes sent via SQL*Net to client
380 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
SQL> REM We saw above that it was doing an FTS and 714 block gets
SQL> REM ===========================================================================
SQL>
SQL> REM ===========================================================================
SQL> REM Create an Index on status and try again
SQL> set autotrace off
SQL>
SQL> create index test_table_status_ndx1 on test_table(status)
2 /
Elapsed: 00:00:00.50
SQL>
SQL> exec dbms_stats.gather_table_stats(ownname=>'HEMANT',tabname=>'TEST_TABLE',cascade=>TRUE);
Elapsed: 00:00:03.71
SQL>
SQL> set autotrace on
SQL>
SQL> select count(*) from test_table where status is NULL
2 /
COUNT(*)
----------
29
Elapsed: 00:00:00.29
Execution Plan
----------------------------------------------------------
Plan hash value: 711311523

--------------------------------------------------------------------------------
-

Id Operation Name Rows Bytes Cost (%CPU) Time


--------------------------------------------------------------------------------
-

0 SELECT STATEMENT 1 6 165 (3) 00:00:02


1 SORT AGGREGATE 1 6


* 2 TABLE ACCESS FULL TEST_TABLE 43 258 165 (3) 00:00:02


--------------------------------------------------------------------------------
-


Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("STATUS" IS NULL)

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
714 consistent gets
709 physical reads
0 redo size
411 bytes sent via SQL*Net to client
380 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
SQL> REM Still doing an FTS and 714 block gets because index test_table_status_ndx1 does
SQL> REM not capture NULLs !
SQL> REM ===========================================================================
SQL>
SQL> REM ===========================================================================
SQL> REM Create a "concatenated" index on status and try again
SQL> REM -- my concatenated index, as if an FBI, uses a constant "A"
SQL> REM why did I use "A" instead of 1 ?
SQL> REM my first few tests with "1" kept reporting ORA-600 [qkssao2s1] errors on dbms_stats
SQL> REM -- although the Execution Plan WAS working
SQL> REM If I used "2" as the constant, I got worse errors and disconnections
SQL> REM So I decided to use an Alpha character, as STATUS is VARCHAR2
SQL> REM For more information on qkssao2s1 and constant "1" see Note#5005939.8
SQL>
SQL> create index test_table_status_ndx2 on test_table(status,'A')
2 /
Elapsed: 00:00:00.48
SQL>
SQL> exec dbms_stats.gather_table_stats(ownname=>'HEMANT',tabname=>'TEST_TABLE',cascade=>TRUE);
Elapsed: 00:00:05.39
SQL>
SQL> set autotrace on
SQL>
SQL> select count(*) from test_table where status is NULL
2 /
COUNT(*)
----------
29
Elapsed: 00:00:00.64
Execution Plan
----------------------------------------------------------
Plan hash value: 180495830

--------------------------------------------------------------------------------
------------

Id Operation Name Rows Bytes Cost (%CPU)
Time

--------------------------------------------------------------------------------
------------

0 SELECT STATEMENT 1 6 2 (0)
00:00:01

1 SORT AGGREGATE 1 6


* 2 INDEX RANGE SCAN TEST_TABLE_STATUS_NDX2 45 270 2 (0)
00:00:01

--------------------------------------------------------------------------------
------------


Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("STATUS" IS NULL)

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2 consistent gets
0 physical reads
0 redo size
411 bytes sent via SQL*Net to client
380 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
SQL> REM Aah ! Now we see the Index test_table_status_ndx2 being used to get NULLs
SQL> REM on status. NULLs *are* in this index because status is only the leading column
SQL> REM of the index and the succeeding column is a not null
SQL> REM The performance gain comes from having to do 2 block gets instead of 714
SQL> REM ===========================================================================
SQL> REM ===========================================================================
SQL> set autotrace off
SQL> drop index test_table_status_ndx1;
Elapsed: 00:00:00.59
SQL> drop index test_table_status_ndx2;
Elapsed: 00:00:00.31
SQL> spool off

3 comments:

henry2man said...

Simple change "1" to ' ' (whitespace) and the bug goes off.

henry2man said...

Simple change "1" to ' ' (whitespace) and the bug goes off.

Hemant K Chitale said...

Henry2man,

I used an 'A' . A whitespace is a constant, single, character -- just as an 'A' is.


Hemant