12 June, 2022

Index Statistic NUM_ROWS excludes Rows with NULL value

 A quick demonstration of the difference between the Table Statistic NUM_ROWS and the Index Statistic NUM_ROWS



SQL> desc customers
 Name                                                                     Null?    Type
 ------------------------------------------------------------------------ -------- -------------------------------------------------
 CUST_ID                                                                  NOT NULL NUMBER
 CUST_NAME                                                                         VARCHAR2(128)
 CUST_CITY                                                                         VARCHAR2(128)
 CUST_START_DATE                                                                   DATE

SQL> select index_name,  column_position, column_name
  2  from user_ind_columns
  3  where table_name = 'CUSTOMERS'
  4  order by 1,2
  5  /

INDEX_NAME       COLUMN_POSITION COLUMN_NAME
---------------- --------------- ----------------
CUST_CITY_NDX                  1 CUST_CITY
CUST_ID_NDX                    1 CUST_ID

SQL>
SQL> select count(*) from customers;

  COUNT(*)
----------
   1299647

SQL>
SQL> exec dbms_stats.gather_table_stats('','CUSTOMERS');

PL/SQL procedure successfully completed.

SQL>
SQL> select last_analyzed, sample_size, num_rows
  2  from user_tables
  3  where table_name = 'CUSTOMERS'
  4  /

LAST_ANAL SAMPLE_SIZE   NUM_ROWS
--------- ----------- ----------
12-JUN-22     1299647    1299647

SQL>
SQL> select index_name, last_analyzed, sample_size, num_rows
  2  from user_indexes
  3  where table_name = 'CUSTOMERS'
  4  order by index_name
  5  /

INDEX_NAME       LAST_ANAL SAMPLE_SIZE   NUM_ROWS
---------------- --------- ----------- ----------
CUST_CITY_NDX    12-JUN-22     1179648    1179648
CUST_ID_NDX      12-JUN-22     1299647    1299647

SQL>
SQL> select count(*)
  2  from customers
  3  where cust_city is null
  4  /

  COUNT(*)
----------
    119999

SQL>
SQL> select 1299647-1179648 from dual;

1299647-1179648
---------------
         119999

SQL>
SQL> create index cust_id_city_ndx
  2  on customers(cust_id, cust_city)
  3  /

Index created.

SQL>
SQL> select index_name, last_analyzed, sample_size, num_rows
  2  from user_indexes
  3  where table_name = 'CUSTOMERS'
  4  order by index_name
  5  /

INDEX_NAME       LAST_ANAL SAMPLE_SIZE   NUM_ROWS
---------------- --------- ----------- ----------
CUST_CITY_NDX    12-JUN-22     1179648    1179648
CUST_ID_CITY_NDX 12-JUN-22     1299647    1299647
CUST_ID_NDX      12-JUN-22     1299647    1299647

SQL>
SQL> drop index cust_id_city_ndx;

Index dropped.

SQL>
SQL> create index cust_city_id_ndx
  2  on customers(cust_city, cust_id)
  3  /

Index created.

SQL>
SQL> select index_name, last_analyzed, sample_size, num_rows
  2  from user_indexes
  3  where table_name = 'CUSTOMERS'
  4  order by index_name
  5  /

INDEX_NAME       LAST_ANAL SAMPLE_SIZE   NUM_ROWS
---------------- --------- ----------- ----------
CUST_CITY_ID_NDX 12-JUN-22     1299647    1299647
CUST_CITY_NDX    12-JUN-22     1179648    1179648
CUST_ID_NDX      12-JUN-22     1299647    1299647

SQL>


In recent versions, a CREATE INDEX implicitly includes a Gather Stats call on the new Index by default, so the two new Indexes also had updated statistics.

The CUST_CITY column has 119999 rows with NULLs.  So, the Statistics on the Index CUST_CITY_NDX on this column did not include these new rows.

The two new Indexes that I created (CUST_ID_CITY_NDX and CUST_CITY_ID_NDX) were composite indexes where at least one column (CUST_ID - which is a Primary Key) is a NOT NULL.
Therefore, Statistics on these Indexes did include all the rows as, for every row in the Table, at least 1 column had a Not NULL value.

No comments: