19 September, 2010

Index Skip Scan

We know that an Index Skip Scan operation is when Oracle uses an Index of which the leading column is not a query predicate. For example, an index on (A,B,C) is not normally expected to be used for a query against 'B' and 'C'. Yet, Oracle may use the index if it finds that the "cost" isn't high.
How does it determine the suitability of the index ? Oracle has to know how many distinct entries for 'A' are present in the index. This information is not in the Index statistics. However, it is present in the Column statistics.

Here is a simple demonstration of Index Skip Scan.

I create an ACCOUNTS (say, Bank Accounts) table with 400thousand+ entries and an Index on ACCOUNT_TYPE, COUNTRY_CD, COUNTRY_CITY_CD. I find that Oracle can use this index for queries by COUNTRY_CD and COUNTRY_CITY_CD -- I do not (need to) have to create a separate index for COUNTRY_CD+COUNTRY_CITY_CD. Thus, I can reduce the number of indexes required. The Index has a "poor" Clustering Factor (as values for the COUNTRY_CD and COUNTRY_CITY_CD are "scattered" across the table, given the way I generated these values), yet the Index is useful.

Remember : The "costing" depends on the number of distinct ACCOUNT_TYPE values. Too many distinct value and Oracle may choose to not use the Index for the query by COUNTRY_CD and COUNTRY_CITY_CD.

First, I setup the data :

SQL> create table Accounts_Table (
2 account_id number not null,
3 country_cd number not null,
4 country_city_cd number not null,
5 account_type varchar(3),
6 account_dt date,
7 account_title varchar2(32))
8 /

Table created.

SQL>
SQL> alter table accounts_table nologging;

Table altered.

SQL>
SQL> --- create more than 400thousand accounts in 25 countrie with 60 country_city_codes and of 4 account_types
SQL> insert /*+ APPEND */ into accounts_table
2 select rownum,
3 trunc(dbms_random.value(1,26)),
4 trunc(dbms_random.value(1,61)),
5 decode(trunc(dbms_random.value(1,6)),1,'SAV',2,'CUR',3,'FDR',4,'MXD',5,'SAV',6,'SAV'),
6 sysdate-3600+(rownum/10),
7 dbms_random.string('X',30)
8 from dual
9 connect by level < 401243
10 /

401242 rows created.

SQL>

I then create Indexes and test a few queries. (At this point I haven't gathered table statistics so the queries will help my "nudge" Oracle into making the right choices for it's default METHOD_OPT behaviour of 'FOR ALL COLUMNS SIZE AUTO').

SQL> create unique index accounts_table_pk on accounts_table(account_id) nologging;

Index created.

SQL> alter table accounts_table add constraint accounts_table_pk primary key (account_id);

Table altered.

SQL> create index accounts_table_n1 on accounts_table(account_type, country_cd, country_city_cd) nologging;

Index created.

SQL>
SQL> -- deliberately run queries so that METHOD_OPTs FOR ALL COUMNS SIZE AUTO selects these three columns
SQL> select count(*) from accounts_table where country_cd = 11;

COUNT(*)
----------
15833

SQL> select count(*) from accounts_table where country_cd = 18;

COUNT(*)
----------
16169

SQL> select count(*) from accounts_table where country_cd = 11 and country_city_cd = 24;

COUNT(*)
----------
255

SQL> select count(*) from accounts_table where country_cd = 18 and country_city_cd = 32;

COUNT(*)
----------
273

SQL> select count(*) from accounts_table where account_type = 'SAV';

COUNT(*)
----------
159963

SQL> select count(*) from accounts_table where account_type = 'MXD';

COUNT(*)
----------
80236

SQL>
SQL>
SQL> exec dbms_stats.gather_table_stats('','ACCOUNTS_TABLE',estimate_percent=>100);

PL/SQL procedure successfully completed.

SQL>


Now I test a query with statistics present :

SQL> explain plan for
2 select country_cd, country_city_cd, account_id
3 from accounts_table
4 where country_cd = 11 and country_city_cd = 24
5 /

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 512713334

-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 266 | 2926 | 270 (0)| 00:00:04 |
| 1 | TABLE ACCESS BY INDEX ROWID| ACCOUNTS_TABLE | 266 | 2926 | 270 (0)| 00:00:04 |
|* 2 | INDEX SKIP SCAN | ACCOUNTS_TABLE_N1 | 266 | | 6 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("COUNTRY_CD"=11 AND "COUNTRY_CITY_CD"=24)
filter("COUNTRY_CITY_CD"=24 AND "COUNTRY_CD"=11)

15 rows selected.

SQL>

Why did Oracle choose the Index Skip Scan ?
Firstly, it expected to have to retrieve 266 rows from the table. These are actually not clustered together for the given COUNTRY_CD and COUNTRY_CITY_CD values. Therefore, the cost of the Table Access by Rowid is 264 (270 minus 6). However, a Table Access by Rowid is possible only if Oracle gets the Rowids from a suitable index. Here is where the existing Index is usable. The cost of getting those 264 Rowids is expected to be only 6 --- inspite of having to do a Skip Scan !
How did Oracle estimate a cost of 6 for the Index Skip Scan ? It factored in the knowledge that there are 4 distinct values for ACCOUNT_TYPE (the leading column of the Index). Therefore, it adjusts for having to "skip" over the Index 4 times. Oracle doesn't need to know the ACCOUNT_TYPEs. It has to "skip" to the first ACCOUNT_TYPE ('CUR') and then read the index, ignoring the leading column to find the desired COUNTRY_CD+COUNTRY_CITY_CD combination. Once it reaches the end of the entries for this combination (i.e. it finds that it is COUNTRY_CITY_CD 25 within COUNTRY_CD 11), it merely has to skip to the next ACCOUNT_TYPE ('FDR') in the Index (remember that it can lookup the values from Branches quicker !). Within this ACCOUNT_TYPE, it again has to search for the desired COUNTRY_CD+COUNTRY_CITY_CD combination. Thus, it will be searching for COUNTRY_CD=11,COUNTRY_CITY_CD=24 4 times in the Index.

The Event 10053 trace shows that the index has a poor Clustering Factor (396,788 for 401,242 rows !) :

Table Stats::
Table: ACCOUNTS_TABLE Alias: ACCOUNTS_TABLE
#Rows: 401242 #Blks: 3350 AvgRowLen: 53.00
Index Stats::
Index: ACCOUNTS_TABLE_N1 Col#: 4 2 3
LVLS: 2 #LB: 1177 #DK: 6000 LB/K: 1.00 DB/K: 66.00 CLUF: 396788.00
Index: ACCOUNTS_TABLE_PK Col#: 1
LVLS: 2 #LB: 837 #DK: 401242 LB/K: 1.00 DB/K: 1.00 CLUF: 3288.00

Column statistics on the two query predicates are correct :

Column (#2): COUNTRY_CD(NUMBER)
AvgLen: 3.00 NDV: 25 Nulls: 0 Density: 0.01973 Min: 1 Max: 25
Histogram: Freq #Bkts: 25 UncompBkts: 401242 EndPtVals: 25
Column (#3): COUNTRY_CITY_CD(NUMBER)
AvgLen: 3.00 NDV: 60 Nulls: 0 Density: 0.0081123 Min: 1 Max: 60
Histogram: Freq #Bkts: 60 UncompBkts: 401242 EndPtVals: 60
Table: ACCOUNTS_TABLE Alias: ACCOUNTS_TABLE
Card: Original: 401242 Rounded: 266 Computed: 266.16 Non Adjusted: 266.1

A FullTableScan has a cost of 909 :

Access Path: TableScan
Cost: 917.03 Resp: 917.03 Degree: 0
Cost_io: 909.00 Cost_cpu: 120492154
Resp_io: 909.00 Resp_cpu: 120492154

An IndexSkipScan has a cost of 6 :

Access Path: index (skip-scan)
SS sel: 6.6333e-04 ANDV (#skips): 4
SS io: 4.00 vs. table scan io: 909.00
Skip Scan chosen
Access Path: index (SkipScan)
Index: ACCOUNTS_TABLE_N1
resc_io: 270.00 resc_cpu: 2026919
ix_sel: 6.6333e-04 ix_sel_with_filters: 6.6333e-04
Cost: 270.14 Resp: 270.14 Degree: 1

This accounts for "Skips" in the Index (represented by "ANDV"). Of the 1,177 Leaf Blocks in the Index, Oracle expects to have to read only 6.

Best:: AccessPath: IndexRange Index: ACCOUNTS_TABLE_N1
Cost: 270.14 Degree: 1 Resp: 270.14 Card: 266.16 Bytes: 0


(When I trace an actual run of the query, I find that Oracle has to read 17 index blocks (root+branch+leaf) and 253 table blocks for a total number of 270 block gets to fetch 255 rows so Index Skip Scan costing isn't very accurate yet !). This test was on 10.2.0.4 with ASSM.


If I modify the column statistics to set NDV (Number of Distinct Values) to 659 for ACCOUNT_TYPE, Oracle switches to doing a FullTableScan with a cost of 917 ! If Oracle were to do a Skip Scan using the Index, it estimates a cost of 925 :

Access Path: index (skip-scan)
SS sel: 6.6415e-04 ANDV (#skips): 659
SS io: 659.00 vs. table scan io: 909.00
Skip Scan chosen
Access Path: index (SkipScan)
Index: ACCOUNTS_TABLE_N1
resc_io: 925.00 resc_cpu: 6691462
ix_sel: 6.6415e-04 ix_sel_with_filters: 6.6415e-04
Cost: 925.45 Resp: 925.45 Degree: 1



Therefore, an Index Skip Scan is usable if the number of distinct values for the leading column(s) ise sufficiently low that the number of "skips" in the Index does not become significant.

.
.
.

2 comments:

Ram said...

Hi Hemant
Thanks for this post. This is really helpful for me as i m undergoing the same situation.

I know i m posting this query after a long time this is posted.

you mentioned 1,177 Leaf Blocks in the Index. How did u find that.

Also you said the below
(When I trace an actual run of the query, I find that Oracle has to read 17 index blocks (root+branch+leaf) and 253
table blocks for a total number of 270 block gets to fetch 255 rows so Index Skip Scan costing isn't very accurate
yet !).

How did u trace the actual run and ifentifirs tat i read 17 index blocks?

Please advise.

Thanks
Ram

Hemant K Chitale said...

Ram,
The number of leaf blocks appears in the Index Stats for ACCOUNTS_TABLE_N1:
#LB: 1177


The number of index blocks actually read was retrieved from the trace. I don't think I have the trace file anymore.

Hemant