22 March, 2011

OuterJoin with Filter Predicate

I recently came across a query where an outer join was being used. An OuterJoin allows us to return a "dummy" row -- with NULLs for the columns of the table on which the join fails -- so that we can still present the values from the other table in the join.
However, the query was subsequently modified by the developer to add a filter predicate on the table that the OuterJoin was defined on.
Remember that an OuterJoin returns columns with NULL values for the table that is Outer Joined.
If you then add a simple filter predicate on this table, the OuterJoin is negated -- because the rows created by the OuterJoin, returning NULLs, fail the filter on that table !


In this example, based on the well-known EMP and DEPT tables, I first show an OuterJoin where "dummy" emp rows are created (and presented with E_Deptno as 0 via the NVL). EMP has no rows for DEPTNOs 40 and 50, so these are returned with E_Deptno as 0.
Then, a filter on a column that appears neither in the Join condition nor in the SELECT clause negates the Outer Join !


SQL> -- create the tables
SQL> drop table emp;

Table dropped.

SQL> drop table dept;

Table dropped.

SQL> create table dept
2 as select * from scott.dept;

Table created.

SQL> create table emp
2 as select * from scott.emp;

Table created.

SQL>
SQL> -- list the DEPT values
SQL> select deptno from dept;

DEPTNO
----------
10
20
30
40

4 rows selected.

SQL> select distinct(deptno) from emp;

DEPTNO
----------
30
20
10

3 rows selected.

SQL>
SQL> -- create a dept
SQL> insert into dept values (50,'HQ','SINGAPORE');

1 row created.

SQL> commit;

Commit complete.

SQL>
SQL> -- OuterJoin query
SQL> -- creates "dummy" rows from EMP
SQL> -- for DEPTNOs that exist in DEPT but do not exist in EMP
SQL> select d.deptno D_Deptno, d.loc, d.dname, nvl(e.deptno,0) E_Deptno, e.empno, e.ename
2 from emp e, dept d
3 where e.deptno(+) = d.deptno
4 order by 1
5 /

D_DEPTNO LOC DNAME E_DEPTNO EMPNO ENAME
---------- ------------- -------------- ---------- ---------- ----------
10 NEW YORK ACCOUNTING 10 7934 MILLER
10 NEW YORK ACCOUNTING 10 7839 KING
10 NEW YORK ACCOUNTING 10 7782 CLARK
20 DALLAS RESEARCH 20 7902 FORD
20 DALLAS RESEARCH 20 7876 ADAMS
20 DALLAS RESEARCH 20 7566 JONES
20 DALLAS RESEARCH 20 7369 SMITH
20 DALLAS RESEARCH 20 7788 SCOTT
30 CHICAGO SALES 30 7900 JAMES
30 CHICAGO SALES 30 7844 TURNER
30 CHICAGO SALES 30 7698 BLAKE
30 CHICAGO SALES 30 7654 MARTIN
30 CHICAGO SALES 30 7521 WARD
30 CHICAGO SALES 30 7499 ALLEN
40 BOSTON OPERATIONS 0
50 SINGAPORE HQ 0

16 rows selected.

SQL>
SQL> -- OuterJoin query with filter
SQL> -- what happens if the query is modified to add a filter
SQL> -- even if the filter is not on the join column ?
SQL> select d.deptno D_Deptno, d.loc, d.dname, nvl(e.deptno,0) E_Deptno, e.empno, e.ename
2 from emp e, dept d
3 where e.deptno(+) = d.deptno
4 -- an additional filter for job='CLERK' has been added
5 -- this column is not in the Join and is not in the SELECT
6 and e.job = 'CLERK'
7 order by 1
8 /

D_DEPTNO LOC DNAME E_DEPTNO EMPNO ENAME
---------- ------------- -------------- ---------- ---------- ----------
10 NEW YORK ACCOUNTING 10 7934 MILLER
20 DALLAS RESEARCH 20 7876 ADAMS
20 DALLAS RESEARCH 20 7369 SMITH
30 CHICAGO SALES 30 7900 JAMES

4 rows selected.

SQL>
SQL> -- Which are the CLERKs ?
SQL> select e.deptno, e.empno, e.ename
2 from emp e
3 where e.job = 'CLERK'
4 order by 1
5 /

DEPTNO EMPNO ENAME
---------- ---------- ----------
10 7934 MILLER
20 7369 SMITH
20 7876 ADAMS
30 7900 JAMES

4 rows selected.

SQL>


Note how the addition of a filter "e.job = 'CLERK'" changed the output of the EMP - DEPT join query from 16 rows to 4 rows. Even the two "dummy" rows for DEPTNOs 40 and 50 are now excluded.

Thus, if you need to use an OuterJoin, check what filter predicates you are specifying on the table that is OuterJoined !


UPDATE : As Rob and "orcl" explain, applying an "OuterJoin" on the additional filter column will allow retrieval of DEPTNOs 40 and 50 from DEPT, although this filter will exclude all other employees from EMP !


SQL> -- OuterJoin query with filter
SQL> -- specify an OuterJoin on the filter column as well !
SQL> select d.deptno D_Deptno, d.loc, d.dname, nvl(e.deptno,0) E_Deptno, e.empno, e.ename
2 from emp e, dept d
3 where e.deptno(+) = d.deptno
4 -- an additional filter for job='CLERK' has been added
5 -- this column is not in the Join and is not in the SELECT
6 -- and e.job = 'CLERK'
7 and e.job(+) = 'CLERK'
8 order by 1
9 /

D_DEPTNO LOC DNAME E_DEPTNO EMPNO ENAME
---------- ------------- -------------- ---------- ---------- ----------
10 NEW YORK ACCOUNTING 10 7934 MILLER
20 DALLAS RESEARCH 20 7369 SMITH
20 DALLAS RESEARCH 20 7876 ADAMS
30 CHICAGO SALES 30 7900 JAMES
40 BOSTON OPERATIONS 0
50 SINGAPORE HQ 0

6 rows selected.

SQL>


So, while "e.deptno(+) = d.deptno" is really an OuterJoin, we are simulating an OuterJoin in "e.job(+) = 'CLERK'" to ensure that we create dummy rows in EMP. Strange ?! We don't get dummy rows for jobs 'SALESMAN', 'MANAGER' etc which do not "join" to 'CLERK' although we do get dummy rows where the join to "d.deptno" would not be successful.


.
.
.

18 March, 2011

I/O for OutOfLine LOBs

When you define a LOB (CLOB or BLOB) column in a table, you have the choice of specifying whether it is Inline (stored with the row) or OutOfLine (stored in a separate segment). (Note : an Inline LOB still has a LOB Segment because an entry exceeding 4,000 bytes is automatically stored OutOfLine).
The data in an OutOfLine LOB, being stored in a separate segment (of type 'LOBSEGMENT'), is accessed via an Index (of type 'LOBINDEX').

It is easy to map a LOB Segment to the actual Table Column : the Segment Name contains the ObjectID and ColumnID. Alternatively, you can query USER_LOBS.
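For example, a query of USER_LOBS along these lines (a sketch, run against the demo table created below) shows the column-to-segment mapping together with the key storage attributes :

select column_name, segment_name, index_name, chunk, in_row, cache, logging
from user_lobs
where table_name = 'TABLE_WITH_LOBS'
order by column_name
/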

OutOfLine LOBs are read and written in CHUNKs. Typically the CHUNK size is one datablock, although it can be defined as a multiple of the datablock size. Therefore, even if you store less than a datablock's worth of content in a LOB, the I/O is still executed as a full datablock. Thus, in a tablespace with an 8KB block size, the I/O is one 8KB datablock even if you store only a few tens or hundreds of bytes in a deliberately defined OutOfLine LOB.

Here is a simple demonstration of LOB Storage and I/O. I start with a table that has 2 OutOfLine LOBs and one Inline LOB (remember : data exceeding 4000 bytes in Inline LOB is automatically stored OutOfLine so Oracle precreates a LOBSEGMENT even for the Inline LOB).


SQL> drop table table_with_LOBs;

Table dropped.

SQL>
SQL> create table table_with_LOBs
2 (id_column number not null primary key,
3 ool_lob_1 clob,
4 ool_lob_2 clob,
5 inline_lob clob,
6 date_col date,
7 varchar_col varchar2(50))
8 lob (ool_lob_1)
9 store as (disable storage in row tablespace Uniform_64KB chunk 8192 nocache logging),
10 lob (ool_lob_2)
11 store as (disable storage in row tablespace Uniform_64KB chunk 8192 nocache logging),
12 lob (inline_lob)
13 store as (enable storage in row)
14 /

Table created.

SQL>
SQL>
SQL> insert into table_with_LOBs
2 select rownum, -- id_column
3 dbms_random.string('A',10), -- ool_lob_1 at 10 characters per row
4 dbms_random.string('A',1000), -- ool_lob_2 at 1000 characters per row
5 dbms_random.string('A',1500), -- inline_lob
6 sysdate,
7 dbms_random.string('A',20) -- varchar
8 from dual
9 connect by level < 1001
10 /

1000 rows created.

SQL>
SQL> -- measure the writes
SQL> -- notice how each LOB column had a write call (3 x 1000)
SQL> -- also the data written is less than the CHUNKSIZE so it is 'unaligned'
SQL> select n.name, s.value
2 from v$statname n, v$mystat s
3 where n.statistic#=s.statistic#
4 and n.name in ('lob reads','lob writes', 'lob writes unaligned')
5 /

NAME VALUE
---------------------------------------------------------------- ----------
lob reads 0
lob writes 3000
lob writes unaligned 3000

SQL> commit;

Commit complete.

SQL>
SQL> -- compute statistics
SQL> exec dbms_stats.gather_table_stats('','TABLE_WITH_LOBS',estimate_percent=>100,cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL>
SQL> -- get column lengths
SQL> select column_name, data_type, data_length, avg_col_len
2 from user_tab_columns
3 where table_name = 'TABLE_WITH_LOBS'
4 order by column_id
5 /

COLUMN_NAME DATA_TYPE DATA_LENGTH AVG_COL_LEN
------------------------------ ---------- ----------- -----------
ID_COLUMN NUMBER 22 4
OOL_LOB_1 CLOB 476 87
OOL_LOB_2 CLOB 476 87
INLINE_LOB CLOB 4000 3103
DATE_COL DATE 7 8
VARCHAR_COL VARCHAR2 50 21

6 rows selected.

SQL>
SQL> -- get Table size
SQL> select blocks, num_rows, avg_row_len
2 from user_tables
3 where table_name = 'TABLE_WITH_LOBS'
4 /

BLOCKS NUM_ROWS AVG_ROW_LEN
---------- ---------- -----------
622 1000 3310

SQL> select blocks, bytes/1048576 Size_MB
2 from user_segments
3 where segment_type = 'TABLE'
4 and segment_name = 'TABLE_WITH_LOBS'
5 /

BLOCKS SIZE_MB
---------- ----------
640 5

SQL>
SQL> -- get LOBSegment sizes -- note how the OutOfLine LOBs take one 8KB block per row
SQL> select l.column_name, l.segment_name, s.blocks, s.bytes/1048576 Size_MB
2 from user_lobs l, user_segments s
3 where l.segment_name = s.segment_name
4 and l.table_name = 'TABLE_WITH_LOBS'
5 and s.segment_type = 'LOBSEGMENT'
6 order by 1
7 /

COLUMN_NAME SEGMENT_NAME BLOCKS SIZE_MB
------------------------------ ------------------------------ ---------- ----------
INLINE_LOB SYS_LOB0000086489C00004$$ 8 .0625
OOL_LOB_1 SYS_LOB0000086489C00002$$ 1152 9
OOL_LOB_2 SYS_LOB0000086489C00003$$ 1152 9

SQL>
SQL> -- restart the database instance so that V$SEGMENT_STATISTICS are reset
SQL> connect / as sysdba
Connected.
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 418484224 bytes
Fixed Size 1336932 bytes
Variable Size 201329052 bytes
Database Buffers 209715200 bytes
Redo Buffers 6103040 bytes
Database mounted.
Database opened.
SQL> -- now let's measure read I/O
SQL> -- create a new session so that session statistics are reset
SQL> connect hemant/hemant
Connected.
SQL>
SQL> -- read twelve rows and one lob column
SQL> select id_column, ool_lob_1
2 from table_with_lobs
3 where id_column between 10 and 21
4 order by id_column
5 /

ID_COLUMN OOL_LOB_1
---------- ------------
10 JHGJctoADd
11 RPZbdLYypm
12 OpOrrJiCzb
13 UPxpHLZhFa
14 OAqNgbIAWw
15 IhXYLHVZQF
16 LYlNaOVKDW
17 rQqbCWQdxM
18 YiicEavWeF
19 dAETtPpjyo
20 OczgVXLJqX
21 hdoJCSFCnQ

12 rows selected.

SQL> -- get the statistics
SQL> select n.name, s.value
2 from v$statname n, v$mystat s
3 where n.statistic#=s.statistic#
4 and n.name in ('lob reads','lob writes', 'lob writes unaligned')
5 /

NAME VALUE
---------------------------------------------------------------- ----------
lob reads 12
lob writes 0
lob writes unaligned 0

SQL> select object_name, statistic_name, value
2 from v$segment_statistics
3 where owner = 'HEMANT'
4 and tablespace_name = 'UNIFORM_64KB'
5 and (
6 object_name like 'SYS_LOB00000864%'
7 or
8 object_name like 'SYS_IL00000864%'
9 )
10 and value != 0
11 order by 1,2
12 /

OBJECT_NAME STATISTIC_NAME VALUE
------------------------------ ------------------------------ ----------
SYS_IL0000086489C00002$$ logical reads 48
SYS_IL0000086489C00002$$ physical read requests 2
SYS_IL0000086489C00002$$ physical reads 5
SYS_LOB0000086489C00002$$ physical read requests 12
SYS_LOB0000086489C00002$$ physical reads 12
SYS_LOB0000086489C00002$$ physical reads direct 12

6 rows selected.

SQL>
SQL> -- trace the same query
SQL> exec dbms_session.session_trace_enable(waits=>TRUE);

PL/SQL procedure successfully completed.

SQL> select id_column, ool_lob_1
2 from table_with_lobs
3 where id_column between 10 and 21
4 order by id_column
5 /

ID_COLUMN OOL_LOB_1
---------- ------------
10 JHGJctoADd
11 RPZbdLYypm
12 OpOrrJiCzb
13 UPxpHLZhFa
14 OAqNgbIAWw
15 IhXYLHVZQF
16 LYlNaOVKDW
17 rQqbCWQdxM
18 YiicEavWeF
19 dAETtPpjyo
20 OczgVXLJqX
21 hdoJCSFCnQ

12 rows selected.

SQL> exec dbms_session.session_trace_disable;

PL/SQL procedure successfully completed.

SQL>


Since the table has 3 LOB columns, each LOB has a separate 'lob write' when INSERTing data (3 x 1,000 rows = 3,000 writes). Because each value inserted is smaller than the 8KB CHUNK, each write is 'unaligned'.
The OutOfLine LOB columns still take up 87 bytes each (the in-row LOB locator) in the Table row.
The Inline LOB, even though I inserted only 1,500 characters, takes up 3,103 bytes per entry -- about 2 bytes per character, plus some overhead.
The total storage for each row in the Table segment is 3,310 bytes -- which, fortunately, fits into the 8KB block for the table (so I do not have Row Chaining).

As each OutOfLine LOB value is written as an 8KB CHUNK, even a 10-character write to OOL_LOB_1 takes an 8KB block. That is why the disk space for the two OutOfLine LOBs far exceeds that for the table rows themselves : 1,000 rows at one 8KB block each is roughly 8MB of LOB data per column, so each OutOfLine LOB segment is 9MB (1,152 blocks allocated in the Uniform 64KB tablespace), against 5MB for the entire table segment.

When retrieving a LOB column, a separate read I/O is made for each row retrieved.

If I were to retrieve only the non-LOB columns from the table, even with a FullTableScan, the OutOfLine LOBs are not read ! That is the advantage of them being OutOfLine : they are not read if they are not required. (Contrast that with normal columns in a table -- when reading a row, the normal columns are in the same row piece in the table block, so they are read from disk but, if not required by the query, are ignored after the I/O is executed).
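A quick way to verify this (a sketch, using the same statistics query as earlier) is to select only non-LOB columns and then check 'lob reads' for the session -- it should not increase :

-- read the same rows, but only non-LOB columns
select id_column, varchar_col, date_col
from table_with_lobs
where id_column between 10 and 21
order by id_column
/

-- 'lob reads' should be unchanged for this session
select n.name, s.value
from v$statname n, v$mystat s
where n.statistic#=s.statistic#
and n.name = 'lob reads'
/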

The statistics (from V$MYSTAT, V$SESSTAT, V$SYSSTAT and V$SEGMENT_STATISTICS) do show the Physical Reads and the LOB reads ('lob reads' are included in 'physical reads', so you don't need to add the two statistics).
However, if you do a SQL trace, the LOB reads will not be apparent.
Here's a tkprof of the SQL trace :

select id_column, ool_lob_1
from table_with_lobs
where id_column between 10 and 21
order by id_column

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 13 0.00 0.00 0 26 0 12
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 15 0.00 0.00 0 26 0 12

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 184

Rows Row Source Operation
------- ---------------------------------------------------
12 TABLE ACCESS BY INDEX ROWID TABLE_WITH_LOBS (cr=26 pr=0 pw=0 time=0 us cost=9 size=1183 card=13)
12 INDEX RANGE SCAN SYS_C0016880 (cr=14 pr=0 pw=0 time=209 us cost=2 size=0 card=13)(object id 86496)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 14 0.00 0.00
SQL*Net message from client 14 0.03 0.04
********************************************************************************


tkprof only shows that the Table was read --- it does not indicate that the LOBSegment for column OOL_LOB_1, which is a physically separate segment, was actually read. (It also doesn't show the LOBIndex being read).

So, the next time you are dealing with queries that retrieve OutOfLine LOBs (in the WHERE or SELECT clause), remember that additional LOB reads will be occurring as Physical Reads and this will not be apparent in a tkprof of a SQL Trace.

.
.
.

11 March, 2011

Oracle Enterprise Cloud Summit in Singapore

The Oracle cloud event in Singapore was a mild success. It's not clear how many have adopted or tested clouds in this region -- not many responded to the SMS poll immediately.

06 March, 2011

Cardinality Estimates in Dynamic Partition Pruning

Dynamic Partition Pruning occurs when the Optimizer cannot determine, at parse time, which Partition(s) of a table a query will have to execute against, even though the query does not need to do a Full Table Scan of all the Partitions.
Typically this happens with queries that use Bind Variables. However, it can also occur with SubQueries against Partition Key Columns and where Dimensions are joined to a Fact.

What I want to present here is that with Dynamic Partition Pruning, Oracle cannot really present the expected Cardinality from the Partition search. Explain Plan does present a Cardinality, but it cannot really associate that Cardinality with a Partition -- as it does not know which Partition will be searched. It is only at Runtime that the execution will select the correct Partition(s) based on the bind values, values from subqueries, or dimension values.

Here is an example :

Given these row counts in a Fact Table (Range Partitioned by Sale_Date) and three Dimension Tables :

SQL> select partition_name, num_rows
2 from user_tab_partitions
3 where table_name = 'SALES_FACT'
4 order by partition_position
5 /

PARTITION_NAME NUM_ROWS
------------------------------ ----------
SF_P_2007 249300
SF_P_2008 350698
SF_P_2009 398913
SF_P_2010 499673
SF_P_2011 101411

SQL>
SQL> select count(*)
2 from cust_dim
3 /

COUNT(*)
----------
50

SQL>
SQL> select count(*)
2 from prod_dim
3 /

COUNT(*)
----------
25

SQL>
SQL>
SQL> select count(*)
2 from date_dim
3 where date_month_year = 'JUL-2010'
4 /

COUNT(*)
----------
31

SQL>

So we have Sales Data for the years 2007 to 2011 (with 2 months of data in 2011). The number of Sales has generally been increasing, with 2010 having almost twice as many sales as 2007. What is not apparent from the row counts (but does happen in the real world) is that not all 50 Customers have been present through 2007 to 2011. Neither did all the Products present in 2011 exist in 2007.

In fact (no pun intended !), I have constructed the data such that the year 2007 had only 20 Customers and 15 Products. The number of Customers and Products has increased over the years.
Is the Optimizer aware of this ? Can it be aware of this ?

Here is my first pair of test queries :

SQL> -- Query for Jul 2010, Customer 46, Product 11
SQL> explain plan for
2 select f.sale_qty, f.sale_price, f.discount_rate
3 from sales_fact f, cust_dim c, prod_dim p, date_dim d
4 where
5 f.cust_id=c.cust_id
6 and f.prod_id=p.prod_id
7 and f.sale_date=d.date_value
8 and c.cust_name = 'Customer_46'
9 and p.prod_name = 'Product_11'
10 and d.date_month_year = 'JUL-2010'
11 /

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4059568812

-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 26 | 2314 | 2964 (1)| 00:00:36 | | |
|* 1 | HASH JOIN | | 26 | 2314 | 2964 (1)| 00:00:36 | | |
| 2 | PART JOIN FILTER CREATE | :BF0000 | 30 | 1380 | 11 (0)| 00:00:01 | | |
| 3 | MERGE JOIN CARTESIAN | | 30 | 1380 | 11 (0)| 00:00:01 | | |
| 4 | MERGE JOIN CARTESIAN | | 1 | 29 | 6 (0)| 00:00:01 | | |
|* 5 | TABLE ACCESS FULL | CUST_DIM | 1 | 15 | 3 (0)| 00:00:01 | | |
| 6 | BUFFER SORT | | 1 | 14 | 3 (0)| 00:00:01 | | |
|* 7 | TABLE ACCESS FULL | PROD_DIM | 1 | 14 | 3 (0)| 00:00:01 | | |
| 8 | BUFFER SORT | | 30 | 510 | 8 (0)| 00:00:01 | | |
|* 9 | TABLE ACCESS FULL | DATE_DIM | 30 | 510 | 5 (0)| 00:00:01 | | |
| 10 | PARTITION RANGE JOIN-FILTER| | 1599K| 65M| 2948 (1)| 00:00:36 |:BF0000|:BF0000|
| 11 | TABLE ACCESS FULL | SALES_FACT | 1599K| 65M| 2948 (1)| 00:00:36 |:BF0000|:BF0000|
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("F"."SALE_DATE"="D"."DATE_VALUE" AND "F"."CUST_ID"="C"."CUST_ID" AND
"F"."PROD_ID"="P"."PROD_ID")
5 - filter("C"."CUST_NAME"='Customer_46')
7 - filter("P"."PROD_NAME"='Product_11')
9 - filter("D"."DATE_MONTH_YEAR"='JUL-2010')

27 rows selected.

SQL>
SQL> select count(*) from
2 (
3 select f.sale_qty, f.sale_price, f.discount_rate
4 from sales_fact f, cust_dim c, prod_dim p, date_dim d
5 where
6 f.cust_id=c.cust_id
7 and f.prod_id=p.prod_id
8 and f.sale_date=d.date_value
9 and c.cust_name = 'Customer_46'
10 and p.prod_name = 'Product_11'
11 and d.date_month_year = 'JUL-2010'
12 )
13 /

COUNT(*)
----------
39

SQL>
SQL>
SQL>
SQL> -- Query for Jul 2007, Customer 46, Product 11
SQL> explain plan for
2 select f.sale_qty, f.sale_price, f.discount_rate
3 from sales_fact f, cust_dim c, prod_dim p, date_dim d
4 where
5 f.cust_id=c.cust_id
6 and f.prod_id=p.prod_id
7 and f.sale_date=d.date_value
8 and c.cust_name = 'Customer_46'
9 and p.prod_name = 'Product_11'
10 and d.date_month_year = 'JUL-2007'
11 /

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4059568812

-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 26 | 2314 | 2964 (1)| 00:00:36 | | |
|* 1 | HASH JOIN | | 26 | 2314 | 2964 (1)| 00:00:36 | | |
| 2 | PART JOIN FILTER CREATE | :BF0000 | 30 | 1380 | 11 (0)| 00:00:01 | | |
| 3 | MERGE JOIN CARTESIAN | | 30 | 1380 | 11 (0)| 00:00:01 | | |
| 4 | MERGE JOIN CARTESIAN | | 1 | 29 | 6 (0)| 00:00:01 | | |
|* 5 | TABLE ACCESS FULL | CUST_DIM | 1 | 15 | 3 (0)| 00:00:01 | | |
| 6 | BUFFER SORT | | 1 | 14 | 3 (0)| 00:00:01 | | |
|* 7 | TABLE ACCESS FULL | PROD_DIM | 1 | 14 | 3 (0)| 00:00:01 | | |
| 8 | BUFFER SORT | | 30 | 510 | 8 (0)| 00:00:01 | | |
|* 9 | TABLE ACCESS FULL | DATE_DIM | 30 | 510 | 5 (0)| 00:00:01 | | |
| 10 | PARTITION RANGE JOIN-FILTER| | 1599K| 65M| 2948 (1)| 00:00:36 |:BF0000|:BF0000|
| 11 | TABLE ACCESS FULL | SALES_FACT | 1599K| 65M| 2948 (1)| 00:00:36 |:BF0000|:BF0000|
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("F"."SALE_DATE"="D"."DATE_VALUE" AND "F"."CUST_ID"="C"."CUST_ID" AND
"F"."PROD_ID"="P"."PROD_ID")
5 - filter("C"."CUST_NAME"='Customer_46')
7 - filter("P"."PROD_NAME"='Product_11')
9 - filter("D"."DATE_MONTH_YEAR"='JUL-2007')

27 rows selected.

SQL>
SQL> select count(*) from
2 (
3 select f.sale_qty, f.sale_price, f.discount_rate
4 from sales_fact f, cust_dim c, prod_dim p, date_dim d
5 where
6 f.cust_id=c.cust_id
7 and f.prod_id=p.prod_id
8 and f.sale_date=d.date_value
9 and c.cust_name = 'Customer_46'
10 and p.prod_name = 'Product_11'
11 and d.date_month_year = 'JUL-2007'
12 )
13 /

COUNT(*)
----------
0

SQL>

The Optimizer presented exactly the same estimated Row Counts for the combination of (Product_11, Customer_46) in Jul-2007 as it did for Jul-2010.
Yet, Customer_46 didn't even exist in Jul-2007. There were no sales to Customer_46 in the year 2007 at all.

Here is the next pair of test queries :

SQL> -- Query for Jan 2011, Customer 9, Product 25
SQL> explain plan for
2 select f.sale_qty, f.sale_price, f.discount_rate
3 from sales_fact f, cust_dim c, prod_dim p, date_dim d
4 where
5 f.cust_id=c.cust_id
6 and f.prod_id=p.prod_id
7 and f.sale_date=d.date_value
8 and c.cust_name = 'Customer_9'
9 and p.prod_name = 'Product_25'
10 and d.date_month_year = 'JAN-2011'
11 /

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4059568812

-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 26 | 2314 | 2964 (1)| 00:00:36 | | |
|* 1 | HASH JOIN | | 26 | 2314 | 2964 (1)| 00:00:36 | | |
| 2 | PART JOIN FILTER CREATE | :BF0000 | 30 | 1380 | 11 (0)| 00:00:01 | | |
| 3 | MERGE JOIN CARTESIAN | | 30 | 1380 | 11 (0)| 00:00:01 | | |
| 4 | MERGE JOIN CARTESIAN | | 1 | 29 | 6 (0)| 00:00:01 | | |
|* 5 | TABLE ACCESS FULL | CUST_DIM | 1 | 15 | 3 (0)| 00:00:01 | | |
| 6 | BUFFER SORT | | 1 | 14 | 3 (0)| 00:00:01 | | |
|* 7 | TABLE ACCESS FULL | PROD_DIM | 1 | 14 | 3 (0)| 00:00:01 | | |
| 8 | BUFFER SORT | | 30 | 510 | 8 (0)| 00:00:01 | | |
|* 9 | TABLE ACCESS FULL | DATE_DIM | 30 | 510 | 5 (0)| 00:00:01 | | |
| 10 | PARTITION RANGE JOIN-FILTER| | 1599K| 65M| 2948 (1)| 00:00:36 |:BF0000|:BF0000|
| 11 | TABLE ACCESS FULL | SALES_FACT | 1599K| 65M| 2948 (1)| 00:00:36 |:BF0000|:BF0000|
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("F"."SALE_DATE"="D"."DATE_VALUE" AND "F"."CUST_ID"="C"."CUST_ID" AND
"F"."PROD_ID"="P"."PROD_ID")
5 - filter("C"."CUST_NAME"='Customer_9')
7 - filter("P"."PROD_NAME"='Product_25')
9 - filter("D"."DATE_MONTH_YEAR"='JAN-2011')

27 rows selected.

SQL>
SQL> select count(*) from
2 (
3 select f.sale_qty, f.sale_price, f.discount_rate
4 from sales_fact f, cust_dim c, prod_dim p, date_dim d
5 where
6 f.cust_id=c.cust_id
7 and f.prod_id=p.prod_id
8 and f.sale_date=d.date_value
9 and c.cust_name = 'Customer_9'
10 and p.prod_name = 'Product_25'
11 and d.date_month_year = 'JAN-2011'
12 )
13 /

COUNT(*)
----------
0

SQL>
SQL>
SQL>
SQL> -- Query for Jan 2008, Customer 9, Product 25
SQL> explain plan for
2 select f.sale_qty, f.sale_price, f.discount_rate
3 from sales_fact f, cust_dim c, prod_dim p, date_dim d
4 where
5 f.cust_id=c.cust_id
6 and f.prod_id=p.prod_id
7 and f.sale_date=d.date_value
8 and c.cust_name = 'Customer_9'
9 and p.prod_name = 'Product_25'
10 and d.date_month_year = 'JAN-2008'
11 /

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4059568812

-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 26 | 2314 | 2964 (1)| 00:00:36 | | |
|* 1 | HASH JOIN | | 26 | 2314 | 2964 (1)| 00:00:36 | | |
| 2 | PART JOIN FILTER CREATE | :BF0000 | 30 | 1380 | 11 (0)| 00:00:01 | | |
| 3 | MERGE JOIN CARTESIAN | | 30 | 1380 | 11 (0)| 00:00:01 | | |
| 4 | MERGE JOIN CARTESIAN | | 1 | 29 | 6 (0)| 00:00:01 | | |
|* 5 | TABLE ACCESS FULL | CUST_DIM | 1 | 15 | 3 (0)| 00:00:01 | | |
| 6 | BUFFER SORT | | 1 | 14 | 3 (0)| 00:00:01 | | |
|* 7 | TABLE ACCESS FULL | PROD_DIM | 1 | 14 | 3 (0)| 00:00:01 | | |
| 8 | BUFFER SORT | | 30 | 510 | 8 (0)| 00:00:01 | | |
|* 9 | TABLE ACCESS FULL | DATE_DIM | 30 | 510 | 5 (0)| 00:00:01 | | |
| 10 | PARTITION RANGE JOIN-FILTER| | 1599K| 65M| 2948 (1)| 00:00:36 |:BF0000|:BF0000|
| 11 | TABLE ACCESS FULL | SALES_FACT | 1599K| 65M| 2948 (1)| 00:00:36 |:BF0000|:BF0000|
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("F"."SALE_DATE"="D"."DATE_VALUE" AND "F"."CUST_ID"="C"."CUST_ID" AND
"F"."PROD_ID"="P"."PROD_ID")
5 - filter("C"."CUST_NAME"='Customer_9')
7 - filter("P"."PROD_NAME"='Product_25')
9 - filter("D"."DATE_MONTH_YEAR"='JAN-2008')

27 rows selected.

SQL>
SQL> select count(*) from
2 (
3 select f.sale_qty, f.sale_price, f.discount_rate
4 from sales_fact f, cust_dim c, prod_dim p, date_dim d
5 where
6 f.cust_id=c.cust_id
7 and f.prod_id=p.prod_id
8 and f.sale_date=d.date_value
9 and c.cust_name = 'Customer_9'
10 and p.prod_name = 'Product_25'
11 and d.date_month_year = 'JAN-2008'
12 )
13 /

COUNT(*)
----------
0

SQL>

This was a test for the combination of (Product_25, Customer_9). Although Product_25 does exist in 2011, it did not exist as a Product in 2008. (Even in 2011, there were actually zero sales of this product to Customer_9.)
Also, note how the Row Estimate for the JAN-2011 partition is the same as for any of the other three months presented above. This is in spite of the fact that the 2011 monthly run-rate is almost double that of 2007 (and 2011 is on track to exceed 2010). The Optimizer presents the same row count for a month in 2007 as for a month in 2011.

Generally, I advise people not to look at the Cost but at the Cardinality at each step of the Execution Plan, and to evaluate whether that Cardinality correctly matches the really expected row count -- otherwise it can have a significant impact on the real total "cost" of the query. However, in the case of Dynamic Partition Pruning, I probably wouldn't look at the Cardinality for the Table Partition step -- unless I suspect that the Table-level statistics are wrong.

In the case of Dynamic Partition Pruning, the Optimizer computes the expected Cardinality from Table-level, not Partition-level, statistics.
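One way to see the difference (a sketch -- and assuming Partition-level statistics have actually been gathered) is to compare the Table-level and Partition-level column statistics for the Fact join columns :

-- Table-level NDV : what the estimate is based on
select column_name, num_distinct, num_nulls
from user_tab_col_statistics
where table_name = 'SALES_FACT'
and column_name in ('CUST_ID','PROD_ID')
/

-- Partition-level NDV : would show fewer Customers and Products in the older partitions
select partition_name, column_name, num_distinct
from user_part_col_statistics
where table_name = 'SALES_FACT'
and column_name in ('CUST_ID','PROD_ID')
order by partition_name, column_name
/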

Here are the actual row counts for those CUST_IDs and PROD_IDs in the SALES_FACT table :

MONTH_NA CUST_ID COUNT
-------- ---------- ----------
Jan-2008 9 1242
Jan-2011 9 1003
Jul-2007 9 1037
Jul-2010 9 873
Jan-2011 46 1052
Jul-2010 46 838

MONTH_NA PROD_ID COUNT
-------- ---------- ----------
Jan-2008 11 1442
Jan-2011 11 2149
Jul-2007 11 1412
Jul-2010 11 1741
Jan-2011 25 2048


which shows that CUST_ID=46 ("Customer_46") did not exist in Jul-2007 and PROD_ID=25 ("Product_25") did not exist in Jan-2008.
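The counts above can be reproduced with queries along these lines (a sketch -- the exact month formatting is an assumption) :

select to_char(f.sale_date,'Mon-YYYY') Month_Name, f.cust_id, count(*) "COUNT"
from sales_fact f
where f.cust_id in (9, 46)
and to_char(f.sale_date,'Mon-YYYY') in ('Jan-2008','Jan-2011','Jul-2007','Jul-2010')
group by to_char(f.sale_date,'Mon-YYYY'), f.cust_id
order by 2, 1
/
-- and similarly, grouped by prod_id, for PROD_IDs 11 and 25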

.
.
.

01 March, 2011

Primary Key and Index

The default expectation is that defining a Primary Key constraint creates a Unique Index on exactly the same columns (i.e. in the same order as well) as specified in the constraint.

However, this is not necessarily so.
1. You can have a Primary Key constraint with a non-Unique Index.
2. The index can be created ahead of the constraint. (Thus, the constraint "uses" the index as it finds the index already present)
3. The columns in the index need not be in the same order as the PK definition.

Here are two example tables showing :
1. Non-Unique Index being used
2. The Index Columns not in the same order as the Primary Key constraint definition.



SQL> -- table_a example with a non-unique index
SQL> create table table_a
2 as select object_id, owner, object_name, object_type, created
3 from dba_objects
4 where object_id is not null;

Table created.

SQL> create index table_a_ndx on table_a(owner,object_id);

Index created.

SQL> alter table table_a add constraint table_a_pk
2 primary key (owner,object_id);

Table altered.

SQL>
SQL> -- table_b example with the key columns differently ordered
SQL> create table table_b
2 as select object_id, owner, object_name, object_type, created
3 from dba_objects
4 where object_id is not null;

Table created.

SQL> create unique index table_b_unq_ndx on table_b(object_id,owner);

Index created.

SQL> alter table table_b add constraint table_b_pk
2 primary key (owner,object_id);

Table altered.

SQL>
SQL> select table_name, constraint_name, constraint_type, index_name
2 from user_constraints
3 where table_name in ('TABLE_A','TABLE_B')
4 and constraint_type = 'P'
5 order by table_name
6 /

TABLE_NAME CONSTRAINT_NAME C INDEX_NAME
-------------------- -------------------- - --------------------
TABLE_A TABLE_A_PK P TABLE_A_NDX

TABLE_B TABLE_B_PK P TABLE_B_UNQ_NDX


SQL> select index_name, column_name, column_position
2 from user_ind_columns
3 where index_name in ('TABLE_A_NDX','TABLE_B_UNQ_NDX')
4 order by index_name, column_position
5 /

INDEX_NAME COLUMN_NAME COLUMN_POSITION
-------------------- -------------------- ---------------
TABLE_A_NDX OWNER 1
OBJECT_ID 2

TABLE_B_UNQ_NDX OBJECT_ID 1
OWNER 2


SQL>


Table_A has a non-unique Index used to enforce a Primary Key constraint.
Table_B has an Index where the columns are ordered differently.
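Incidentally, the index to be used can also be nominated explicitly at constraint creation time with a USING INDEX clause -- a sketch (had the constraint above not already been created) :

-- explicitly nominate the pre-created index for the constraint
alter table table_a add constraint table_a_pk
primary key (owner, object_id)
using index table_a_ndx;
-- (a CREATE INDEX statement can also be embedded within the USING INDEX clause)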





Here are quiz questions for readers :
1. What happens to the Index definitions if I DROP the constraints ?
2. What happens to the Index definitions if I DISABLE the constraints ?
3. We know that a Primary Key constraint requires that the columns be NOT NULL. Yet when I created the tables, the columns were created as NULLable. So, what gives ? OR, what works ?

.
.
.

Most Popular Posts - Feb 11

Blogger's "Stats" feature shows that the 5 most popular posts in the past 30 days have been :

3. AUTOEXTEND ON Next Size : 184 pageviews

(there does seem to be something fishy about those last 3 entries --- they are always the posts with the highest pageview counts -- I wonder if someone/somewhere has a hard-coded link to these pages that is used repeatedly ?!)

.
.
.