22 March, 2011

OuterJoin with Filter Predicate

I recently came across a query where an outer join was being used. An OuterJoin allows us to return a "dummy" row -- with NULLs for the columns of the table on which the join fails -- so that we can still present the values from the other table in the join.
However, the query was subsequently modified by the developer to add a filter predicate on the table that the OuterJoin was defined on.
Remember that an OuterJoin returns columns with NULL values for the table that is Outer Joined.
If you then add a simple filter predicate on this table, the OuterJoin is negated -- because the rows created by the OuterJoin, returning NULLs, fail the filter on that table !


In this example, based on the well-known EMP and DEPT tables, I first show an OuterJoin where "dummy" emp rows are created (and presented with E_Deptno as 0 via the NVL). EMP has no rows for DEPTNOs 40 and 50, so these are returned with E_Deptno as 0.
Then, a filter on a column that appears neither in the Join condition nor in the SELECT clause negates the Outer Join !


SQL> -- create the tables
SQL> drop table emp;

Table dropped.

SQL> drop table dept;

Table dropped.

SQL> create table dept
2 as select * from scott.dept;

Table created.

SQL> create table emp
2 as select * from scott.emp;

Table created.

SQL>
SQL> -- list the DEPT values
SQL> select deptno from dept;

DEPTNO
----------
10
20
30
40

4 rows selected.

SQL> select distinct(deptno) from emp;

DEPTNO
----------
30
20
10

3 rows selected.

SQL>
SQL> -- create a dept
SQL> insert into dept values (50,'HQ','SINGAPORE');

1 row created.

SQL> commit;

Commit complete.

SQL>
SQL> -- OuterJoin query
SQL> -- creates "dummy" rows from EMP
SQL> -- for DEPTNOs that exist in DEPT but do not exist in EMP
SQL> select d.deptno D_Deptno, d.loc, d.dname, nvl(e.deptno,0) E_Deptno, e.empno, e.ename
2 from emp e, dept d
3 where e.deptno(+) = d.deptno
4 order by 1
5 /

D_DEPTNO LOC DNAME E_DEPTNO EMPNO ENAME
---------- ------------- -------------- ---------- ---------- ----------
10 NEW YORK ACCOUNTING 10 7934 MILLER
10 NEW YORK ACCOUNTING 10 7839 KING
10 NEW YORK ACCOUNTING 10 7782 CLARK
20 DALLAS RESEARCH 20 7902 FORD
20 DALLAS RESEARCH 20 7876 ADAMS
20 DALLAS RESEARCH 20 7566 JONES
20 DALLAS RESEARCH 20 7369 SMITH
20 DALLAS RESEARCH 20 7788 SCOTT
30 CHICAGO SALES 30 7900 JAMES
30 CHICAGO SALES 30 7844 TURNER
30 CHICAGO SALES 30 7698 BLAKE
30 CHICAGO SALES 30 7654 MARTIN
30 CHICAGO SALES 30 7521 WARD
30 CHICAGO SALES 30 7499 ALLEN
40 BOSTON OPERATIONS 0
50 SINGAPORE HQ 0

16 rows selected.

SQL>
SQL> -- OuterJoin query with filter
SQL> -- what happens if the query is modified to add a filter
SQL> -- even if the filter is not on the join column ?
SQL> select d.deptno D_Deptno, d.loc, d.dname, nvl(e.deptno,0) E_Deptno, e.empno, e.ename
2 from emp e, dept d
3 where e.deptno(+) = d.deptno
4 -- an additional filter for job='CLERK' has been added
5 -- this column is not in the Join and is not in the SELECT
6 and e.job = 'CLERK'
7 order by 1
8 /

D_DEPTNO LOC DNAME E_DEPTNO EMPNO ENAME
---------- ------------- -------------- ---------- ---------- ----------
10 NEW YORK ACCOUNTING 10 7934 MILLER
20 DALLAS RESEARCH 20 7876 ADAMS
20 DALLAS RESEARCH 20 7369 SMITH
30 CHICAGO SALES 30 7900 JAMES

4 rows selected.

SQL>
SQL> -- Which are the CLERKs ?
SQL> select e.deptno, e.empno, e.ename
2 from emp e
3 where e.job = 'CLERK'
4 order by 1
5 /

DEPTNO EMPNO ENAME
---------- ---------- ----------
10 7934 MILLER
20 7369 SMITH
20 7876 ADAMS
30 7900 JAMES

4 rows selected.

SQL>


Note how the addition of a filter "e.job = 'CLERK'" changed the output of the EMP - DEPT join query from 16 rows to 4 rows. Even the two "dummy" rows for DEPTNOs 40 and 50 are now excluded.

Thus, if you need to use an OuterJoin, check what filter predicates you are specifying on the table that is OuterJoined !


UPDATE : As Rob and "orcl" explain, applying an "OuterJoin" on the additional filter column will allow retrieval of DEPTNOs 40 and 50 from DEPT, although this filter will exclude all other employees from EMP !


SQL> -- OuterJoin query with filter
SQL> -- specify an OuterJoin on the filter column as well !
SQL> select d.deptno D_Deptno, d.loc, d.dname, nvl(e.deptno,0) E_Deptno, e.empno, e.ename
2 from emp e, dept d
3 where e.deptno(+) = d.deptno
4 -- an additional filter for job='CLERK' has been added
5 -- this column is not in the Join and is not in the SELECT
6 -- and e.job = 'CLERK'
7 and e.job(+) = 'CLERK'
8 order by 1
9 /

D_DEPTNO LOC DNAME E_DEPTNO EMPNO ENAME
---------- ------------- -------------- ---------- ---------- ----------
10 NEW YORK ACCOUNTING 10 7934 MILLER
20 DALLAS RESEARCH 20 7369 SMITH
20 DALLAS RESEARCH 20 7876 ADAMS
30 CHICAGO SALES 30 7900 JAMES
40 BOSTON OPERATIONS 0
50 SINGAPORE HQ 0

6 rows selected.

SQL>


So, while "e.deptno(+) = d.deptno" is really an OuterJoin, we are simulating an OuterJoin in "e.job(+) = 'CLERK'" to ensure that we create dummy rows in EMP. Strange ?! We don't get dummy rows for jobs 'SALESMAN', 'MANAGER' etc which do not "join" to 'CLERK' although we do get dummy rows where the join to "d.deptno" would not be successful.


.
.
.

18 March, 2011

I/O for OutOfLine LOBs

When you define a LOB (CLOB or BLOB) column in a table, you have the choice of specifying whether it is Inline (stored with the row) or OutOfLine (stored in a separate segment). (Note : an Inline LOB still has a LOB Segment because an entry exceeding 4,000 bytes is automatically stored OutOfLine).
The data in an OutOfLine LOB, being stored in a separate segment (of type 'LOBSEGMENT'), is accessed via an Index (of type 'LOBINDEX').

It is easy to map a LOB Segment to the actual Table Column : the Segment Name contains the ObjectID and ColumnID. Alternatively, you can query USER_LOBS.
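For example, a query of USER_LOBS along these lines (a sketch, run against the demo table created below) shows the column-to-segment mapping together with the key storage attributes :

select column_name, segment_name, index_name, chunk, in_row, cache, logging
from user_lobs
where table_name = 'TABLE_WITH_LOBS'
order by column_name
/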

OutOfLine LOBs are read and written in CHUNKs. Typically the CHUNK size is one datablock, although it can be defined as a multiple of the datablock size. Therefore, even if you store less than a datablock's worth of content in a LOB, the I/O is still executed as a full datablock. Thus, in a tablespace with an 8KB block size, the I/O is one 8KB datablock even if you store only a few tens or hundreds of bytes in a deliberately defined OutOfLine LOB.

Here is a simple demonstration of LOB Storage and I/O. I start with a table that has 2 OutOfLine LOBs and one Inline LOB (remember : data exceeding 4000 bytes in Inline LOB is automatically stored OutOfLine so Oracle precreates a LOBSEGMENT even for the Inline LOB).


SQL> drop table table_with_LOBs;

Table dropped.

SQL>
SQL> create table table_with_LOBs
2 (id_column number not null primary key,
3 ool_lob_1 clob,
4 ool_lob_2 clob,
5 inline_lob clob,
6 date_col date,
7 varchar_col varchar2(50))
8 lob (ool_lob_1)
9 store as (disable storage in row tablespace Uniform_64KB chunk 8192 nocache logging),
10 lob (ool_lob_2)
11 store as (disable storage in row tablespace Uniform_64KB chunk 8192 nocache logging),
12 lob (inline_lob)
13 store as (enable storage in row)
14 /

Table created.

SQL>
SQL>
SQL> insert into table_with_LOBs
2 select rownum, -- id_column
3 dbms_random.string('A',10), -- ool_lob_1 at 10 characters per row
4 dbms_random.string('A',1000), -- ool_lob_2 at 1000 characters per row
5 dbms_random.string('A',1500), -- inline_lob
6 sysdate,
7 dbms_random.string('A',20) -- varchar
8 from dual
9 connect by level < 1001
10 /

1000 rows created.

SQL>
SQL> -- measure the writes
SQL> -- notice how each LOB column had a write call (3 x 1000)
SQL> -- also the data written is less than the CHUNKSIZE so it is 'unaligned'
SQL> select n.name, s.value
2 from v$statname n, v$mystat s
3 where n.statistic#=s.statistic#
4 and n.name in ('lob reads','lob writes', 'lob writes unaligned')
5 /

NAME VALUE
---------------------------------------------------------------- ----------
lob reads 0
lob writes 3000
lob writes unaligned 3000

SQL> commit;

Commit complete.

SQL>
SQL> -- compute statistics
SQL> exec dbms_stats.gather_table_stats('','TABLE_WITH_LOBS',estimate_percent=>100,cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL>
SQL> -- get column lengths
SQL> select column_name, data_type, data_length, avg_col_len
2 from user_tab_columns
3 where table_name = 'TABLE_WITH_LOBS'
4 order by column_id
5 /

COLUMN_NAME DATA_TYPE DATA_LENGTH AVG_COL_LEN
------------------------------ ---------- ----------- -----------
ID_COLUMN NUMBER 22 4
OOL_LOB_1 CLOB 476 87
OOL_LOB_2 CLOB 476 87
INLINE_LOB CLOB 4000 3103
DATE_COL DATE 7 8
VARCHAR_COL VARCHAR2 50 21

6 rows selected.

SQL>
SQL> -- get Table size
SQL> select blocks, num_rows, avg_row_len
2 from user_tables
3 where table_name = 'TABLE_WITH_LOBS'
4 /

BLOCKS NUM_ROWS AVG_ROW_LEN
---------- ---------- -----------
622 1000 3310

SQL> select blocks, bytes/1048576 Size_MB
2 from user_segments
3 where segment_type = 'TABLE'
4 and segment_name = 'TABLE_WITH_LOBS'
5 /

BLOCKS SIZE_MB
---------- ----------
640 5

SQL>
SQL> -- get LOBSegment sizes -- note how the OutOfLine LOBs take one 8KB block per row
SQL> select l.column_name, l.segment_name, s.blocks, s.bytes/1048576 Size_MB
2 from user_lobs l, user_segments s
3 where l.segment_name = s.segment_name
4 and l.table_name = 'TABLE_WITH_LOBS'
5 and s.segment_type = 'LOBSEGMENT'
6 order by 1
7 /

COLUMN_NAME SEGMENT_NAME BLOCKS SIZE_MB
------------------------------ ------------------------------ ---------- ----------
INLINE_LOB SYS_LOB0000086489C00004$$ 8 .0625
OOL_LOB_1 SYS_LOB0000086489C00002$$ 1152 9
OOL_LOB_2 SYS_LOB0000086489C00003$$ 1152 9

SQL>
SQL> -- restart the database instance so that V$SEGMENT_STATISTICS are reset
SQL> connect / as sysdba
Connected.
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 418484224 bytes
Fixed Size 1336932 bytes
Variable Size 201329052 bytes
Database Buffers 209715200 bytes
Redo Buffers 6103040 bytes
Database mounted.
Database opened.
SQL> -- now let's measure read I/O
SQL> -- create a new session so that session statistics are reset
SQL> connect hemant/hemant
Connected.
SQL>
SQL> -- read twelve rows and one lob column
SQL> select id_column, ool_lob_1
2 from table_with_lobs
3 where id_column between 10 and 21
4 order by id_column
5 /

ID_COLUMN OOL_LOB_1
---------- ------------
10 JHGJctoADd
11 RPZbdLYypm
12 OpOrrJiCzb
13 UPxpHLZhFa
14 OAqNgbIAWw
15 IhXYLHVZQF
16 LYlNaOVKDW
17 rQqbCWQdxM
18 YiicEavWeF
19 dAETtPpjyo
20 OczgVXLJqX
21 hdoJCSFCnQ

12 rows selected.

SQL> -- get the statistics
SQL> select n.name, s.value
2 from v$statname n, v$mystat s
3 where n.statistic#=s.statistic#
4 and n.name in ('lob reads','lob writes', 'lob writes unaligned')
5 /

NAME VALUE
---------------------------------------------------------------- ----------
lob reads 12
lob writes 0
lob writes unaligned 0

SQL> select object_name, statistic_name, value
2 from v$segment_statistics
3 where owner = 'HEMANT'
4 and tablespace_name = 'UNIFORM_64KB'
5 and (
6 object_name like 'SYS_LOB00000864%'
7 or
8 object_name like 'SYS_IL00000864%'
9 )
10 and value != 0
11 order by 1,2
12 /

OBJECT_NAME STATISTIC_NAME VALUE
------------------------------ ------------------------------ ----------
SYS_IL0000086489C00002$$ logical reads 48
SYS_IL0000086489C00002$$ physical read requests 2
SYS_IL0000086489C00002$$ physical reads 5
SYS_LOB0000086489C00002$$ physical read requests 12
SYS_LOB0000086489C00002$$ physical reads 12
SYS_LOB0000086489C00002$$ physical reads direct 12

6 rows selected.

SQL>
SQL> -- trace the same query
SQL> exec dbms_session.session_trace_enable(waits=>TRUE);

PL/SQL procedure successfully completed.

SQL> select id_column, ool_lob_1
2 from table_with_lobs
3 where id_column between 10 and 21
4 order by id_column
5 /

ID_COLUMN OOL_LOB_1
---------- ------------
10 JHGJctoADd
11 RPZbdLYypm
12 OpOrrJiCzb
13 UPxpHLZhFa
14 OAqNgbIAWw
15 IhXYLHVZQF
16 LYlNaOVKDW
17 rQqbCWQdxM
18 YiicEavWeF
19 dAETtPpjyo
20 OczgVXLJqX
21 hdoJCSFCnQ

12 rows selected.

SQL> exec dbms_session.session_trace_disable;

PL/SQL procedure successfully completed.

SQL>


Since the table has 3 LOB columns, each LOB has a separate 'lob write' when INSERTing data (3 x 1,000 rows = 3,000 writes). Because each value inserted is smaller than the 8KB CHUNK, each write is 'unaligned'.
The OutOfLine LOB columns still take up 87 bytes each (the in-row LOB locator) in the Table row.
The Inline LOB, even though I inserted only 1,500 characters, takes up 3,103 bytes per entry -- about 2 bytes per character, plus some overhead.
The total storage for each row in the Table segment is 3,310 bytes -- which, fortunately, fits into the 8KB block for the table (so I do not have Row Chaining).

As each OutOfLine LOB value is written as an 8KB CHUNK, even a 10-character write to OOL_LOB_1 takes an 8KB block. That is why the disk space for the two OutOfLine LOBs far exceeds that for the table rows themselves : 1,000 rows at one 8KB block each is roughly 8MB of LOB data per column, so each OutOfLine LOB segment is 9MB (1,152 blocks allocated in the Uniform 64KB tablespace), against 5MB for the entire table segment.

When retrieving a LOB column, a separate read I/O is made for each row retrieved.

If I were to retrieve only the non-LOB columns from the table, even with a FullTableScan, the OutOfLine LOBs are not read ! That is the advantage of them being OutOfLine : they are not read if they are not required. (Contrast that with normal columns in a table -- when reading a row, the normal columns are in the same row piece in the table block, so they are read from disk but, if not required by the query, are ignored after the I/O is executed).
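A quick way to verify this (a sketch, using the same statistics query as earlier) is to select only non-LOB columns and then check 'lob reads' for the session -- it should not increase :

-- read the same rows, but only non-LOB columns
select id_column, varchar_col, date_col
from table_with_lobs
where id_column between 10 and 21
order by id_column
/

-- 'lob reads' should be unchanged for this session
select n.name, s.value
from v$statname n, v$mystat s
where n.statistic#=s.statistic#
and n.name = 'lob reads'
/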

The statistics (from V$MYSTAT, V$SESSTAT, V$SYSSTAT and V$SEGMENT_STATISTICS) do show the Physical Reads and the LOB reads ('lob reads' are included in 'physical reads', so you don't need to add the two statistics).
However, if you do a SQL trace, the LOB reads will not be apparent.
Here's a tkprof of the SQL trace :

select id_column, ool_lob_1
from table_with_lobs
where id_column between 10 and 21
order by id_column

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 13 0.00 0.00 0 26 0 12
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 15 0.00 0.00 0 26 0 12

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 184

Rows Row Source Operation
------- ---------------------------------------------------
12 TABLE ACCESS BY INDEX ROWID TABLE_WITH_LOBS (cr=26 pr=0 pw=0 time=0 us cost=9 size=1183 card=13)
12 INDEX RANGE SCAN SYS_C0016880 (cr=14 pr=0 pw=0 time=209 us cost=2 size=0 card=13)(object id 86496)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 14 0.00 0.00
SQL*Net message from client 14 0.03 0.04
********************************************************************************


tkprof only shows that the Table was read --- it does not indicate that the LOBSegment for column OOL_LOB_1, which is a physically separate segment, was actually read. (It also doesn't show the LOBIndex being read).

So, the next time you are dealing with queries that retrieve OutOfLine LOBs (in the WHERE or SELECT clause), remember that additional LOB reads will be occurring as Physical Reads and this will not be apparent in a tkprof of a SQL Trace.

.
.
.

11 March, 2011

Oracle Enterprise Cloud Summit in Singapore

The Oracle cloud event in Singapore was a mild success. It's not clear how many have adopted or tested clouds in this region -- not many responded to the SMS poll immediately.

06 March, 2011

Cardinality Estimates in Dynamic Partition Pruning

Dynamic Partition Pruning occurs when the Optimizer cannot determine, at parse time, which Partition(s) of a table a query will have to execute against, even though the query does not need to do a Full Table Scan of all the Partitions.
Typically this happens with queries that use Bind Variables. However, it can also occur with SubQueries against Partition Key Columns and where Dimensions are joined to a Fact.

What I want to present here is that with Dynamic Partition Pruning, Oracle cannot really present the expected Cardinality from the Partition search. Explain Plan does present a Cardinality, but it cannot really associate that Cardinality with a Partition -- as it does not know which Partition will be searched. It is only at Runtime that the execution will select the correct Partition(s) based on the bind values, values from subqueries, or dimension values.

Here is an example :

Given these row counts in a Fact Table (Range Partitioned by Sale_Date) and three Dimension Tables :

SQL> select partition_name, num_rows
2 from user_tab_partitions
3 where table_name = 'SALES_FACT'
4 order by partition_position
5 /

PARTITION_NAME NUM_ROWS
------------------------------ ----------
SF_P_2007 249300
SF_P_2008 350698
SF_P_2009 398913
SF_P_2010 499673
SF_P_2011 101411

SQL>
SQL> select count(*)
2 from cust_dim
3 /

COUNT(*)
----------
50

SQL>
SQL> select count(*)
2 from prod_dim
3 /

COUNT(*)
----------
25

SQL>
SQL>
SQL> select count(*)
2 from date_dim
3 where date_month_year = 'JUL-2010'
4 /

COUNT(*)
----------
31

SQL>

So we have Sales Data for the years 2007 to 2011 (with 2 months of data in 2011). The number of Sales has generally been increasing, with 2010 having almost twice as many sales as 2007. What is not apparent from the row counts (but does happen in the real world) is that not all 50 Customers have been present through 2007 to 2011. Neither did all the Products present in 2011 exist in 2007.

In fact (no pun intended !), I have constructed the data such that the year 2007 had only 20 Customers and 15 Products. The number of Customers and Products has increased over the years.
Is the Optimizer aware of this ? Can it be aware of this ?

Here is my first pair of test queries :

SQL> -- Query for Jul 2010, Customer 46, Product 11
SQL> explain plan for
2 select f.sale_qty, f.sale_price, f.discount_rate
3 from sales_fact f, cust_dim c, prod_dim p, date_dim d
4 where
5 f.cust_id=c.cust_id
6 and f.prod_id=p.prod_id
7 and f.sale_date=d.date_value
8 and c.cust_name = 'Customer_46'
9 and p.prod_name = 'Product_11'
10 and d.date_month_year = 'JUL-2010'
11 /

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4059568812

-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 26 | 2314 | 2964 (1)| 00:00:36 | | |
|* 1 | HASH JOIN | | 26 | 2314 | 2964 (1)| 00:00:36 | | |
| 2 | PART JOIN FILTER CREATE | :BF0000 | 30 | 1380 | 11 (0)| 00:00:01 | | |
| 3 | MERGE JOIN CARTESIAN | | 30 | 1380 | 11 (0)| 00:00:01 | | |
| 4 | MERGE JOIN CARTESIAN | | 1 | 29 | 6 (0)| 00:00:01 | | |
|* 5 | TABLE ACCESS FULL | CUST_DIM | 1 | 15 | 3 (0)| 00:00:01 | | |
| 6 | BUFFER SORT | | 1 | 14 | 3 (0)| 00:00:01 | | |
|* 7 | TABLE ACCESS FULL | PROD_DIM | 1 | 14 | 3 (0)| 00:00:01 | | |
| 8 | BUFFER SORT | | 30 | 510 | 8 (0)| 00:00:01 | | |
|* 9 | TABLE ACCESS FULL | DATE_DIM | 30 | 510 | 5 (0)| 00:00:01 | | |
| 10 | PARTITION RANGE JOIN-FILTER| | 1599K| 65M| 2948 (1)| 00:00:36 |:BF0000|:BF0000|
| 11 | TABLE ACCESS FULL | SALES_FACT | 1599K| 65M| 2948 (1)| 00:00:36 |:BF0000|:BF0000|
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("F"."SALE_DATE"="D"."DATE_VALUE" AND "F"."CUST_ID"="C"."CUST_ID" AND
"F"."PROD_ID"="P"."PROD_ID")
5 - filter("C"."CUST_NAME"='Customer_46')
7 - filter("P"."PROD_NAME"='Product_11')
9 - filter("D"."DATE_MONTH_YEAR"='JUL-2010')

27 rows selected.

SQL>
SQL> select count(*) from
2 (
3 select f.sale_qty, f.sale_price, f.discount_rate
4 from sales_fact f, cust_dim c, prod_dim p, date_dim d
5 where
6 f.cust_id=c.cust_id
7 and f.prod_id=p.prod_id
8 and f.sale_date=d.date_value
9 and c.cust_name = 'Customer_46'
10 and p.prod_name = 'Product_11'
11 and d.date_month_year = 'JUL-2010'
12 )
13 /

COUNT(*)
----------
39

SQL>
SQL>
SQL>
SQL> -- Query for Jul 2007, Customer 46, Product 11
SQL> explain plan for
2 select f.sale_qty, f.sale_price, f.discount_rate
3 from sales_fact f, cust_dim c, prod_dim p, date_dim d
4 where
5 f.cust_id=c.cust_id
6 and f.prod_id=p.prod_id
7 and f.sale_date=d.date_value
8 and c.cust_name = 'Customer_46'
9 and p.prod_name = 'Product_11'
10 and d.date_month_year = 'JUL-2007'
11 /

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4059568812

-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 26 | 2314 | 2964 (1)| 00:00:36 | | |
|* 1 | HASH JOIN | | 26 | 2314 | 2964 (1)| 00:00:36 | | |
| 2 | PART JOIN FILTER CREATE | :BF0000 | 30 | 1380 | 11 (0)| 00:00:01 | | |
| 3 | MERGE JOIN CARTESIAN | | 30 | 1380 | 11 (0)| 00:00:01 | | |
| 4 | MERGE JOIN CARTESIAN | | 1 | 29 | 6 (0)| 00:00:01 | | |
|* 5 | TABLE ACCESS FULL | CUST_DIM | 1 | 15 | 3 (0)| 00:00:01 | | |
| 6 | BUFFER SORT | | 1 | 14 | 3 (0)| 00:00:01 | | |
|* 7 | TABLE ACCESS FULL | PROD_DIM | 1 | 14 | 3 (0)| 00:00:01 | | |
| 8 | BUFFER SORT | | 30 | 510 | 8 (0)| 00:00:01 | | |
|* 9 | TABLE ACCESS FULL | DATE_DIM | 30 | 510 | 5 (0)| 00:00:01 | | |
| 10 | PARTITION RANGE JOIN-FILTER| | 1599K| 65M| 2948 (1)| 00:00:36 |:BF0000|:BF0000|
| 11 | TABLE ACCESS FULL | SALES_FACT | 1599K| 65M| 2948 (1)| 00:00:36 |:BF0000|:BF0000|
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("F"."SALE_DATE"="D"."DATE_VALUE" AND "F"."CUST_ID"="C"."CUST_ID" AND
"F"."PROD_ID"="P"."PROD_ID")
5 - filter("C"."CUST_NAME"='Customer_46')
7 - filter("P"."PROD_NAME"='Product_11')
9 - filter("D"."DATE_MONTH_YEAR"='JUL-2007')

27 rows selected.

SQL>
SQL> select count(*) from
2 (
3 select f.sale_qty, f.sale_price, f.discount_rate
4 from sales_fact f, cust_dim c, prod_dim p, date_dim d
5 where
6 f.cust_id=c.cust_id
7 and f.prod_id=p.prod_id
8 and f.sale_date=d.date_value
9 and c.cust_name = 'Customer_46'
10 and p.prod_name = 'Product_11'
11 and d.date_month_year = 'JUL-2007'
12 )
13 /

COUNT(*)
----------
0

SQL>

The Optimizer presented exactly the same estimated Row Counts for the combination of (Product_11, Customer_46) in Jul-2007 as it did for Jul-2010.
Yet, Customer_46 didn't even exist in Jul-2007. There were no sales to Customer_46 in the year 2007 at all.

Here is the next pair of test queries :

SQL> -- Query for Jan 2011, Customer 9, Product 25
SQL> explain plan for
2 select f.sale_qty, f.sale_price, f.discount_rate
3 from sales_fact f, cust_dim c, prod_dim p, date_dim d
4 where
5 f.cust_id=c.cust_id
6 and f.prod_id=p.prod_id
7 and f.sale_date=d.date_value
8 and c.cust_name = 'Customer_9'
9 and p.prod_name = 'Product_25'
10 and d.date_month_year = 'JAN-2011'
11 /

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4059568812

-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 26 | 2314 | 2964 (1)| 00:00:36 | | |
|* 1 | HASH JOIN | | 26 | 2314 | 2964 (1)| 00:00:36 | | |
| 2 | PART JOIN FILTER CREATE | :BF0000 | 30 | 1380 | 11 (0)| 00:00:01 | | |
| 3 | MERGE JOIN CARTESIAN | | 30 | 1380 | 11 (0)| 00:00:01 | | |
| 4 | MERGE JOIN CARTESIAN | | 1 | 29 | 6 (0)| 00:00:01 | | |
|* 5 | TABLE ACCESS FULL | CUST_DIM | 1 | 15 | 3 (0)| 00:00:01 | | |
| 6 | BUFFER SORT | | 1 | 14 | 3 (0)| 00:00:01 | | |
|* 7 | TABLE ACCESS FULL | PROD_DIM | 1 | 14 | 3 (0)| 00:00:01 | | |
| 8 | BUFFER SORT | | 30 | 510 | 8 (0)| 00:00:01 | | |
|* 9 | TABLE ACCESS FULL | DATE_DIM | 30 | 510 | 5 (0)| 00:00:01 | | |
| 10 | PARTITION RANGE JOIN-FILTER| | 1599K| 65M| 2948 (1)| 00:00:36 |:BF0000|:BF0000|
| 11 | TABLE ACCESS FULL | SALES_FACT | 1599K| 65M| 2948 (1)| 00:00:36 |:BF0000|:BF0000|
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("F"."SALE_DATE"="D"."DATE_VALUE" AND "F"."CUST_ID"="C"."CUST_ID" AND
"F"."PROD_ID"="P"."PROD_ID")
5 - filter("C"."CUST_NAME"='Customer_9')
7 - filter("P"."PROD_NAME"='Product_25')
9 - filter("D"."DATE_MONTH_YEAR"='JAN-2011')

27 rows selected.

SQL>
SQL> select count(*) from
2 (
3 select f.sale_qty, f.sale_price, f.discount_rate
4 from sales_fact f, cust_dim c, prod_dim p, date_dim d
5 where
6 f.cust_id=c.cust_id
7 and f.prod_id=p.prod_id
8 and f.sale_date=d.date_value
9 and c.cust_name = 'Customer_9'
10 and p.prod_name = 'Product_25'
11 and d.date_month_year = 'JAN-2011'
12 )
13 /

COUNT(*)
----------
0

SQL>
SQL>
SQL>
SQL> -- Query for Jan 2008, Customer 9, Product 25
SQL> explain plan for
2 select f.sale_qty, f.sale_price, f.discount_rate
3 from sales_fact f, cust_dim c, prod_dim p, date_dim d
4 where
5 f.cust_id=c.cust_id
6 and f.prod_id=p.prod_id
7 and f.sale_date=d.date_value
8 and c.cust_name = 'Customer_9'
9 and p.prod_name = 'Product_25'
10 and d.date_month_year = 'JAN-2008'
11 /

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4059568812

-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 26 | 2314 | 2964 (1)| 00:00:36 | | |
|* 1 | HASH JOIN | | 26 | 2314 | 2964 (1)| 00:00:36 | | |
| 2 | PART JOIN FILTER CREATE | :BF0000 | 30 | 1380 | 11 (0)| 00:00:01 | | |
| 3 | MERGE JOIN CARTESIAN | | 30 | 1380 | 11 (0)| 00:00:01 | | |
| 4 | MERGE JOIN CARTESIAN | | 1 | 29 | 6 (0)| 00:00:01 | | |
|* 5 | TABLE ACCESS FULL | CUST_DIM | 1 | 15 | 3 (0)| 00:00:01 | | |
| 6 | BUFFER SORT | | 1 | 14 | 3 (0)| 00:00:01 | | |
|* 7 | TABLE ACCESS FULL | PROD_DIM | 1 | 14 | 3 (0)| 00:00:01 | | |
| 8 | BUFFER SORT | | 30 | 510 | 8 (0)| 00:00:01 | | |
|* 9 | TABLE ACCESS FULL | DATE_DIM | 30 | 510 | 5 (0)| 00:00:01 | | |
| 10 | PARTITION RANGE JOIN-FILTER| | 1599K| 65M| 2948 (1)| 00:00:36 |:BF0000|:BF0000|
| 11 | TABLE ACCESS FULL | SALES_FACT | 1599K| 65M| 2948 (1)| 00:00:36 |:BF0000|:BF0000|
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("F"."SALE_DATE"="D"."DATE_VALUE" AND "F"."CUST_ID"="C"."CUST_ID" AND
"F"."PROD_ID"="P"."PROD_ID")
5 - filter("C"."CUST_NAME"='Customer_9')
7 - filter("P"."PROD_NAME"='Product_25')
9 - filter("D"."DATE_MONTH_YEAR"='JAN-2008')

27 rows selected.

SQL>
SQL> select count(*) from
2 (
3 select f.sale_qty, f.sale_price, f.discount_rate
4 from sales_fact f, cust_dim c, prod_dim p, date_dim d
5 where
6 f.cust_id=c.cust_id
7 and f.prod_id=p.prod_id
8 and f.sale_date=d.date_value
9 and c.cust_name = 'Customer_9'
10 and p.prod_name = 'Product_25'
11 and d.date_month_year = 'JAN-2008'
12 )
13 /

COUNT(*)
----------
0

SQL>

This was a test for the combination of (Product_25, Customer_9). Although Product_25 does exist in 2011, it did not exist as a Product in 2008. (Even in 2011, there were actually zero sales of this product to Customer_9.)
Also, note how the Row Estimate for the JAN-2011 partition is the same as for any of the other three months presented above. This is in spite of the fact that the 2011 monthly run-rate is almost double that of 2007 (and 2011 is on track to exceed 2010). The Optimizer presents the same row count for a month in 2007 as for a month in 2011.

Generally, I advise people not to look at the Cost but at the Cardinality at each step of the Execution Plan, and to evaluate whether that Cardinality correctly matches the really expected row count -- otherwise it can have a significant impact on the real total "cost" of the query. However, in the case of Dynamic Partition Pruning, I probably wouldn't look at the Cardinality for the Table Partition step -- unless I suspect that the Table-level statistics are wrong.

In the case of Dynamic Partition Pruning, the Optimizer computes the expected Cardinality from Table-level, not Partition-level, statistics.
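One way to see the difference (a sketch -- and assuming Partition-level statistics have actually been gathered) is to compare the Table-level and Partition-level column statistics for the Fact join columns :

-- Table-level NDV : what the estimate is based on
select column_name, num_distinct, num_nulls
from user_tab_col_statistics
where table_name = 'SALES_FACT'
and column_name in ('CUST_ID','PROD_ID')
/

-- Partition-level NDV : would show fewer Customers and Products in the older partitions
select partition_name, column_name, num_distinct
from user_part_col_statistics
where table_name = 'SALES_FACT'
and column_name in ('CUST_ID','PROD_ID')
order by partition_name, column_name
/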

Here are the actual row counts for those CUST_IDs and PROD_IDs in the SALES_FACT table :

MONTH_NA CUST_ID COUNT
-------- ---------- ----------
Jan-2008 9 1242
Jan-2011 9 1003
Jul-2007 9 1037
Jul-2010 9 873
Jan-2011 46 1052
Jul-2010 46 838

MONTH_NA PROD_ID COUNT
-------- ---------- ----------
Jan-2008 11 1442
Jan-2011 11 2149
Jul-2007 11 1412
Jul-2010 11 1741
Jan-2011 25 2048


which shows that CUST_ID=46 ("Customer_46") did not exist in Jul-2007 and PROD_ID=25 ("Product_25") did not exist in Jan-2008.
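The counts above can be reproduced with queries along these lines (a sketch -- the exact month formatting is an assumption) :

select to_char(f.sale_date,'Mon-YYYY') Month_Name, f.cust_id, count(*) "COUNT"
from sales_fact f
where f.cust_id in (9, 46)
and to_char(f.sale_date,'Mon-YYYY') in ('Jan-2008','Jan-2011','Jul-2007','Jul-2010')
group by to_char(f.sale_date,'Mon-YYYY'), f.cust_id
order by 2, 1
/
-- and similarly, grouped by prod_id, for PROD_IDs 11 and 25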

.
.
.

01 March, 2011

Primary Key and Index

The default expectation is that defining a Primary Key constraint creates a Unique Index on exactly the same columns (i.e. in the same order as well) as specified in the constraint.

However, this is not necessarily so.
1. You can have a Primary Key constraint with a non-Unique Index.
2. The index can be created ahead of the constraint. (Thus, the constraint "uses" the index as it finds the index already present)
3. The columns in the index need not be in the same order as the PK definition.

Here are two example tables showing :
1. Non-Unique Index being used
2. The Index Columns not in the same order as the Primary Key constraint definition.



SQL> -- table_a example with a non-unique index
SQL> create table table_a
2 as select object_id, owner, object_name, object_type, created
3 from dba_objects
4 where object_id is not null;

Table created.

SQL> create index table_a_ndx on table_a(owner,object_id);

Index created.

SQL> alter table table_a add constraint table_a_pk
2 primary key (owner,object_id);

Table altered.

SQL>
SQL> -- table_b example with the key columns differently ordered
SQL> create table table_b
2 as select object_id, owner, object_name, object_type, created
3 from dba_objects
4 where object_id is not null;

Table created.

SQL> create unique index table_b_unq_ndx on table_b(object_id,owner);

Index created.

SQL> alter table table_b add constraint table_b_pk
2 primary key (owner,object_id);

Table altered.

SQL>
SQL> select table_name, constraint_name, constraint_type, index_name
2 from user_constraints
3 where table_name in ('TABLE_A','TABLE_B')
4 and constraint_type = 'P'
5 order by table_name
6 /

TABLE_NAME CONSTRAINT_NAME C INDEX_NAME
-------------------- -------------------- - --------------------
TABLE_A TABLE_A_PK P TABLE_A_NDX

TABLE_B TABLE_B_PK P TABLE_B_UNQ_NDX


SQL> select index_name, column_name, column_position
2 from user_ind_columns
3 where index_name in ('TABLE_A_NDX','TABLE_B_UNQ_NDX')
4 order by index_name, column_position
5 /

INDEX_NAME COLUMN_NAME COLUMN_POSITION
-------------------- -------------------- ---------------
TABLE_A_NDX OWNER 1
OBJECT_ID 2

TABLE_B_UNQ_NDX OBJECT_ID 1
OWNER 2


SQL>


Table_A has a non-unique Index used to enforce a Primary Key constraint.
Table_B has an Index where the columns are ordered differently.
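Incidentally, the index to be used can also be nominated explicitly at constraint creation time with a USING INDEX clause -- a sketch (had the constraint above not already been created) :

-- explicitly nominate the pre-created index for the constraint
alter table table_a add constraint table_a_pk
primary key (owner, object_id)
using index table_a_ndx;
-- (a CREATE INDEX statement can also be embedded within the USING INDEX clause)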





Here are quiz questions for readers :
1. What happens to the Index definitions if I DROP the constraints ?
2. What happens to the Index definitions if I DISABLE the constraints ?
3. We know that a Primary Key constraint requires that the columns be NOT NULL. Yet when I created the tables, the columns were created as NULLable. So, what gives ? OR, what works ?

.
.
.

Most Popular Posts - Feb 11

Blogger's "Stats" feature shows that the 5 most popular posts in the past 30 days have been :

3. AUTOEXTEND ON Next Size : 184 pageviews

(there does seem to be something fishy about those last 3 entries --- they are always the posts with the highest pageview counts -- I wonder if someone/somewhere has a hard-coded link to these pages that is used repeatedly ?!)

.
.
.