17 April, 2011

DETERMINISTIC Functions - 3

Returning to my thread on Deterministic Functions ......

Update : See the UPDATE and comments at the end of this post.

Suppose that I have a (partitioned) SALES table with data like this :
SQL> desc sales;
Name Null? Type
----------------------------------------- -------- ----------------------------
SALE_DATE DATE
SALE_ID NUMBER(8)
SALESMAN_ID NUMBER(5)
SALE_QTY NUMBER
SALE_PRICE NUMBER

SQL> select count(*) from sales partition (sales_p_2010);

COUNT(*)
----------
36399

SQL> select min(sale_date), max(sale_date) from sales partition (sales_p_2010);

MIN(SALE_ MAX(SALE_
--------- ---------
01-JAN-10 30-DEC-10

SQL> select sale_qty, count(*) from sales partition (sales_p_2010) group by sale_qty order by 1;

SALE_QTY COUNT(*)
---------- ----------
1000 18199
2000 7280
3000 10920

SQL> select sale_price, count(*) from sales partition (sales_p_2010) group by sale_price order by 1;

SALE_PRICE COUNT(*)
---------- ----------
1 9114
2 9003
3 9143
4 9139


And a SALES_TAX_RATES tables such as :
SQL> desc sales_tax_rates;
Name Null? Type
----------------------------------------- -------- ----------------------------
DATE_FROM DATE
DATE_TO DATE
QUANTITY NUMBER
TAX_RATE NUMBER

SQL> select * from sales_tax_rates order by date_from, quantity;

DATE_FROM DATE_TO QUANTITY TAX_RATE
--------- --------- ---------- ----------
01-JAN-10 31-DEC-10 1000 .05
01-JAN-10 31-DEC-10 2000 .06
01-JAN-10 31-DEC-10 3000 .07
01-JAN-10 31-DEC-10 5000 .1
01-JAN-11 31-DEC-11 1000 .07
01-JAN-11 31-DEC-11 2000 .08
01-JAN-11 31-DEC-11 3000 .09
01-JAN-11 31-DEC-11 5000 .15

8 rows selected.

SQL>


Since the Sales Tax Rate varies by Quantity, I need to factor the quantity when computing sales tax.
I have a function for the Year 2010 Sales Tax calculations as :
SQL> create or replace function sales_tax_nd_2010
2 (sale_qty_in number, sale_price_in number)
3 return number
4 as
5 f_sales_tax number;
6 f_tax_rate number;
7 begin
8 select tax_rate into f_tax_rate from sales_tax_rates
9 where
10 date_from = to_date('01-JAN-2010','DD-MON-YYYY')
11 and date_to = to_date('31-DEC-2010','DD-MON-YYYY')
12 and sale_qty_in = quantity;
13 select (sale_qty_in * sale_price_in * f_tax_rate) into f_sales_tax from dual;
14 return f_sales_tax;
15 end;
16 /

Function created.

SQL>

(Notice that I haven't defined it as DETERMINISTIC).

Excuting this function, it takes 190 seconds to compute the total Sales Tax for the Year 2010 :
SQL> exec dbms_session.session_trace_enable(waits=>FALSE,binds=>FALSE);

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.08
SQL> select sum(sales_tax_nd_2010(sale_qty, sale_price)) from sales;

SUM(SALES_TAX_ND_2010(SALE_QTY,SALE_PRICE))
-------------------------------------------
10194440

Elapsed: 00:03:10.21
SQL>


The tkprof output for this run is :
SQL ID: 5qxuj4xvh4ctb
Plan Hash: 3519235612
select sum(sales_tax_nd_2010(sale_qty, sale_price))
from
sales


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.14 0.28 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 103.75 126.48 0 143 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 103.90 126.77 0 143 0 1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 184

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=254963 pr=0 pw=0 time=0 us)
36399 PARTITION RANGE ALL PARTITION: 1 2 (cr=143 pr=0 pw=0 time=5807971 us cost=42 size=218394 card=36399)
36399 TABLE ACCESS FULL SALES PARTITION: 1 2 (cr=143 pr=0 pw=0 time=975849 us cost=42 size=218394 card=36399)

SQL ID: g7bbysbxz5bmg
Plan Hash: 1259010386
SELECT TAX_RATE
FROM
SALES_TAX_RATES WHERE DATE_FROM = TO_DATE('01-JAN-2010','DD-MON-YYYY') AND
DATE_TO = TO_DATE('31-DEC-2010','DD-MON-YYYY') AND :B1 = QUANTITY


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.03 0 0 0 0
Execute 36390 6.05 8.49 0 0 0 0
Fetch 36390 13.71 32.59 0 254730 0 36390
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 72781 19.78 41.12 0 254730 0 36390

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 184 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 TABLE ACCESS FULL SALES_TAX_RATES (cr=7 pr=0 pw=0 time=0 us cost=3 size=22 card=1)

SQL ID: 0vp2tqpgvc379
Plan Hash: 1388734953
SELECT (:B3 * :B2 * :B1 )
FROM
DUAL


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 36391 7.32 9.80 0 0 0 0
Fetch 36391 6.54 12.32 0 0 0 36391
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 72783 13.87 22.13 0 0 0 36391

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 184 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 FAST DUAL (cr=0 pr=0 pw=0 time=0 us cost=2 size=0 card=1)

SELECT TAX_RATE FROM SALES_TAX_RATES WHERE DATE_FROM = TO_DATE('01-JAN-2010','DD-MON-YYYY') AND DATE_TO = TO_DATE('31-DEC-2010','DD-MON-YYYY') AN
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 7 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 7 0 1

SELECT (:B3 * :B2 *
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 0 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 0 0 1

SELECT TAX_RATE FROM SALES_TAX_RATES WHERE DATE_FROM = TO_DATE('01-JAN-2010','DD-MON-YYYY') AND DATE_TO = TO_DATE('31-DEC-2010','DD-MON-YYYY') AN
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 7 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 7 0 1

SELECT (:B3 * :B2 *
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 0 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 0 0 1

SELECT TAX_RATE FROM SALES_TAX_RATES WHERE DATE_FROM = TO_DATE('01-JAN-2010','DD-MON-YYYY') AND DATE_TO = TO_DATE('31-DEC-2010','DD-MON-YYYY') AN
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 7 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 7 0 1

SELECT (:B3 * :B2 *
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 0 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 0 0 1

SELECT TAX_RATE FROM SALES_TAX_RATES WHERE DATE_FROM = TO_DATE('01-JAN-2010','DD-MON-YYYY') AND DATE_TO = TO_DATE('31-DEC-2010','DD-MON-YYYY') AN
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 7 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 7 0 1

SELECT (:B3 * :B2 *
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 0 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 0 0 1

SELECT TAX_RATE FROM SALES_TAX_RATES WHERE DATE_FROM = TO_DATE('01-JAN-2010','DD-MON-YYYY') AND DATE_TO = TO_DATE('31-DEC-2010','DD-MON-YYYY') AN
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 7 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 7 0 1

SELECT TAX_RATE FROM SALES_TAX_RATES WHERE DATE_FROM = TO_DATE('01-JAN-2010','DD-MON-YYYY') AND DATE_TO = TO_DATE('31-DEC-2010','DD-MON-YYYY') AN
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 7 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 7 0 1

SELECT (:B3 * :B2 *
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 0 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 0 0 1

SELECT TAX_RATE FROM SALES_TAX_RATES WHERE DATE_FROM = TO_DATE('01-JAN-2010','DD-MON-YYYY') AND DATE_TO = TO_DATE('31-DEC-2010','DD-MON-YYYY') AN
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 7 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 7 0 1

SELECT (:B3 * :B2 *
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 0 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 0 0 1

SELECT TAX_RATE FROM SALES_TAX_RATES WHERE DATE_FROM = TO_DATE('01-JAN-2010','DD-MON-YYYY') AND DATE_TO = TO_DATE('31-DEC-2010','DD-MON-YYYY') AN
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 7 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 7 0 1

SELECT (:B3 * :B2 *
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 0 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 0 0 1

SELECT TAX_RATE FROM SALES_TAX_RATES WHERE DATE_FROM = TO_DATE('01-JAN-2010','DD-MON-YYYY') AND DATE_TO = TO_DATE('31-DEC-2010','DD-MON-YYYY') AN
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 7 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 7 0 1

SELECT (:B3 * :B2 *
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 0 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 0 0 1


(Apparently, the presentation of some of the executions is mangled as if these are different SQL statements. This could be a bug but it has no significant impact on my test).

Therefore, with a (default) Non-Deterministic Function, the Function (and the two SQLs therein) was executed once for each row fetched from the SALES table -- so we have two additional SQL (recursive depth 1) calls for each row !
These additional SQL calls are quite an overhead !


I know the data well. I know that my Sales records have only 3 different Sale Quantities (SALE_QTY) and 4 different Sale Prices (SALE_PRICE). Therefore, the probability of the same SALE_QTY and SALE_PRICE being repeated in the next row fetched is quite high. If I had very many different Quantities and/or Prices, the probability of repetition would be very low.
With the high probability of repetition, I know that I can instruct Oracle to assume that the computed Sales Tax would be the same for the next row from the SALES table if the SALE_QTY and SALE_PRICE were the same ! I can use a DETERMINISTIC function.

I define my Sales Tax computation function as a DETERMINISTIC function thus :
SQL> create or replace function sales_tax_DET_2010
2 (sale_qty_in number, sale_price_in number)
3 return number DETERMINISTIC
4 as
5 f_sales_tax number;
6 f_tax_rate number;
7 begin
8 select tax_rate into f_tax_rate from sales_tax_rates
9 where
10 date_from = to_date('01-JAN-2010','DD-MON-YYYY')
11 and date_to = to_date('31-DEC-2010','DD-MON-YYYY')
12 and sale_qty_in = quantity;
13 select (sale_qty_in * sale_price_in * f_tax_rate) into f_sales_tax from dual;
14 return f_sales_tax;
15 end;
16 /

Function created.

SQL>


I then compute Year 2010 Sales Tax using this function :
SQL> exec dbms_session.session_trace_enable(waits=>FALSE,binds=>FALSE);

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.17
SQL> lock table sales_tax_rates in exclusive mode;

Table(s) Locked.

Elapsed: 00:00:00.18
SQL> select sum(sales_tax_det_2010(sale_qty, sale_price)) from sales;

SUM(SALES_TAX_DET_2010(SALE_QTY,SALE_PRICE))
--------------------------------------------
10194440

Elapsed: 00:00:01.21
SQL>


This time, the total computation took only 1.21seconds ! A vast improvement ! (Had there been 3.6million rows instead of 36,400 only, the difference in time would be very very noticeable -- provided that the pattern of SALE_QTY and SALE_PRICE held even for the larger volume).

(WHY did I lock the SALES_TAX_RATES table ? I wanted to ensure that no values were changed while my computation was running. If the rates in this table were changed, the computed Sales Tax amounts wouldn't be accurate with a DETERMINISTIC definition for the function !)

The tkprof output shows :
SQL ID: 3s4xbaczvghpb
Plan Hash: 3519235612
select sum(sales_tax_det_2010(sale_qty, sale_price))
from
sales


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.10 0.15 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.20 0.25 0 143 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.31 0.41 0 143 0 1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 184

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=273 pr=0 pw=0 time=0 us)
36399 PARTITION RANGE ALL PARTITION: 1 2 (cr=143 pr=0 pw=0 time=10254530 us cost=42 size=218394 card=36399)
36399 TABLE ACCESS FULL SALES PARTITION: 1 2 (cr=143 pr=0 pw=0 time=6083319 us cost=42 size=218394 card=36399)

SQL ID: g7bbysbxz5bmg
Plan Hash: 1259010386
SELECT TAX_RATE
FROM
SALES_TAX_RATES WHERE DATE_FROM = TO_DATE('01-JAN-2010','DD-MON-YYYY') AND
DATE_TO = TO_DATE('31-DEC-2010','DD-MON-YYYY') AND :B1 = QUANTITY


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 12 0.02 0.03 0 0 0 0
Fetch 12 0.00 0.00 0 84 0 12
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 25 0.04 0.04 0 84 0 12

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 184 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 TABLE ACCESS FULL SALES_TAX_RATES (cr=7 pr=0 pw=0 time=0 us cost=3 size=22 card=1)

SQL ID: 0vp2tqpgvc379
Plan Hash: 1388734953
SELECT (:B3 * :B2 * :B1 )
FROM
DUAL


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 12 0.00 0.00 0 0 0 0
Fetch 12 0.00 0.00 0 0 0 12
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 25 0.00 0.01 0 0 0 12

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 184 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 FAST DUAL (cr=0 pr=0 pw=0 time=0 us cost=2 size=0 card=1)


The two SQLs in the fuction were executed only 12 times. That means that the function was executed only 12 times. At every row, Oracle evaluated the SALE_QTY and SALE_PRICE versus the values from the previous row. It would re-execute the function only if either of these values differed. If the values are the same, it avoids the overhead of the Function call itself as I have explicitly (with my DETERMINISTIC definition) assured Oracle that the computed value of the Function wouldn't change if the input values were the same !

This is how (if I know the data -- the pattern of SALE_QTY and SALE_PRICE values !) I could use a DETERMINISTIC definition.

UPDATE : Let me clarify that I would use a DETERMINSTIC function in this manner only if :
a. I knew that the data in the target table (SALES) follows such a pattern that the function predicates will not vary frequently
b. I knew that the reference table (SALES_TAX_RATES) would not be updated frequently. This is an important lookup table where change controls mandate that data is not updated until and approved and only in specific windows. Of course, I add the LOCK TABLE SALES_TAX_RATES as a matter of abundant caution.

(btw : There's another way to speed up the execution as well, still using a Function but rewriting it again. ... That is actually a very simple exercise).

.
.
.

5 comments:

DomBrooks said...

Possibly leading question... would you advocate using a deterministic function over, for example, a scalar subselect/subquery and the benefit of scalar subquery caching?

I know I wouldn't. But I've left my blog series on this unfinished.

Hemant K Chitale said...

Dom,

Pre-10g, DETERMINISTIC Functions didn't really work. The definition was in the documentation but this was usable only for Function Based Indexes.
Since 10g and in 11g, optimization has been incorporated.
So, now, it is a matter of choice and "which works better for a given scenario". Personally, I find it easier to understand a straight-forward SQL than a one relying on (Scalar) SubQuery Caching.

I might write up a blog on entry on (Scalar) SubQuery Caching soon. It's difficult to understand.

Hemant

Randolf said...

But Hemant... Readers should never block writers and vice-versa. There are more elegant ways in Oracle to achieve read consistency instead of locking a whole table :-)

Why not try a "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE" instead?

By the way, it looks like that scalar subqueries due to the way they are implemented are using the SCN of the "parent" query for read consistency.

So for a scalar subquery without the invocation of user-defined functions that perform recursive SQL read-consistency / phantom reads / repeatable reads are not an issue.

Might be a point towards using subqueries + subquery caching.

Randolf

Hemant K Chitale said...

Randolf,

I understand your point.

However, I choose to use a DETERMINISTIC function where the reference table (SALES_TAX_RATES) is one that will *not* be modified frequently. Such a table would be covered by change control processes that would determine when it is updated -- only once or twice in a year.
Only as abundant caution did I add the "LOCK TABLE".
With the DETERMINISTIC function (and my guaranteeing that the lookups will not change at each execution), I can tell Oracle that it can actually avoid having to do the lookup. Thus it doesn't even have to worry about "read consistency".

I wouldn't do this against a table that may be updated frequently, though.

I will update the post to clarify this.

Hemant

Sayan Malakshinov said...

Hemant, it may be interesting for you: i described deterministic function caching mechanism in SQL and compared it with scalar subquery caching and also optimization loops with deterministic functions in PL/SQL:
http://orasql.org/category/oracle/deterministic-functions/