29 March, 2008

Example "sliced" trace files and tkprof

Referring to my previous posting, here is a [small] case study :

Assume that this SQL file represents the "Application Server" process that ran 2 SQL statements twice each :

select o.id, o.name from my_order_test o, my_source_tab s
where o.name=s.object_name
and o.id > 57000
order by o.id
/
rem run another statement to close the cursor
select count(*) from to_shrink;

pause pausing here to simulate a wait (what happens is that an index gets dropped)


select o.id, o.name from my_order_test o, my_source_tab s
where o.name=s.object_name
and o.id > 57000
order by o.id
/
rem run another statement to close the cursor
select count(*) from to_shrink;

rem Completed
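
For completeness : tracing was enabled for the session before this script was run. The exact commands I used aren't shown here, but a sketch of the usual SQL*Plus approach would be something like this, at level 8 so that wait events are captured (the tracefile_identifier is just an arbitrary tag of my choosing) :

alter session set tracefile_identifier = 'APP_SRV_SIM';
alter session set events '10046 trace name context forever, level 8';
rem ... run the statements above ...
alter session set events '10046 trace name context off';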


Assume that a developer actually dropped an index (MY_ORDER_TEST_NDX_1) while the first execution of the query on "TO_SHRINK" was running [I put a pause in to allow me enough time to drop the index from another session]. In that case, the execution plan for the first SQL statement (joining "MY_ORDER_TEST" with "MY_SOURCE_TAB") would change from the first execution to the second.

Would a tkprof listing show me the two execution plans ?
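
(For reference, the tkprof invocation is the standard one -- the file names here are only placeholders for my actual trace and output files; sys=no and a sort option could be added but aren't needed for this demonstration :)

tkprof app_srv_sim.trc Full_Run.prf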

Here is the tkprof listing for the full run (2 executions of 2 SQLs) of the simulated "Application Server" :


select o.id, o.name from my_order_test o, my_source_tab s
where o.name=s.object_name
and o.id > 57000
order by o.id

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.07 0.08 0 0 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 2 0.03 0.04 0 3424 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 6 0.10 0.12 0 3424 0 0

Misses in library cache during parse: 2
Optimizer mode: ALL_ROWS
Parsing user id: 64

Rows Row Source Operation
------- ---------------------------------------------------
0 SORT ORDER BY (cr=3 pr=0 pw=0 time=388 us)
0 HASH JOIN (cr=3 pr=0 pw=0 time=358 us)
0 TABLE ACCESS BY INDEX ROWID MY_ORDER_TEST (cr=3 pr=0 pw=0 time=60 us)
0 INDEX RANGE SCAN MY_ORDER_TEST_NDX_1 (cr=3 pr=0 pw=0 time=47 us)(object id 57002)
0 TABLE ACCESS FULL MY_SOURCE_TAB (cr=0 pr=0 pw=0 time=0 us)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
SQL*Net message from client 2 0.00 0.01
********************************************************************************

select count(*)
from
to_shrink


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.00 0.00 0 0 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 4 0.00 0.01 0 998 0 2
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 8 0.00 0.02 0 998 0 2

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 64

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=499 pr=0 pw=0 time=10927 us)
33100 TABLE ACCESS FULL TO_SHRINK (cr=499 pr=0 pw=0 time=99382 us)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 4 0.00 0.00
SQL*Net message from client 4 12.70 22.95



********************************************************************************

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 4 0.07 0.08 0 0 0 0
Execute 4 0.00 0.00 0 0 0 0
Fetch 6 0.03 0.06 0 4422 0 2
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 14 0.10 0.14 0 4422 0 2

Misses in library cache during parse: 2

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 6 0.00 0.00
SQL*Net message from client 6 12.70 22.96


Notice that, while tkprof does tell me that there were 2 executions of the first SQL statement (joining MY_ORDER_TEST with MY_SOURCE_TAB) and, also, that it was parsed twice, it does not indicate that there were 2 different execution plans used. It presents Row Source Operations for only the first execution -- the one with the index present.
The fact that the second execution of the same SQL resulted in a FullTableScan of MY_ORDER_TEST isn't immediately visible.

Since I have a suspicion that tkprof isn't presenting (or is unable to present) the "complete truth" [there were a total of 3,424 consistent gets, but the Row Source Operations list shows only 3 blocks in "cr" mode !], I split up the raw trace file and run separate tkprofs. I throw away the header of the trace file.

Thus, for example, I create "Partial_1.trc" as :


PARSING IN CURSOR #2 len=115 dep=0 uid=64 oct=3 lid=64 tim=101226877659 hv=1396900901 ad='2ae90eec'
select o.id, o.name from my_order_test o, my_source_tab s
where o.name=s.object_name
and o.id > 57000
order by o.id
END OF STMT
PARSE #2:c=0,e=2024,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=101226877649
EXEC #2:c=0,e=119,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=101226924421
WAIT #2: nam='SQL*Net message to client' ela= 7 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=101226926138
FETCH #2:c=0,e=374,p=0,cr=3,cu=0,mis=0,r=0,dep=0,og=1,tim=101226928013
WAIT #2: nam='SQL*Net message from client' ela= 4980 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=101226934787
STAT #2 id=1 cnt=0 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=0 pw=0 time=388 us)'
STAT #2 id=2 cnt=0 pid=1 pos=1 obj=0 op='HASH JOIN (cr=3 pr=0 pw=0 time=358 us)'
STAT #2 id=3 cnt=0 pid=2 pos=1 obj=56926 op='TABLE ACCESS BY INDEX ROWID MY_ORDER_TEST (cr=3 pr=0 pw=0 time=60 us)'
STAT #2 id=4 cnt=0 pid=3 pos=1 obj=57002 op='INDEX RANGE SCAN MY_ORDER_TEST_NDX_1 (cr=3 pr=0 pw=0 time=47 us)'
STAT #2 id=5 cnt=0 pid=2 pos=2 obj=54900 op='TABLE ACCESS FULL MY_SOURCE_TAB (cr=0 pr=0 pw=0 time=0 us)'
=====================
PARSING IN CURSOR #1 len=30 dep=0 uid=64 oct=3 lid=64 tim=101226943872 hv=3007295325 ad='2ae92c70'
select count(*) from to_shrink
END OF STMT
PARSE #1:c=0,e=66,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=101226943864
EXEC #1:c=0,e=105,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=101226950809
WAIT #1: nam='SQL*Net message to client' ela= 8 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=101226952270
FETCH #1:c=0,e=10926,p=0,cr=499,cu=0,mis=0,r=1,dep=0,og=1,tim=101226965517
WAIT #1: nam='SQL*Net message from client' ela= 279 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=101226967554
FETCH #1:c=0,e=5,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=101226969482
WAIT #1: nam='SQL*Net message to client' ela= 8 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=101226971120
*** 2008-03-29 00:37:19.328
WAIT #1: nam='SQL*Net message from client' ela= 12707496 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=101239680105
STAT #1 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=499 pr=0 pw=0 time=10927 us)'
STAT #1 id=2 cnt=33100 pid=1 pos=1 obj=56876 op='TABLE ACCESS FULL TO_SHRINK (cr=499 pr=0 pw=0 time=99382 us)'
=====================




Can I run tkprof on this partial trace ? Yes, I can. These are the results :


select o.id, o.name from my_order_test o, my_source_tab s
where o.name=s.object_name
and o.id > 57000
order by o.id

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 3 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.00 0.00 0 3 0 0

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 64

Rows Row Source Operation
------- ---------------------------------------------------
0 SORT ORDER BY (cr=3 pr=0 pw=0 time=388 us)
0 HASH JOIN (cr=3 pr=0 pw=0 time=358 us)
0 TABLE ACCESS BY INDEX ROWID MY_ORDER_TEST (cr=3 pr=0 pw=0 time=60 us)
0 INDEX RANGE SCAN MY_ORDER_TEST_NDX_1 (cr=3 pr=0 pw=0 time=47 us)(object id 57002)
0 TABLE ACCESS FULL MY_SOURCE_TAB (cr=0 pr=0 pw=0 time=0 us)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 0.00 0.00
********************************************************************************

select count(*)
from
to_shrink


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.01 0 499 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.01 0 499 0 1

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 64

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=499 pr=0 pw=0 time=10927 us)
33100 TABLE ACCESS FULL TO_SHRINK (cr=499 pr=0 pw=0 time=99382 us)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
SQL*Net message from client 2 12.70 12.70



There we have it. A tkprof on a "partial" trace file (one that has been sliced out of the full file) -- on a file which doesn't even have proper headers. I took care to ensure that the slice of the trace file does include all the WAIT events and the Row Source Operations.


Here's another slice of the trace file :


PARSING IN CURSOR #1 len=30 dep=0 uid=64 oct=3 lid=64 tim=101226943872 hv=3007295325 ad='2ae92c70'
select count(*) from to_shrink
END OF STMT
PARSE #1:c=0,e=66,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=101226943864
EXEC #1:c=0,e=105,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=101226950809
WAIT #1: nam='SQL*Net message to client' ela= 8 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=101226952270
FETCH #1:c=0,e=10926,p=0,cr=499,cu=0,mis=0,r=1,dep=0,og=1,tim=101226965517
WAIT #1: nam='SQL*Net message from client' ela= 279 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=101226967554
FETCH #1:c=0,e=5,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=101226969482
WAIT #1: nam='SQL*Net message to client' ela= 8 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=101226971120
*** 2008-03-29 00:37:19.328
WAIT #1: nam='SQL*Net message from client' ela= 12707496 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=101239680105
STAT #1 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=499 pr=0 pw=0 time=10927 us)'
STAT #1 id=2 cnt=33100 pid=1 pos=1 obj=56876 op='TABLE ACCESS FULL TO_SHRINK (cr=499 pr=0 pw=0 time=99382 us)'
=====================
PARSING IN CURSOR #1 len=493 dep=1 uid=0 oct=3 lid=0 tim=101239685660 hv=2584065658 ad='2faf0ddc'
select t.ts#,t.file#,t.block#,nvl(t.bobj#,0),nvl(t.tab#,0),t.intcols,nvl(t.clucols,0),t.audit$,t.flags,t.pctfree$,t.pctused$,t.initrans,t.maxtrans,t.rowcnt,t.blkcnt,t.empcnt,t.avgspc,t.chncnt,t.avgrln,t.analyzetime,t.samplesize,t.cols,t.property,nvl(t.degree,1),nvl(t.instances,1),t.avgspc_flb,t.flbcnt,t.kernelcols,nvl(t.trigflag, 0),nvl(t.spare1,0),nvl(t.spare2,0),t.spare4,t.spare6,ts.cachedblk,ts.cachehit,ts.logicalread from tab$ t, tab_stats$ ts where t.obj#= :1 and t.obj# = ts.obj# (+)
END OF STMT
PARSE #1:c=0,e=29,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=101239685652
EXEC #1:c=0,e=87,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=101239694390
FETCH #1:c=0,e=174,p=0,cr=4,cu=0,mis=0,r=1,dep=1,og=4,tim=101239696127
STAT #1 id=1 cnt=1 pid=0 pos=1 obj=0 op='NESTED LOOPS OUTER (cr=4 pr=0 pw=0 time=160 us)'
STAT #1 id=2 cnt=1 pid=1 pos=1 obj=4 op='TABLE ACCESS CLUSTER TAB$ (cr=3 pr=0 pw=0 time=114 us)'
STAT #1 id=3 cnt=1 pid=2 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 time=33 us)'
STAT #1 id=4 cnt=0 pid=1 pos=2 obj=708 op='TABLE ACCESS BY INDEX ROWID TAB_STATS$ (cr=1 pr=0 pw=0 time=24 us)'
STAT #1 id=5 cnt=0 pid=4 pos=1 obj=709 op='INDEX UNIQUE SCAN I_TAB_STATS$_OBJ# (cr=1 pr=0 pw=0 time=13 us)'
=====================
PARSING IN CURSOR #1 len=789 dep=1 uid=0 oct=3 lid=0 tim=101239706457 hv=3159716790 ad='2faeeabc'
select i.obj#,i.ts#,i.file#,i.block#,i.intcols,i.type#,i.flags,i.property,i.pctfree$,i.initrans,i.maxtrans,i.blevel,i.leafcnt,i.distkey,i.lblkkey,i.dblkkey,i.clufac,i.cols,i.analyzetime,i.samplesize,i.dataobj#,nvl(i.degree,1),nvl(i.instances,1),i.rowcnt,mod(i.pctthres$,256),i.indmethod#,i.trunccnt,nvl(c.unicols,0),nvl(c.deferrable#+c.valid#,0),nvl(i.spare1,i.intcols),i.spare4,i.spare2,i.spare6,decode(i.pctthres$,null,null,mod(trunc(i.pctthres$/256),256)),ist.cachedblk,ist.cachehit,ist.logicalread from ind$ i, ind_stats$ ist, (select enabled, min(cols) unicols,min(to_number(bitand(defer,1))) deferrable#,min(to_number(bitand(defer,4))) valid# from cdef$ where obj#=:1 and enabled > 1 group by enabled) c where i.obj#=c.enabled(+) and i.obj# = ist.obj#(+) and i.bo#=:1 order by i.obj#
END OF STMT
PARSE #1:c=0,e=50,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=101239706448
EXEC #1:c=0,e=144,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=101239714327
FETCH #1:c=0,e=428,p=0,cr=3,cu=0,mis=0,r=0,dep=1,og=4,tim=101239716321
STAT #1 id=1 cnt=0 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=0 pw=0 time=474 us)'
STAT #1 id=2 cnt=0 pid=1 pos=1 obj=0 op='HASH JOIN OUTER (cr=3 pr=0 pw=0 time=429 us)'
STAT #1 id=3 cnt=0 pid=2 pos=1 obj=0 op='NESTED LOOPS OUTER (cr=3 pr=0 pw=0 time=91 us)'
STAT #1 id=4 cnt=0 pid=3 pos=1 obj=19 op='TABLE ACCESS CLUSTER IND$ (cr=3 pr=0 pw=0 time=79 us)'
STAT #1 id=5 cnt=1 pid=4 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 time=35 us)'
STAT #1 id=6 cnt=0 pid=3 pos=2 obj=710 op='TABLE ACCESS BY INDEX ROWID IND_STATS$ (cr=0 pr=0 pw=0 time=0 us)'
STAT #1 id=7 cnt=0 pid=6 pos=1 obj=711 op='INDEX UNIQUE SCAN I_IND_STATS$_OBJ# (cr=0 pr=0 pw=0 time=0 us)'
STAT #1 id=8 cnt=0 pid=2 pos=2 obj=0 op='VIEW (cr=0 pr=0 pw=0 time=0 us)'
STAT #1 id=9 cnt=0 pid=8 pos=1 obj=0 op='SORT GROUP BY (cr=0 pr=0 pw=0 time=0 us)'
STAT #1 id=10 cnt=0 pid=9 pos=1 obj=31 op='TABLE ACCESS BY INDEX ROWID CDEF$ (cr=0 pr=0 pw=0 time=0 us)'
STAT #1 id=11 cnt=0 pid=10 pos=1 obj=51 op='INDEX RANGE SCAN I_CDEF2 (cr=0 pr=0 pw=0 time=0 us)'
=====================
PARSING IN CURSOR #1 len=348 dep=1 uid=0 oct=3 lid=0 tim=101239734792 hv=2512561537 ad='2fafea98'
select name,intcol#,segcol#,type#,length,nvl(precision#,0),decode(type#,2,nvl(scale,-127/*MAXSB1MINAL*/),178,scale,179,scale,180,scale,181,scale,182,scale,183,scale,231,scale,0),null$,fixedstorage,nvl(deflength,0),default$,rowid,col#,property, nvl(charsetid,0),nvl(charsetform,0),spare1,spare2,nvl(spare3,0) from col$ where obj#=:1 order by intcol#
END OF STMT
PARSE #1:c=0,e=53,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=101239734783
EXEC #1:c=0,e=110,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=101239742906
FETCH #1:c=0,e=169,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=4,tim=101239744762
FETCH #1:c=0,e=30,p=0,cr=0,cu=0,mis=0,r=1,dep=1,og=4,tim=101239746407
FETCH #1:c=0,e=29,p=0,cr=0,cu=0,mis=0,r=1,dep=1,og=4,tim=101239747966
FETCH #1:c=0,e=29,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=101239750079
STAT #1 id=1 cnt=3 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=0 pw=0 time=176 us)'
STAT #1 id=2 cnt=3 pid=1 pos=1 obj=21 op='TABLE ACCESS CLUSTER COL$ (cr=3 pr=0 pw=0 time=92 us)'
STAT #1 id=3 cnt=1 pid=2 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 time=37 us)'
=====================
PARSING IN CURSOR #1 len=169 dep=1 uid=0 oct=3 lid=0 tim=101239756557 hv=1173719687 ad='2fae8e88'
select col#, grantee#, privilege#,max(mod(nvl(option$,0),2)) from objauth$ where obj#=:1 and col# is not null group by privilege#, col#, grantee# order by col#, grantee#
END OF STMT
PARSE #1:c=0,e=52,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=101239756548
EXEC #1:c=0,e=134,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=101239762520
FETCH #1:c=0,e=70,p=0,cr=2,cu=0,mis=0,r=0,dep=1,og=4,tim=101239762642
STAT #1 id=1 cnt=0 pid=0 pos=1 obj=0 op='SORT GROUP BY (cr=2 pr=0 pw=0 time=108 us)'
STAT #1 id=2 cnt=0 pid=1 pos=1 obj=57 op='TABLE ACCESS BY INDEX ROWID OBJAUTH$ (cr=2 pr=0 pw=0 time=61 us)'
STAT #1 id=3 cnt=0 pid=2 pos=1 obj=103 op='INDEX RANGE SCAN I_OBJAUTH1 (cr=2 pr=0 pw=0 time=45 us)'
=====================
PARSING IN CURSOR #1 len=151 dep=1 uid=0 oct=3 lid=0 tim=101239762907 hv=4139184264 ad='2fae8560'
select grantee#,privilege#,nvl(col#,0),max(mod(nvl(option$,0),2))from objauth$ where obj#=:1 group by grantee#,privilege#,nvl(col#,0) order by grantee#
END OF STMT
PARSE #1:c=0,e=46,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=101239762902
EXEC #1:c=0,e=52,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=101239763127
FETCH #1:c=0,e=29,p=0,cr=2,cu=0,mis=0,r=0,dep=1,og=4,tim=101239763201
STAT #1 id=1 cnt=0 pid=0 pos=1 obj=0 op='SORT GROUP BY (cr=2 pr=0 pw=0 time=54 us)'
STAT #1 id=2 cnt=0 pid=1 pos=1 obj=57 op='TABLE ACCESS BY INDEX ROWID OBJAUTH$ (cr=2 pr=0 pw=0 time=25 us)'
STAT #1 id=3 cnt=0 pid=2 pos=1 obj=103 op='INDEX RANGE SCAN I_OBJAUTH1 (cr=2 pr=0 pw=0 time=16 us)'
=====================
PARSING IN CURSOR #1 len=69 dep=1 uid=0 oct=3 lid=0 tim=101239763420 hv=1471956217 ad='2fae9868'
select con#,obj#,rcon#,enabled,nvl(defer,0) from cdef$ where robj#=:1
END OF STMT
PARSE #1:c=0,e=23,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=101239763415
EXEC #1:c=0,e=30,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=101239763610
FETCH #1:c=0,e=22,p=0,cr=1,cu=0,mis=0,r=0,dep=1,og=4,tim=101239763677
STAT #1 id=1 cnt=0 pid=0 pos=1 obj=31 op='TABLE ACCESS BY INDEX ROWID CDEF$ (cr=1 pr=0 pw=0 time=29 us)'
STAT #1 id=2 cnt=0 pid=1 pos=1 obj=52 op='INDEX RANGE SCAN I_CDEF3 (cr=1 pr=0 pw=0 time=19 us)'
=====================
PARSING IN CURSOR #1 len=146 dep=1 uid=0 oct=3 lid=0 tim=101239763835 hv=2107929772 ad='2faf4cb8'
select con#,type#,condlength,intcols,robj#,rcon#,match#,refact,nvl(enabled,0),rowid,cols,nvl(defer,0),mtime,nvl(spare1,0) from cdef$ where obj#=:1
END OF STMT
PARSE #1:c=0,e=17,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=101239763830
EXEC #1:c=0,e=52,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=101239764132
FETCH #1:c=0,e=22,p=0,cr=2,cu=0,mis=0,r=0,dep=1,og=4,tim=101239764200
STAT #1 id=1 cnt=0 pid=0 pos=1 obj=31 op='TABLE ACCESS BY INDEX ROWID CDEF$ (cr=2 pr=0 pw=0 time=29 us)'
STAT #1 id=2 cnt=0 pid=1 pos=1 obj=51 op='INDEX RANGE SCAN I_CDEF2 (cr=2 pr=0 pw=0 time=18 us)'
=====================
PARSING IN CURSOR #2 len=115 dep=0 uid=64 oct=3 lid=64 tim=101239765477 hv=1396900901 ad='2ae90eec'
select o.id, o.name from my_order_test o, my_source_tab s
where o.name=s.object_name
and o.id > 57000
order by o.id
END OF STMT
PARSE #2:c=78125,e=80402,p=0,cr=17,cu=0,mis=1,r=0,dep=0,og=1,tim=101239765469
EXEC #2:c=0,e=77,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=101239765811
WAIT #2: nam='SQL*Net message to client' ela= 6 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=101239765937
FETCH #2:c=31250,e=42294,p=0,cr=3421,cu=0,mis=0,r=0,dep=0,og=1,tim=101239808289
WAIT #2: nam='SQL*Net message from client' ela= 5281 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=101239813784
STAT #2 id=1 cnt=0 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3421 pr=0 pw=0 time=42304 us)'
STAT #2 id=2 cnt=0 pid=1 pos=1 obj=0 op='HASH JOIN (cr=3421 pr=0 pw=0 time=42278 us)'
STAT #2 id=3 cnt=0 pid=2 pos=1 obj=56926 op='TABLE ACCESS FULL MY_ORDER_TEST (cr=3421 pr=0 pw=0 time=41969 us)'
STAT #2 id=4 cnt=0 pid=2 pos=2 obj=54900 op='TABLE ACCESS FULL MY_SOURCE_TAB (cr=0 pr=0 pw=0 time=0 us)'
=====================
PARSING IN CURSOR #1 len=30 dep=0 uid=64 oct=3 lid=64 tim=101239814146 hv=3007295325 ad='2ae92c70'
select count(*) from to_shrink
END OF STMT
PARSE #1:c=0,e=30,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=101239814141
EXEC #1:c=0,e=61,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=101239814346
WAIT #1: nam='SQL*Net message to client' ela= 4 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=101239814403
FETCH #1:c=0,e=8862,p=0,cr=499,cu=0,mis=0,r=1,dep=0,og=1,tim=101239823406
WAIT #1: nam='SQL*Net message from client' ela= 257 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=101239823870
FETCH #1:c=0,e=3,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=101239823937
WAIT #1: nam='SQL*Net message to client' ela= 2 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=101239823983
*** 2008-03-29 00:37:29.718
WAIT #1: nam='SQL*Net message from client' ela= 10247440 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=101250071475
XCTEND rlbk=0, rd_only=1
STAT #1 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=499 pr=0 pw=0 time=8865 us)'
STAT #1 id=2 cnt=33100 pid=1 pos=1 obj=56876 op='TABLE ACCESS FULL TO_SHRINK (cr=499 pr=0 pw=0 time=99360 us)'




I have included the first execution of the "select count(*) from to_shrink;" in both "slices" (pieces) of the trace file.

Here's the tkprof of the second slice. In this tkprof, I can see the different Execution Plan (Row Source Operations) for the SQL joining MY_ORDER_TEST with MY_SOURCE_TAB (because an index on MY_ORDER_TEST was dropped) :


select count(*)
from
to_shrink


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.00 0.00 0 0 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 4 0.00 0.01 0 998 0 2
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 8 0.00 0.02 0 998 0 2

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 64

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=499 pr=0 pw=0 time=10927 us)
33100 TABLE ACCESS FULL TO_SHRINK (cr=499 pr=0 pw=0 time=99382 us)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 4 0.00 0.00
SQL*Net message from client 4 12.70 22.95
********************************************************************************

select o.id, o.name from my_order_test o, my_source_tab s
where o.name=s.object_name
and o.id > 57000
order by o.id

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.07 0.07 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.03 0.04 0 3421 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.10 0.12 0 3421 0 0

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 64

Rows Row Source Operation
------- ---------------------------------------------------
0 SORT ORDER BY (cr=3421 pr=0 pw=0 time=42304 us)
0 HASH JOIN (cr=3421 pr=0 pw=0 time=42278 us)
0 TABLE ACCESS FULL MY_ORDER_TEST (cr=3421 pr=0 pw=0 time=41969 us)
0 TABLE ACCESS FULL MY_SOURCE_TAB (cr=0 pr=0 pw=0 time=0 us)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 0.00 0.00





This is where I see the remaining 3,421 block gets (of the 3,424 total) for the same SQL statement, but with a different execution plan.

tkprof on partial trace files

There may be occasions when you need to, or wish to, run a tkprof on only a portion of a trace file (a fragment of the trace file, or a partial trace file).

Cases where I've used this approach are similar to these examples :
1. Tracing was enabled for an application server process running for 30 minutes, probably running the same 40 SQL statements 20 times each (a total of 800 executions) in those 30 minutes. Assume that I am interested in 1 particular SQL statement that took 240 seconds for 1 of its 20 executions (maybe the fifth execution), 180 seconds for another (possibly the eighteenth execution) and 5 seconds each for all the other 18 executions.
If I were to run a tkprof on the whole trace file, it would show me that the SQL was executed 20 times but would show me Row Source Operations for *only 1* of those 20 executions. That 1 execution might be the case where it fetched 200 rows in 5 seconds.
Oracle might very well have used the same execution plan for all 20 executions but, because of the nature of the underlying data, it might be suboptimal for those 2 occurrences which took 240 seconds and 180 seconds each. Quite possible, right ?
tkprof's listing of Row Source Operations does not show me all 20 executions.
However, the raw trace file would show 20 sets of STAT lines from which I can read the Row Source Operations. {This is assuming that the cursor is closed after every execution. If the cursor doesn't get closed, STAT lines are not placed in the trace file}.
Such a case would be where tkprof is "lying" to me -- but it is really only providing me partial data. {Although it still says that the summed execution time for the 20 executions was 510 seconds (240+180+18*5), the Row Source Operations details are for a 5 seconds execution. The minimum time was 5 seconds, the maximum was 240 seconds. This (min/max execution time) information, too, is not available in tkprof.}

2. During the course of the execution of the application server program, a developer drops an index on a key table accessed by the one (of the 40 !) SQL statement that I am interested in (he could have dropped the index while another SQL was being executed).
Say, he dropped the index after the fifth execution. The next 15 executions would have a different Execution Plan from the first 5 executions. Yet, tkprof will NOT tell me that there were at least 2 different Execution Plans. I can discern that from the "Misses in library cache" information but the Row Source Operation section would show me only 1 of the (at least 2 different) Execution Plans. So, once again, tkprof is "lying" in that it is not presenting the complete information.

What would I do in such cases ? I would split up the trace file into, maybe, 20 different pieces and run a tkprof on each piece. (Assuming that the cursor was closed on each of the 20 executions), I can see all 20 sets of Row Source Operation details now ! For the first case, I can see in one of the 20 sliced trace files that the SQL took 240 seconds, and the Row Source Operations would show me where (which index scan, table fetch, join operation etc) it spent the most time !

This leads to the question : How can I split the trace file ? Don't I have to maintain the "consistency" of the trace file ? Shouldn't the header information be repeated in all 20 pieces ?

NO, I can slice the trace file (horizontally of course, not vertically) at any point and still run tkprof. If, of course, I slice it incorrectly, such that the WAIT events and/or Row Source Operations for an SQL execution are not in the same "slice" (piece), tkprof cannot report the correct Wait times / Row Source Operations (or would actually present incorrect ones). However, if I slice the file correctly (forget the trace file header, it is not important to me), I can get separate tkprofs.
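
As for the mechanics of the slicing itself : any method that copies whole lines will do -- a text editor is fine, or, on Unix, something as simple as sed with line ranges identified by inspection. A sketch (the line numbers and file names below are purely illustrative) :

sed -n '18,490p'    full_trace.trc > Partial_1.trc
sed -n '491,1200p'  full_trace.trc > Partial_2.trc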

22 March, 2008

Backup the Online Redo Logs ?

On a recent Oracle forum discussion which started off with the title "Multiplex redo log files?" but was really about the question "Is it a normal operation to backup redo log files as well ?", I put in a response to the "normal" assertion that you should "never backup redo logs".

There's probably no documentation that "says to include online redo logs as part of a backup".
However, there are situations where you _can_ (or even would want to) include your redo logs in a backup. Note that I say "a backup", not "backups". A specific, ad hoc, backup may include redo logs.

1. I take a cold backup of a database and restore it to another server. I do not wish to do a RESETLOGS (why, I ask, "do I HAVE to resetlogs ?"). In that case, I include my redo logs in my cold backup and restore.

2. I am testing backup and recovery scenarios. I take a cold backup of the whole database as a "reference point" in all my various recovery tests.

3. I am attempting a recovery of a production database to verify data at different points in time. I know that I might have to also do a resetlogs in between, change the incarnation id, maybe even change the db_name etc. I want to be able to revert to any restored image during the few rounds of restore-and-verify-data. Those intermediate backups could well be cold backups with redo logs (after resetlogs ?).

4. I am remotely advising a DBA on recovery of a crashed database. I cannot be sure that he has verified all the archivelogs, got the current log sequence numbers etc. I know that it is likely that one or more of the online redo logs hadn't yet been archived, and I would be able to "simulate" them as archived logs during the recovery -- but only if I do ensure that the redo logs are backed up.

5. The last one or two archivelogs had not been included in the tape backup. However, since I know that the database has 4 or 5 or 6 online redo logs, those last two are still available "on-disk" in the form of "online" redo logs (which might very well be raw devices, just to make things more complicated) that haven't been "lost" in the filesystem crash.


Bottom Line : As a "general" rule, it is not necessary to backup online redo logs. As a "general" piece of advice, it _can_ be dangerous to blindly backup online redo logs, even in cold backups. However, there can be *specific* cases where I know I would want to backup the online redo logs.
Remember that "online redo logs" doesn't mean exactly the same as "CURRENT" or "ACTIVE".

21 March, 2008

Rebuilding Indexes - When and Why ?

As a follow-up to my earlier post "Rebuilding Indexes", here are some conditions and reasons for which you would [or may need to] rebuild indexes (a sketch of the rebuild commands themselves follows the list).

1. You've done a very large purge of records on a table that frequently undergoes FullTableScans. That table is a candidate for a Table Rebuild. An Index Rebuild, obviously, follows. [Do I hear cries of "foul" ? That this is not a reason for Index Rebuilds by themselves ? Well, I've never said that indexes, by themselves, have their own reasons to be rebuilt. If you see my earlier post, I've amply qualified rebuilding -- rebuilding follows from dependencies. A Table Rebuild is a good cause].

2. An index that suffers inserts of monotonically increasing values but does not see complete purges of the "left-most" data. Such behaviour would leave you with index blocks that are not reused. [This is different from the case where deletes do cause index blocks to be reused simply because *all* the key values in the candidate blocks are deleted, placing the blocks on the free list]. Let's say you have an index on an "ACTION_ID" or "ACTION_TIMESTAMP" of an Audit / Logging table. For such a table, you do not purge ALL records older than "N" days during your regular purge cycles. Some records which are indicators of key actions / critical updates [and suspicious actions] may be excluded from your purge. [If you did purge all records older than "N" days, the index blocks would be reusable.]

3. You have an index that is growing large and also has a very large key size, and you also have a daily / frequent batch job inserting very many rows, each insert [being regular DML] having to "flood" the buffer cache with dirty index blocks. Logically, a DROP, INSERT, REBUILD sequence would be preferred. However, this may be a PK or Unique Index and you do not wish to drop it. An Insert with an APPEND hint, causing deferred index maintenance, is not yet in place.

4. Experience with certain indexes has proven that Key Daily / Fortnightly Reports perform better, within very tight SLAs, after Index Rebuilds in Weekly / Monthly cycles.
"Been there, done that" is what many DBAs would respond. This is faster than redesigning a "COTS" schema and modifying the application code -- which would be part of your normal tuning exercise.

5. One key consideration is that the "perceived" cost of Index Block Splits is not significant.
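
When one of these conditions does apply, the rebuild itself is mechanically simple. A sketch, with placeholder names (whether to use ONLINE, PARALLEL or NOLOGGING depends on your environment and edition) :

rem optional : gather supporting numbers first
rem (note : VALIDATE STRUCTURE locks the table against DML while it runs)
analyze index my_schema.my_big_index validate structure;
select height, lf_rows, del_lf_rows, pct_used from index_stats;

alter index my_schema.my_big_index rebuild online;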

17 March, 2008

Rebuilding Indexes

Conventional "modern" ("modern" meaning "developed between 5 and 10 years ago ?) wisdom is that indexes in Oracle, generally, do not need to be rebuilt. This sometimes takes more strident expressions as "indexes never need to be rebuilt" . At times this position is expressed with the same passion as current opinions about BCHRs and RAID-5. However, saner opinions do qualify the assertion.

I put it that there are situations and systems / implementations where, not only is it necessary and meaningful to rebuild indexes, but it makes sense to rebuild them frequently. Such a statement from a DBA who has not gained respect in the community invariably meets responses ranging from "no, you do not need to rebuild indexes" to "tell me why you think that you need to rebuild indexes" to "this is a [false] silver bullet to performance issues".

The real problem with the whole idea of rebuilding indexes is that either a) the new DBA simplifies this to mean "rebuild all [or most] [or largest] indexes frequently [weekly/monthly]" or b) the expert assumes that the DBA has already made this simplification -- simply because there is inadequacy in the message being communicated by either party.

The way I would express the proper position on rebuilding indexes is something like this :

{I will be explaining my statements in updates to this posting and/or future postings.
I've posted my first set of updates as "Rebuilding Indexes - When and Why ?"}



If anyone asks if indexes should be rebuilt or even says that he
rebuilds indexes "for performance", the proper response,in my
opinion, should be a measured "it depends".
To someone saying "whether indexes should be rebuilt" we should
respond : "it depends", "possibly".
To someone saying ""we rebuild indexes and get better performance"
we should respond "possibly, depending on the characteristics of
the data and how it is inserted and purged, and what queries are
executed".

Rebuilding indexes DOES correct performance problems.
Even regularly rebuilding indexes.
Provided you know which indexes need to be rebuilt and when they
need to be rebuilt.
(The correct answers are "Not ALL indexes, but specific indexes
with certain characteristics", "On the occurrence of the
triggering event {eg a large purge as is done by monthly /
quarterly batch jobs even in OLTP systems} ").

The DBA might have already "been there, done that".
He has *proven* to himself that he gets a performance improvement.

If you have a system where batch runs must finish within a
specific time (have to meet an SLA) and you find that rebuilding
indexes after large purges actually does bring batch jobs within
the SLA, you would measure that against the "trickle" effect of
index block splits during normal OLTP transactions.

As a rule, I, too, dislike a general rebuild of indexes unless the
requirements above are met.
However, when the CFO / Upper Management paying for the hardware,
software and DBA salaries find that you cannot meet your SLA until
and unless you rebuild indexes once a month, they wouldn't be
interested in knowing about the internals of index structures and
the "slow poision" ("death by a thousand cuts") impact of block
splits.
Heck, the system gets resurrected every month by rebuilding
the *correct* indexes. That is what matters.

Index block splits are for DBAs and computer science classes.

On the other hand, if you have no SLA commitments and/or you really
do not have a system with performance constraints, an index
rebuild makes no sense.

So, to the list of candidate scenarios where index rebuilds can be
useful, I would add SLAs. And that is what does happen with
complex ERP systems (eg like the ones the OP mentioned) from 3rd
party vendors -- those where you cannot modify the programs.

02 March, 2008

ALTER TABLE ... SHRINK SPACE

It took me some time (as I built the test and validation scripts iteratively) to set up a test to evaluate the ALTER TABLE ... SHRINK SPACE command available in Oracle 10g. I ran these tests (see the document) on 32-bit 10.2.0.3 on Windows XP on my home PC.
(I couldn't extend the tests to larger sizes as "testing the tests" itself would take significant time on my PC !).
It seems as if, besides reorganising the usage of blocks within a table, the command can also identify and drop extents (it seems to have been intelligent enough to drop the last 3 extents, although rows had been deleted from random blocks across multiple extents).
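
For reference, the basic sequence of commands is along these lines (SHRINK SPACE requires the segment to be in an ASSM tablespace and row movement to be enabled on the table) :

alter table to_shrink enable row movement;
rem first pass : compact the rows only, without moving the High Water Mark
alter table to_shrink shrink space compact;
rem second pass : move the High Water Mark and release the space
alter table to_shrink shrink space;

There is also a CASCADE option to shrink dependent segments (such as indexes) in the same operation.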

Update : There is an Oracle Bug logged for the high Undo and Redo generated by an ALTER TABLE ... SHRINK SPACE command. Bug #3888229 "HUGE REDO AND UNDO GENERATED DURING A TABLE SHRINK OPERATION".

Update 10-Mar-08 : I have added Block, Redo and Undo statistics for the SHRINK operation. These indicate significant activity.
I've also run a test with Uniform Size 128K instead of AutoAllocate. At the current table size, there is no significant difference.
The document (link above) has been updated.