01 December, 2010

Using V$SESSION_LONGOPS

I have noticed a number of responses on forums suggesting that V$SESSION_LONGOPS can be used to monitor long running queries and gather statistics execution. Also, a similar response to a previous blog post.

As I have pointed out in the forums posting, V$SESSION_LONGOPS reports *operations*, not Queries and Sessions. A query or a PLSQL block can consist of multiple operations. Even a Nested Loop Join can manifest as multiple operations where the driven table is scanned using FullTableScan repeatedly.
V$SESSION_LONGOPS reports each operation separately, not the whole SQL that caused the multiple operations.

In this example, below, I show how any computation of "how much of the gather statistics has been done" and a "projection" of total time based on V$SESSION_LONGOPS would be misleading.

In one session, I run this SQL from approx 06:45:09 to approx 07:02:52 :

SQL> desc STORE_LIST
Name Null? Type
------------------------------------------------------------------------ -------- -------------------------------------------------
STORE_ID NOT NULL NUMBER
COUNTRY VARCHAR2(30)
STORE_TYPE VARCHAR2(19)
REGISTRATION_DATE DATE

SQL>

SQL> exec dbms_stats.gather_table_stats(USER,'STORE_LIST',estimate_percent=>100);

SQL> select num_rows, sample_size, blocks, blocks*8192/1048576 Size_MB from user_tables where table_name = 'STORE_LIST';

where the results are :

PL/SQL procedure successfully completed.

Elapsed: 00:17:43.17

NUM_ROWS SAMPLE_SIZE BLOCKS SIZE_MB
---------- ----------- ---------- ----------
110327040 110327040 507160 3962.1875


So, I ran GATHER_TABLE_STATS on a table with 110million rows and 3,962MB in size from 06:45:09 to 07:02:52.

I had another session polling V$SESSION_LONGOPS approximately every 30seconds. The query was :

select sid, opname, target, sofar, totalwork,
units, to_char(start_time,'HH24:MI:SS') StartTime,
time_remaining, message, username
from v$session_longops
where sofar != totalwork
order by start_time
/



These are some of the results :

At 06:45:41
SID OPNAME
---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
43 Table Scan
HEMANT.STORE_LIST 49110 507160 Blocks 06:45:09
289
Table Scan: HEMANT.STORE_LIST: 49110 out of 507160 Blocks done
HEMANT

If I had extrapolated from here, I would have expected the Gather Stats to take another 289seconds -- i.e. run till 06:49:30.

Another snapshot :

At 06:47:57
SID OPNAME
---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
43 Table Scan
HEMANT.STORE_LIST 272042 507160 Blocks 06:45:09
144
Table Scan: HEMANT.STORE_LIST: 272042 out of 507160 Blocks done
HEMANT

This indicated that 53% (272042 of 507160 blocks) had been read and the estimated end time would be 06:50:21 now (06:47:57 plus 144seconds).

A subsequent snapshot :

At 06:49:51
SID OPNAME
---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
43 Table Scan
HEMANT.STORE_LIST 453007 507160 Blocks 06:45:09
34
Table Scan: HEMANT.STORE_LIST: 453007 out of 507160 Blocks done
HEMANT

The expected end time was now 06:50:34 (06:49:51 plus 34seconds). (Notice that the expected end time is slowly creeping forward ... but we can live with that -- we assume that Oracle is "refining" it's estimate).

All of a sudden, something changed in the next snapshot :

At 06:50:42
SID OPNAME
---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
43 Table Scan
HEMANT.STORE_LIST 63023 507160 Blocks 06:50:21
148
Table Scan: HEMANT.STORE_LIST: 63023 out of 507160 Blocks done
HEMANT

Why did "SOFAR" (i.e. the number of blocks read so far) suddenly shrink (or reset ?) from 453,007 to 63,023 ? The expected end time is now 06:53:10 (06:50:42 plus 148seconds).

Here's the reason :
The SQL statement that had been running since 06:45:09 was :

SQL_ID : 5urpb1azwjnwv
select /*+ no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitor
ing no_substrb_pad */count(*), sum(sys_op_opnsize("STORE_ID")), count("COUNTRY"), count(distinct "COUNTRY"), sum(sys_op_opnsize(
"COUNTRY")), substrb(dump(min("COUNTRY"),16,0,32),1,120), substrb(dump(max("COUNTRY"),16,0,32),1,120), count("STORE_TYPE"), coun
t(distinct "STORE_TYPE"), sum(sys_op_opnsize("STORE_TYPE")), substrb(dump(min("STORE_TYPE"),16,0,32),1,120), substrb(dump(max("S
TORE_TYPE"),16,0,32),1,120), count("REGISTRATION_DATE"), count(distinct "REGISTRATION_DATE"), substrb(dump(min("REGISTRATION_DAT
E"),16,0,32),1,120), substrb(dump(max("REGISTRATION_DATE"),16,0,32),1,120) from "HEMANT"."STORE_LIST" t

Sometime between 06:49:50 and 06:50:42 (at 06:50:21, according to the V$SESSION_LONGOPS output), the running SQL changed to :

SQL_ID : g2nvjs2q08rwh
select min(minbkt),maxbkt,substrb(dump(min(val),16,0,32),1,120) minval,substrb(dump(max(val),16,0,32),1,120) maxval,sum(rep) sum
rep, sum(repsq) sumrepsq, max(rep) maxrep, count(*) bktndv, sum(case when rep=1 then 1 else 0 end) unqrep from (select val,min(b
kt) minbkt, max(bkt) maxbkt, count(val) rep, count(val)*count(val) repsq from (select /*+ no_parallel(t) no_parallel_index(t) db
ms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad */"STORE_ID" val, ntile(254) o
ver (order by "STORE_ID") bkt from "HEMANT"."STORE_LIST" t where "STORE_ID" is not null) group by val) group by maxbkt order b
y maxbkt

So, the GATHER_TABLE_STATS actually runs multiple, separate SQL statements ! In this case, V$SESSION_LONGOPS was reset by a Fresh FullTableScan initiated for the new SQL !

Further down the timeline, I see :

At 06:52:37
SID OPNAME
---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
43 Table Scan
HEMANT.STORE_LIST 457529 507160 Blocks 06:50:21
15
Table Scan: HEMANT.STORE_LIST: 457529 out of 507160 Blocks done
HEMANT

But at 06:53:28 :
SID OPNAME
---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
43 Sort/Merge
15523 166760 Blocks 06:52:52
341
Sort/Merge: : 15523 out of 166760 Blocks done
HEMANT


So, the Table Scan operation that began at 06:50:21 was succeeded by a Sort/Merge operation that began at 06:52:52. The estimated end time was now 06:59:09 (06:53:28 plus 341seconds).

However, monitoring the SQL for the session :

At 06:53:27 :
SQL_ID : g2nvjs2q08rwh
select min(minbkt),maxbkt,substrb(dump(min(val),16,0,32),1,120) minval,substrb(dump(max(val),16,0,32),1,120) maxval,sum(rep) sum
rep, sum(repsq) sumrepsq, max(rep) maxrep, count(*) bktndv, sum(case when rep=1 then 1 else 0 end) unqrep from (select val,min(b
kt) minbkt, max(bkt) maxbkt, count(val) rep, count(val)*count(val) repsq from (select /*+ no_parallel(t) no_parallel_index(t) db
ms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad */"STORE_ID" val, ntile(254) o
ver (order by "STORE_ID") bkt from "HEMANT"."STORE_LIST" t where "STORE_ID" is not null) group by val) group by maxbkt order b
y maxbkt

Again at 06:54:59 :
SQL_ID : g2nvjs2q08rwh
select min(minbkt),maxbkt,substrb(dump(min(val),16,0,32),1,120) minval,substrb(dump(max(val),16,0,32),1,120) maxval,sum(rep) sum
rep, sum(repsq) sumrepsq, max(rep) maxrep, count(*) bktndv, sum(case when rep=1 then 1 else 0 end) unqrep from (select val,min(b
kt) minbkt, max(bkt) maxbkt, count(val) rep, count(val)*count(val) repsq from (select /*+ no_parallel(t) no_parallel_index(t) db
ms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad */"STORE_ID" val, ntile(254) o
ver (order by "STORE_ID") bkt from "HEMANT"."STORE_LIST" t where "STORE_ID" is not null) group by val) group by maxbkt order b
y maxbkt


So, the SQL that began at 06:50:21 was still continuing at 06:54:59 but the Operation had changed at 06:52:52 and so V$SESSION_LONGOPS was showing a new operation with a new start time and a new estimated end time.

We have, so far, seen :
1. SQL has changed, resetting V$SESSION_LONGOPS
2. Operation within an SQL has changed (from Table Scan to Sort/Merge), resetting V$SESSION_LONGOPS again

Subsequently, from the snapshot at 06:56:44, the "TIME_REMAINING" is NULL :

At 06:56:44
SID OPNAME
---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
43 Sort/Merge
167882 166760 Blocks 06:52:52

Sort/Merge: : 167882 out of 166760 Blocks done
HEMANT

At 06:57:35
SID OPNAME
---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
43 Sort/Merge
210688 166760 Blocks 06:52:52

Sort/Merge: : 210688 out of 166760 Blocks done
HEMANT

At 06:58:57
SID OPNAME
---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
43 Sort/Merge
255326 166760 Blocks 06:52:52

Sort/Merge: : 255326 out of 166760 Blocks done
HEMANT


You may have also noticed that "SOFAR" is actually well in excess of "TOTALWORK". Seemingly, Oracle has been doing more work than the total work for this SORT/MERGE operation ! (Why ? !)

Then, the information changes again :

At 07:00:22
SID OPNAME
---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
43 Sort Output
150657 162822 Blocks 06:59:27
4
Sort Output: : 150657 out of 162822 Blocks done
HEMANT

The view now shows that a Sort Output operation began at 06:59:27.
The SQL statement was still the same :

At 07:02:21
SQL_ID : g2nvjs2q08rwh
select min(minbkt),maxbkt,substrb(dump(min(val),16,0,32),1,120) minval,substrb(dump(max(val),16,0,32),1,120) maxval,sum(rep) sum
rep, sum(repsq) sumrepsq, max(rep) maxrep, count(*) bktndv, sum(case when rep=1 then 1 else 0 end) unqrep from (select val,min(b
kt) minbkt, max(bkt) maxbkt, count(val) rep, count(val)*count(val) repsq from (select /*+ no_parallel(t) no_parallel_index(t) db
ms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad */"STORE_ID" val, ntile(254) o
ver (order by "STORE_ID") bkt from "HEMANT"."STORE_LIST" t where "STORE_ID" is not null) group by val) group by maxbkt order b
y maxbkt


So, within this SQL, there have been 3 different Operations reported by V$SESSION_LONGOPS.


Within a single GATHER_TABLE_STATS call, we have seen 2 SQLs and at least 4 different Operations, each Operation being a fresh entry in V$SESSION_LONGOPS.
And I have not even thrown in the complication of Column Statistics being gathered by METHOD_OPT being "FOR ALL COLUMNS SIZE AUTO" being the default. Since I haven't run any queries against the table, the "SIZE AUTO" finds nothing in SYS.COL_USAGE$ and creates no Histograms :

SQL> l
1 select column_name, num_distinct, num_nulls, histogram, num_buckets from user_tab_col_statistics
2* where table_name = 'STORE_LIST'
SQL> /

COLUMN_NAME NUM_DISTINCT NUM_NULLS HISTOGRAM NUM_BUCKETS
------------------------------ ------------ ---------- --------------- -----------
STORE_ID 306464 0 NONE 1
COUNTRY 42 0 NONE 1
STORE_TYPE 44 0 NONE 1
REGISTRATION_DATE 5896 0 NONE 1

SQL>


Had Oracle decided to gather Histograms on the columns, there would have been many more SQLs and many more Operations from the GATHER_TABLE_STATS call. Reading V$SESSION_LONGOPS would have been very confusing as each new Operation would enter with a new STARTTIM ! And there are no Indexes to cascade gather statistics, else indexes would have cause more SQLs, more Operations, more entries V$SESSION_LONGOPS.


Therefore, it is clear that any reading of VSESSION_LONGOPS for the GATHER_TABLE_STATS on the STORE_LIST table is misleading as the (SOFAR and TOTALWORK) and (STARTTIM and TIME_REMAINING) figures are reset with each new entry.
This is the complete listing of V$SESSION_LONGOPS entries : 4 distinct operations for one GATHER_TABLE_STATS :

SQL> l
1 select to_char(sysdate,'DD-MON HH24:MI:SS') Collection_DateStamp, sid, opname, target, sofar, totalwork,
2 units, to_char(start_time,'HH24:MI:SS') StartTime,
3 time_remaining, message, username
4 from v$session_longops
5 where 1=1
6* order by start_time
SQL> /

COLLECTION_DATESTAMP SID OPNAME
------------------------ ---------- ----------------------------------------------------------------
TARGET SOFAR TOTALWORK UNITS STARTTIM
---------------------------------------------------------------- ---------- ---------- -------------------------------- --------
TIME_REMAINING
--------------
MESSAGE
------------------------------------------------------------------------------------------------------------------------------------
USERNAME
------------------------------
01-DEC 08:03:44 43 Table Scan
HEMANT.STORE_LIST 507160 507160 Blocks 06:45:09
0
Table Scan: HEMANT.STORE_LIST: 507160 out of 507160 Blocks done
HEMANT

01-DEC 08:03:44 43 Table Scan
HEMANT.STORE_LIST 507160 507160 Blocks 06:50:21
0
Table Scan: HEMANT.STORE_LIST: 507160 out of 507160 Blocks done
HEMANT

01-DEC 08:03:44 43 Sort/Merge
166760 166760 Blocks 06:52:52
0
Sort/Merge: : 166760 out of 166760 Blocks done
HEMANT

01-DEC 08:03:44 43 Sort Output
162822 162822 Blocks 06:59:27
0
Sort Output: : 162822 out of 162822 Blocks done
HEMANT


SQL>



The first entry that appeared in V$SESSION_LONGOPS was *not* indicative of all the operations in the GATHER_TABLE_STATS. Also, I could never be sure if all operations had completed, simply by monitoring V$SESSION_LONGOPS. I kept seeing new operations appearing. (Had Column Histograms and Index statistics been gathered, there would have been that many more operations --- Column Histograms are particularly dicey when SAMPLE_SIZE is 'AUTO' and METHOD_OPT is 'FOR ALL COLUMNS SIZE AUTO' because Oracle can keep running against the same column repeatedly, each time with a different sample size).

.
.

NOTE : Another example of misreading V$SESSION_LONGOPS for a DML (UPDATE) statement is published here.

.
.
.

Most Popular Posts - Nov 10

Blogger's "Stats" feature shows that the 3 most popular posts in the past 30 days have been :

1. AUTOEXTEND ON Next Size : 325 pageviews

.
.
.