Search My Oracle Blog

Custom Search

07 December, 2014

Statistics on this blog

I began this blog on 28-Dec-2006.  For the 8 years 2007 to 2014, I have averaged 56 posts per year.  Unfortunately, this year, 2014 has produced the fewest posts -- 40 including this one.  This includes the "series" on Grid / ASM / RAC and the series on StatsPack / AWR.

2011 was my most prodigious year -- 99 posts.

There were 8,176 page views in July 2007.  To date, there have been more than 930thousand page views on this blog.  By month, the peak count has been for March 2012 -- 24,346 page views.

My largest viewer counts are from USA, India, UK, Germany and France.  www.google.com has been the largest source of traffic to this blog.

.
.
.



02 December, 2014

StatsPack and AWR Reports -- Bits and Pieces -- 4

This is the fourth post in a series.

Post 1 is here.
Post 2 is here.
Post 3 is here.

Buffer Cache Hit Ratios

Many novice DBAs may use Hit Ratios as indicators of performance.  However, these can be misleading or incomplete.

Here are two examples :

Extract A: 9i StatsPack

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer  Hit   %:   99.06

It would seem that with only 0.94% of reads being physical reads, the database is performing optimally.  So, the DBA doesn't need to look any further.  

Or so it seems.

If he spends some time reading the report, he also then comes across this :

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                     % Total
Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
db file sequential read                           837,955       4,107    67.36
CPU time                                                        1,018    16.70
db file scattered read                             43,281         549     9.00



                                                                   Avg
                                                     Total Wait   wait    Waits
Event                               Waits   Timeouts   Time (s)   (ms)     /txn
---------------------------- ------------ ---------- ---------- ------ --------
db file sequential read           837,955          0      4,107      5    403.3
db file scattered read             43,281          0        549     13     20.8

Physical I/O is a significant proportion (76%) of total database time.  88% of the physical I/O is single-block  reads ("db file sequential read").  This is where the DBA must identify that tuning *is* required.

Considering the single block access pattern it is likely that a significant proportion are index blocks as well.  Increasing the buffer cache might help cache the index blocks.


Extract B : 10.2 AWR

Instance Efficiency Percentages (Target 100%)
Buffer Nowait %:99.98Redo NoWait %:100.00
Buffer Hit %:96.43In-memory Sort %:99.99
Library Hit %:97.16Soft Parse %:98.16
Execute to Parse %:25.09Latch Hit %:99.85
Parse CPU to Parse Elapsd %:89.96% Non-Parse CPU:96.00

The Buffer Hit Ratio is very good.  Does that mean that I/O is not an issue ?

Look again at the same report 

Top 5 Timed Events
EventWaitsTime(s)Avg Wait(ms)% Total Call TimeWait Class
CPU time147,59342.3
db file sequential read31,776,67887,659325.1User I/O
db file scattered read19,568,22079,142422.7User I/O
RMAN backup & recovery I/O1,579,31437,6502410.8System I/O
read by other session3,076,11014,21654.1User I/O

User I/O is actually significant.  The SQLs with the highest logical I/O need to be reviewed for tuning.

.
.
.

08 November, 2014

StatsPack and AWR Reports -- Bits and Pieces -- 3

This is the third post in this series.
Post 1 is here
Post 2 is here

Note : Some figures / details may be slightly changed / masked to hide the real source.

Identifying Unusual Events / Behaviours / Applications

Here I find a few "unusual" events and mark them out.

Extract A : 11.2 AWR

Snap IdSnap TimeSessionsCursors/Session
Begin Snap:.... 03:00:57107.7
End Snap:.... 07:00:07114.9
Elapsed:239.17 (mins)
DB Time:22.61 (mins)

Top 5 Timed Foreground Events

    EventWaitsTime(s)Avg wait (ms)% DB timeWait Class
    DB CPU1,33298.16
    SQL*Net more data to client49,7012001.50Network
    SQL*Net more data from client213,915500.34Network
    db file scattered read1,159110.08User I/O
    db file sequential read7,547100.07User I/O

    The two "SQL*Net more data" sets of waits are the unusual events.
    The Time on SQL*Net more data to/from client is negligible isn't it ?  So, should I be concerned ?  Over a 4 hour period, only 20seconds were on "SQL*Net more data to client".  Time on "SQL*Net more data from client" is much lower at a total time of 5seconds only.  So "Time based" tuning would ignore these two waits.

    Foreground Wait Events

    EventWaits%Time -outsTotal Wait Time (s)Avg wait (ms)Waits /txn% DB time
    SQL*Net more data to client49,701020037.781.50
    SQL*Net more data from client213,915050162.620.34
    db file scattered read1,1580110.880.08
    db file sequential read7,5500105.740.07







    SQL*Net message to client652,102000495.890.04














    SQL*Net message from client652,1020183,327281495.89

    Not that Oracle treats "SQL*Net message from client" as an idle wait so the 183,327seconds of wait time do NOT appear in the Top 5 Timed Foreground Events list.

    I would draw attention to the high number of "more data from client" waits and the correlation with the "message from client" waits.  Either extremely large SQL statements or PLSQL blocks are being submitted very frequently or row inserts with very large array sizes (number of rows per insert) are being received.  In this case, further investigation reveals an ETL loader that does bulk inserts of a number of rows per array.  If we need tuning, tuning the SDU may help.

    Similarly the "more data to client" indicates large data sets are being returned.  The numbers of columns and rows per every "send" are high.

    Instance Activity Stats (from the same AWR)

    StatisticTotalper Secondper Trans



    Requests to/from client654,73845.63497.90




    SQL*Net roundtrips to/from client654,74045.63497.91




    bytes received via SQL*Net from client1,793,072,463124,950.541,363,553.20
    bytes sent via SQL*Net to client552,048,24738,469.57419,808.55




    logical read bytes from cache762,514,227,20053,135,924.61579,858,727.91




    physical read total bytes8,772,479,488611,311.626,671,087.06



    physical write total bytes25,334,243,3281,765,420.7619,265,584.28




    redo size6,373,204,848444,117.794,846,543.61





    1.793billion bytes received in 654K SQL*Net trips is 2741bytes per trip received at 45 messages per second.  Given that it is still only 2,741bytes per trip, possibly the array size could also be tuned with the SDU and TDU.

    So, this is an AWR that doesn't call for tuning but reveals information about how the database is being used.  Large number of rows (large number of columns) being inserted and retrieved in each call.  The Performance Analyst needs to be aware of the nature of the "application" --- here it is not OLTP users but an ETL job that is the "application".  Although the database had more than a hundred sessions a very small number of sessions (possibly 4 ?) were active doing ETL and checking the status of ETL tables during this window.

    Would you care to analyze the other statistics I've listed -- bytes read and bytes written ?




    Extract B : 10.2 AWR

    Snap IdSnap TimeSessionsCursors/Session
    Begin Snap:
     00:00:072284.5
    End Snap:
     23:00:062324.3
    Elapsed:1,379.97 (mins)
    DB Time:11,543.24 (mins)

    I know.  It is really bad and most likely meaningless to get an AWR for a 24hour range.  (I'll not go into the details about why the AWR is for 24hours -- save that I did NOT ask for a 24hour AWR report).

    Top 5 Timed Events
    EventWaitsTime(s)Avg Wait(ms)% Total Call TimeWait Class
    CPU time258,10137.3
    db file sequential read62,150,655208,148330.1User I/O
    db file scattered read28,242,529141,638520.4User I/O
    RMAN backup & recovery I/O1,597,42137,137235.4System I/O
    enq: TX - row lock contention22,27634,9421,5695.0Application
    The RMAN Backup load is expected.  (Why ? Given a 24hour report, I expect RMAN to have run at least once during the day).
    For performance tuning, I would look at the "db file ... read" events and identify and analyze SQL statements and the schema.

    What is the "unusual" event here ?  It is the "enq: TX - row lock contention".  Over a period of 1,380minutes, there were 22,276 Row-Lock Contention waits.  Actually, this application does not have the same load throughout the 23hours.  Most likely, it had load for 15hours only.  So, we had 22,276 Row-Lock Contention waits over 15hours.  That translates to 1,485 waits per hour or one Row-Lock Contention wait every 2.5seconds.  Now, that is a very high frequency.  Either users are locking each other out for a very short while (1.569seconds per wait on average) or there is/are one or more jobs that run at a very high frequency and update a common "reference" table.  I won't reveal my findings here but analysis of the SQL statements indicates what the "problem" is.

    Now, should "Time based performance tuning" be worried about the 5% of time lost on these waits ?  Probably not.  But they do indicate something peculiar in the design of this system.  There are less than 250 user sessions in this OLTP database but there is/are one or more jobs that is/are locking itself every 2.5seconds -- so there is some point of serialisation occurring.  Is that job also accounting for CPU time or 'db file read' time ?  That needs further analysis.

    Both these cases show how a Performance Analyst needs to know how the database is being used.  What sort of jobs are submitted, besides OLTP users ?

    .
    .
    .



    Aggregated by orafaq.com

    Aggregated by orafaq.com
    This blog is being aggregated by orafaq.com