18 September, 2016

SQL*Net Message Waits

Here are some extracts from an 11.2.0.4 AWR Report  I ran a simulated workload on this server for about 40minutes and generated this report.  I understand that some DBAs may be misinterpreting SQL*Net message time.

Host Name        Platform                         CPUs Cores Sockets Memory(GB)
---------------- -------------------------------- ---- ----- ------- ----------
ora11204         Linux x86 64-bit                    2     2       1       3.04

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:       158 18-Sep-16 21:42:34        36       1.0
  End Snap:       159 18-Sep-16 22:23:01        33       1.0
   Elapsed:               40.45 (mins)
   DB Time:               42.04 (mins)

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0              10.2      0.12      0.07
              DB CPU(s):               0.0               0.1      0.00      0.00
      Redo size (bytes):           2,523.6          24,793.8
  Logical read (blocks):              41.7             409.6


Top 10 Foreground Events by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                            Tota    Wait   % DB
Event                                 Waits Time Avg(ms)   time Wait Class
------------------------------ ------------ ---- ------- ------ ----------
log file sync                         1,095 2508    2291   99.5 Commit
db file sequential read                 162 37.5     231    1.5 User I/O
DB CPU                                      19.1             .8
SQL*Net message to client            31,579   .1       0     .0 Network
Disk file operations I/O                103    0       0     .0 User I/O


Wait Classes by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                        Avg             Avg
                                        Total Wait     Wait   % DB   Active
Wait Class                  Waits       Time (sec)     (ms)   time Sessions
---------------- ---------------- ---------------- -------- ------ --------
Commit                      1,095            2,509     2291   99.5      1.0
System I/O                 12,899            2,210      171   87.6      0.9
User I/O                    1,866               38       20    1.5      0.0
DB CPU                                          19              .8      0.0
Network                    33,651                9        0     .4      0.0


Foreground Wait Class                       DB/Inst: ORCL/orcl  Snaps: 158-159
-> s  - second, ms - millisecond -    1000th of a second
-> ordered by wait time desc, waits desc
-> %Timeouts: value of 0 indicates value was < .5%.  Value of null is truly 0
-> Captured Time accounts for        101.7%  of Total DB time       2,522.36 (s)
-> Total FG Wait Time:             2,546.18 (s)  DB CPU time:          19.14 (s)


                                                                  Avg
                                      %Time       Total Wait     wait
Wait Class                      Waits -outs         Time (s)     (ms)  %DB time
-------------------- ---------------- ----- ---------------- -------- ---------
Commit                          1,095     0            2,509     2291      99.5
User I/O                          276     0               38      136       1.5
DB CPU                                                    19                0.8
Network                        31,579     0                0        0       0.0
Concurrency                        21     0                0        0       0.0

Foreground Wait Events                     DB/Inst: ORCL/orcl  Snaps: 158-159
-> s  - second, ms - millisecond -    1000th of a second
-> Only events with Total Wait Time (s) >= .001 are shown
-> ordered by wait time desc, waits desc (idle events last)
-> %Timeouts: value of 0 indicates value was < .5%.  Value of null is truly 0


                                                             Avg
                                        %Time Total Wait    wait    Waits   % DB
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
log file sync                     1,095     0      2,509    2291      4.4   99.5
db file sequential read             162     0         37     231      0.7    1.5
SQL*Net message to client        31,579     0          0       0    127.9     .0
Disk file operations I/O            103     0          0       0      0.4     .0
latch: shared pool                    2     0          0       4      0.0     .0
direct path sync                      2     0          0       2      0.0     .0
db file scattered read                6     0          0       0      0.0     .0
jobq slave wait                   5,522   100      2,770     502     22.4
SQL*Net message from clien       31,577     0      2,404      76    127.8


Wait Event Histogram                       DB/Inst: ORCL/orcl  Snaps: 158-159
-> Units for Total Waits column: K is 1000, M is 1000000, G is 1000000000
-> % of Waits: value of .0 indicates value was <.05%; value of null is truly 0
-> % of Waits: column heading of <=1s is truly <1024ms>1s is truly >=1024ms
-> Ordered by Event (idle events last)

                                                    % of Waits
                                 -----------------------------------------------
                           Total
Event                      Waits  <1ms ms="" s="">1s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
ARCH wait on ATTACH           37                                      97.3   2.7
ARCH wait on DETACH           37 100.0
Disk file operations I/O     920  99.9                            .1
LGWR wait for redo copy       54 100.0
Parameter File I/O           640 100.0
SQL*Net break/reset to cli     6 100.0
SQL*Net message to client  33.6K 100.0
...
SQL*Net message from clien 34.2K  82.7   9.5   2.7   1.6    .7   2.4    .3    .0


Other Instance Activity Stats              DB/Inst: ORCL/orcl  Snaps: 158-159
-> Ordered by statistic name

Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
SQL*Net roundtrips to/from clien             31,579           13.0         127.9



Should I be worried about the SQL*Net message waits (to client, from client) ? How should I interpret them ?  I have more than 30K SQL*Net messages and roundtrips.  Apparently, messages from client are very slow -- at 76ms/message.  Apparently, round-trips are very slow -- at 13 round-trips per second.  Here's a hint :  In this workload that I created, all the client sessions were on the database server -- so  you shouldn't be expecting such large wait times !

SQL*Net message to client waits are not really measured in terms of the time it took for the message to reach the client.  Oracle doesn't know how long the transmission took.  It only knows the time taken to put the message onto the TCP stack on the database server itself.  Subsequent network transmission time is unknown. That is why SQL*Net message to client will always be an extremely low figure in terms of time -- because it isn't a true measure of elapsed time sending a message to a client.

SQL*Net message from client is not  just the time spent on the network.  It is the time between the last message to the client upto the next message from the client.  Thus, it also includes client overheads (like "think time", CPU time, network stack on the client) besides transmission over the network.  In most cases, the major component of SQL*Net message from the client is client "think time" -- how long the client spent before formatting and sending the next SQL call to the database server.   This should also mean that it is not always true that SQL*Net message from client is an idle event.

If I have a single client that is either or some or all of :
a. Loading data in sets of rows (e.g. an ETL client)
b. Extracting data in sets of rows
c. Retrieving results and formatting the results for presentation to a user
b. Running a batch job that majorly consists of SQL calls, not PLSQL
the SQL*Net message from client is majorly the time spent by the client  (--- unless you really have a high latency network.).  In most such cases, when tracing the individual session, this wait event is NOT an idle event.  However, when reading an AWR, you cannot isolate such a session from the other sessions that are doing a mix of activity -- some with real interactive end-users, some sending periodic "heart-beat" messages, some completely idle waiting for a request from a user.  In this AWR report, there are a mix of clients with different "think-times", some completely idle for 40minutes.  We can't identify them in the AWR report.

Can you use the SQL*Net roundtrips to/from client figure from the AWR ?  Not if you have a mix of different clients doing different forms of activity.  Idle clients will have very few roundtrips in the 40minutes while clients extracting data row-by-row (not in PLSQL, but SQL or some other client like Java) would have a high number of roundtrips.  So, you can't separate the two types of behaviour in an AWR.

If you are really concerned about identifying SQL*Net overheads and/or round-trips, you should *trace* the specific individual session of interest and extract figures from the trace file.
.
.
.


1 comment:

Foued said...

Thanks Hemant for this share,
Foued