21 November, 2010

Some Common Errors - 7 - "We killed the job because it was hung"

Continuing the "Common Errors" series ...

I have come across occasions when a Developer or a DBA has "killed" a job only because "it was hung". If your user or Developer points you to a job that "seems to be 'hung'", as a DBA, it is your responsibility to determine the status of the job from the database instance perspective, not from the user's perspective. Besides the GUI Enterprise Manager screens, Oracle provides numerous views that you can use to monitor the job. Learn to make use of V$SESSION, V$SESS_IO, V$SESSTAT, V$SESSION_WAIT, V$SESSION_EVENT, V$ACTIVE_SESSION_HISTORY to monitor a session.

Never kill and restart a job without at least determining (or to a reasonable degree of confidence) what caused the job to appear to be 'hung' and what action you need to take to ensure that the behaviour doesn't repeat the next time the job is re-run.
If you merely kill a job and expect it to perform better when it is restarted, without determining and fixing the causes, you are likely to incur the displeasure of the user.

As a professional, one of the attributes expected of you is "being able to diagnose a cause for a symptom".

.
.
.

5 comments:

Rushikesh said...

Excellent note,will be very useful while dignosis of issues

Fahd Mirza said...

v$session_longops is also very useful and perhaps the most important view after the v$session view in determining the status of a long running session which might seem hung to the user.

Hemant K Chitale said...

V$SESSION_LONGOPS is overrated because it is not wrll understood. I would look at it only if I already knew precisely what I will see there. On this blog I have earlier published a case of how it can be very wrong to interpret.
Hemant

Surachart Opun said...

Cool.

I killed some session, they block other session after investigated.
I killed some session, they run longer and use many CPU + ....

I killed session, they hung and still pin in library cache and develop can not recompile PL/SQL code.

Hemant K Chitale said...

I prefer using the OS kill command on the server process (when using DEDICATED_SERVER) to the 'ALTER SYSTEM KILL SESSION' command.

Hemant