09 January, 2011

Rollback of Transaction(s) after SHUTDOWN ABORT

There has been a recent forums thread on what happens in terms of recovery when a database instance is restarted after a SHUTDOWN ABORT.

It is clear that "crash recovery" has to be completed before the Instance can be OPENed for use. Thus, users cannot login, queries cannot be started and DML / DDL cannot be executed until after the Roll-forward of Redo has been completed (as the OPEN is executed only after re-applying all Redo).

This is an example of "crash recovery" :

Sun Jan 09 00:50:23 2011
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
Started redo scan
Completed redo scan
read 80458 KB redo, 1721 data blocks need recovery
Started redo application at
Thread 1: logseq 490, block 52626
Recovery of Online Redo Log: Thread 1 Group 1 Seq 490 Reading mem 0
Mem# 0: /home/oracle/app/oracle/oradata/orcl/redo01.log
Recovery of Online Redo Log: Thread 1 Group 2 Seq 491 Reading mem 0
Mem# 0: /home/oracle/app/oracle/oradata/orcl/redo02.log
Recovery of Online Redo Log: Thread 1 Group 3 Seq 492 Reading mem 0
Mem# 0: /home/oracle/app/oracle/oradata/orcl/redo03.log
Completed redo application of 15.26MB
Completed crash recovery at
Thread 1: logseq 492, block 31015, scn 6082434
1721 data blocks read, 1721 data blocks written, 80458 redo k-bytes read
Sun Jan 09 00:50:27 2011
LGWR: STARTING ARCH PROCESSES
Sun Jan 09 00:50:27 2011
ARC0 started with pid=20, OS id=4173
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Thread 1 advanced to log sequence 493 (thread open)
Thread 1 opened at log sequence 493
Current log# 1 seq# 493 mem# 0: /home/oracle/app/oracle/oradata/orcl/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sun Jan 09 00:50:28 2011
SMON: enabling cache recovery
Sun Jan 09 00:50:28 2011
ARC2 started with pid=22, OS id=4177
Sun Jan 09 00:50:28 2011
ARC1 started with pid=21, OS id=4175
ARC1: Archival started
ARC2: Archival started
ARC2: Becoming the 'no FAL' ARCH
ARC2: Becoming the 'no SRL' ARCH
ARC1: Becoming the heartbeat ARCH
Sun Jan 09 00:50:29 2011
ARC3 started with pid=23, OS id=4179
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Archived Log entry 319 added for thread 1 sequence 492 ID 0x49469b3f dest 1:
Successfully onlined Undo Tablespace 2.
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is AL32UTF8
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Sun Jan 09 00:50:34 2011
Starting background process QMNC
Sun Jan 09 00:50:34 2011
QMNC started with pid=25, OS id=4184
Completed: ALTER DATABASE OPEN


However, "crash recovery" does NOT complete "Transaction Recovery". Notice the line "SMON: enabling tx recovery" in the alert.log. This is the point when SMON starts beginning to Rollback any transactions that had not completed or not committed (or had not rolled back) before the SHUTDOWN ABORT.

Let me re-iterate : "crash recovery" is only the application of redo from redo logs to datafiles -- which include undo tablespace files. Redo Logs do not necessarily only have committed transaction. Uncommitted transactions also get written to Redo (e.g. when DBWR writes a "dirty buffer" to disk, the Redo for it has to be written to the Redo Log files before the datafile is updated --- irrespective of whether the "dirty buffer" represents a committed or uncommited update).
Therefore, Roll-forward only brings the database to (almost) the same state it was when the SHUTDOWN ABORT was issued --- including updates that had not committed.

Since Oracle's job is to guarantee database consistency and transaction consistency, any uncommitted transactions must be rolled-back. It cannot allow users to see inconsistent data when the database is OPENed. (Imagine that the incomplete transaction had managed to update a DEBIT account and this had got written to disk but the CREDIT did not go to disk --- users querying the data after the restart cannot be allowed to see only the DEBIT , the transaction must be rolled-back, the DEBIT must be undone, because the CREDIT has not been done).
This process of Rollback is "Transaction Recovery" which you see as "tx recovery" in the alert.log. In earlier version (upto some V7 version, I can't recall which exactly), the Database Instance would not OPEN until Transaction Recovery was completed. When Oracle introduced deferred transaction recovery (deferred rollback), it allowed the database instance to OPEN but still guaranteed that uncommitted data would not be seen. A user process that accesses a modified block "sees" that the ITL entry points to a transaction and, from the corresponding Undo entry (identified from the specific slot in the specific Undo segment), also "sees" that the transaction was not committed. A "rollback" has to be effected. In this case, it is the user's Server Process that actually executes the rollback.

(NOTE : In the case where neither portion of the incomplete / uncommitted transaction got written to the database files, the user process does not even "see" that the data had been modified -- any modifications in memory were lost when the database instance was SHUTDOWN ABORT. So our case of a user's Server Process having to rollback a transaction that was issued (but not completed or committed) before the SHUTDOWN ABORT happens only if the dirty buffers had been written to disk.)

Here I show a very large transaction :

SQL> select sn.name, ss.value from v$statname sn, v$mystat ss
2 where sn.statistic#=ss.statistic#
3 and sn.name in ('redo size','undo change vector size')
4 /

NAME VALUE
---------------------------------------------------------------- ----------
redo size 784
undo change vector size 136

SQL> update store_list_new set country='SYS_UPD' where country='SYS';

11100960 rows updated.

SQL> select sn.name, ss.value from v$statname sn, v$mystat ss
2 where sn.statistic#=ss.statistic#
3 and sn.name in ('redo size','undo change vector size')
4 /

NAME VALUE
---------------------------------------------------------------- ----------
redo size 5220534820
undo change vector size 1778390016

SQL>
SQL>

that is executed and completed but not commited.
(for the purpose of this demo, I allow the UPDATE to complete, however, it doesn't matter if the UPDATE hadn't even completed when the SHUTDOWN ABORT was issued --- in either case, the Rollback is after the DATABASE OPEN).
A SHUTDOWN ABORT, followed by a STARTUP is issued from a different session :

[oracle@localhost ~]$ sqlplus '/ as sysdba'

SQL*Plus: Release 11.1.0.7.0 - Production on Sun Jan 9 00:48:24 2011

Copyright (c) 1982, 2008, Oracle. All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> shutdown abort;
ORACLE instance shut down.
SQL>
SQL> startup ;
ORACLE instance started.

Total System Global Area 313860096 bytes
Fixed Size 1336232 bytes
Variable Size 272632920 bytes
Database Buffers 33554432 bytes
Redo Buffers 6336512 bytes
Database mounted.
Database opened.
SQL>

with these messages from the alert.log :

Sun Jan 09 00:48:30 2011
Shutting down instance (abort)
License high water mark = 7
USER (ospid: 4076): terminating the instance
Instance terminated by USER, pid = 4076
Sun Jan 09 00:48:31 2011
Instance shutdown complete
Sun Jan 09 00:49:29 2011
Starting ORACLE instance (normal)


The "crash recovery" appears as :

Sun Jan 09 00:50:18 2011
ALTER DATABASE MOUNT
Successful mount of redo thread 1, with mount id 1267731018
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE MOUNT
Sun Jan 09 00:50:23 2011
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
Started redo scan
Completed redo scan
read 80458 KB redo, 1721 data blocks need recovery
Started redo application at
Thread 1: logseq 490, block 52626
Recovery of Online Redo Log: Thread 1 Group 1 Seq 490 Reading mem 0
Mem# 0: /home/oracle/app/oracle/oradata/orcl/redo01.log
Recovery of Online Redo Log: Thread 1 Group 2 Seq 491 Reading mem 0
Mem# 0: /home/oracle/app/oracle/oradata/orcl/redo02.log
Recovery of Online Redo Log: Thread 1 Group 3 Seq 492 Reading mem 0
Mem# 0: /home/oracle/app/oracle/oradata/orcl/redo03.log
Completed redo application of 15.26MB
Completed crash recovery at
Thread 1: logseq 492, block 31015, scn 6082434
1721 data blocks read, 1721 data blocks written, 80458 redo k-bytes read

(the rest of the messages, upto the DATABASE OPEN have been presented earlier in this blog post).

But what happens when the user retries his UPDATE after the database instance is OPEN ?

oracle@localhost ~]$ sqlplus

SQL*Plus: Release 11.1.0.7.0 - Production on Sun Jan 9 00:51:12 2011

Copyright (c) 1982, 2008, Oracle. All rights reserved.

Enter user-name: hemant/hemant

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> update store_list_new set country='SYS_UPD' where country='SYS';


In the normal course, the entire Rollback would be done by SMON. However, what happens now is that this UPDATE statement, as it reads the table's datablocks, sees (from the ITL entries in the block header) the rows it is interested as having been updated, reads the undo slots for the transaction and identifies that the previous transaction had not committed. It has to effect a rollback of the previous transaction before it can update the same rows again !

When I trace this UPDATE run, I see a number of wait entries like :

=====================
PARSING IN CURSOR #5 len=63 dep=0 uid=184 oct=6 lid=184 tim=1294563155660560 hv=3915004330 ad='2cf70638' sqlid='ff3g2nznpnfda'
update store_list_new set country='SYS_UPD' where country='SYS'
END OF STMT
PARSE #5:c=486927,e=13830396,p=397,cr=10544,cu=1,mis=1,r=0,dep=0,og=1,plh=2037152,tim=1294563155660554
WAIT #5: nam='db file scattered read' ela= 28880 file#=15 block#=171843 blocks=5 obj#=85765 tim=1294563155690619
WAIT #5: nam='db file sequential read' ela= 874 file#=3 block#=145603 blocks=1 obj#=0 tim=1294563155691915
WAIT #5: nam='db file sequential read' ela= 437 file#=3 block#=145602 blocks=1 obj#=0 tim=1294563155692708
WAIT #5: nam='db file sequential read' ela= 29147 file#=3 block#=245 blocks=1 obj#=0 tim=1294563155722107
WAIT #5: nam='db file sequential read' ela= 15050 file#=3 block#=244 blocks=1 obj#=0 tim=1294563155739114

*** 2011-01-09 00:52:37.181
WAIT #5: nam='enq: TX - row lock contention' ela= 1384462 name|mode=1415053318 usn 16 slot="524303" sequence="3878" tim="1294563157180358"
WAIT #5: nam='db file sequential read' ela= 220 file#=3 block#=145603 blocks=1 obj#=0 tim=1294563157193428
WAIT #5: nam='db file sequential read' ela= 153 file#=3 block#=145602 blocks=1 obj#=0 tim=1294563157195194
WAIT #5: nam='db file sequential read' ela= 117 file#=3 block#=245 blocks=1 obj#=0 tim=1294563157196317

File_ID 3 is the datafile of my UNDOTBS1 Tablepace, that is the active UNDO_TABLESPACE of the Database Instance. The Server Process of the user session executing the UPDATE is reading from the Undo segments.
(Why don't I see read waits on the STORE_LIST_NEW table itself ? A large number of buffers of the table have already been read into the Buffer Cache by SMON and also when doing dynamic sampling to parse the UPDATE statement, although some blocks still have to be read from disk and do appear as waits later in the trace file).

Recall that SMON does execute a Rollback. That is why the Server Process occassionally encounter waits like these :

WAIT #5: nam='enq: TX - row lock contention'


In fact, while the UPDATE was running, when I checked for the session's waits, I found :

SQL> l
1 select blocking_session, event, count(*) from v$active_session_history where session_id=46
2 and (blocking_session is not null and event is not null)
3* group by blocking_session, event
SQL> /

BLOCKING_SESSION EVENT COUNT(*)
---------------- ---------------------------------------------------------------- ----------
34 latch: enqueue hash chains 1
11 log file switch completion 1
10 free buffer waits 1
13 enq: TX - row lock contention 1014

SQL>

SID 13 is the SMON session.

Thus, both SMON and the Server Process for the user issuing the UPDATE are attempting to Rollback entries for the target table.

If you want to monitor the Rollback being done by SMON you can see this :

SQL> select * from v$fast_start_transactions ;

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL PID CPUTIME PARENTUSN PARENTSLT
---------- ---------- ---------- ---------------- -------------- --------------- ---------- ---------- ---------- ----------
PARENTSEQ XID PXID RCVSERVERS
---------- ---------------- ---------------- ----------
8 15 3878 RECOVERING 18315 135898 24 8809 0 0
0 08000F00260F0000 0000000000000000 1


SQL>

While the Transaction is being Recovered,the STATE is "RECOVERING". UNDOBLOCKSTOTAL is the total number of Undo Blocks that have to be processed to complete the Rollback. UNDOBLOCKSDONE is the number of Undo Blocks that have been completed as of the time the V$FAST_START_TRANSACTIONS view is queried.

Later, as I query the Statistics for the UPDATE session I see :

data blocks consistent reads - undo records applied 2,968,576

This statistic shows how many undo records that the Server Process itself has applied.

I killed the UPDATE after it had run for a long time but did get a tkprof of the partial run :

SQL ID: ff3g2nznpnfda
Plan Hash: 2037152
update store_list_new set country='SYS_UPD'
where
country='SYS'


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.02 0.10 2 2 0 0
Execute 1 109.34 10149.31 85082 3235208 5549965 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 109.37 10149.42 85084 3235210 5549965 0

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 184

Rows Row Source Operation
------- ---------------------------------------------------
0 UPDATE STORE_LIST_NEW (cr=0 pr=0 pw=0 time=0 us)
2252371 TABLE ACCESS FULL STORE_LIST_NEW (cr=3212990 pr=54987 pw=0 time=756482560 us cost=34043 size=186285490 card=10957970)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
Disk file operations I/O 1 0.00 0.00
db file sequential read 74565 1.00 311.19
db file scattered read 2781 0.50 11.11
enq: TX - row lock contention 1559 8656.42 9868.94
rdbms ipc reply 25 0.00 0.02
reliable message 21 0.00 0.03
log file switch completion 26 0.27 1.80
latch: redo allocation 5 0.02 0.03
latch: cache buffers lru chain 4 0.02 0.05
latch: enqueue hash chains 3 0.14 0.14
latch free 1 0.00 0.00
enq: TX - contention 3 0.47 1.11
latch: object queue header operation 2 0.00 0.00
latch: row cache objects 4 0.05 0.10
buffer busy waits 3 0.00 0.00
log file switch (checkpoint incomplete) 4 0.05 0.12
********************************************************************************

NOTE : *Ignore* the elapsed time and wait time. It, unfortunately, includes time spent when the database was "frozen" because the FRA, holding ArchiveLogs, was full.
You can see the very high count of "current gets" and the 'db file sequential read' waits, most of which (about 67thousand, see below) were on reading from the Undo tablespace. Also note the high number of 'enq: TX - row lock contention' waits. This was the only transaction in the database. That 'row lock contention' was with SMON doing a rollback against the same rows !

[oracle@localhost trace]$ cat orcl_ora_4234.trc |grep 'file#=3' |wc -l
67046
[oracle@localhost trace]$




Thus, as you can see, "crash recovery" does not mean that the entire database is consistent. It only means that the database is returned to the same state ("partially dirty", except for uncommitted updates that never even got written to disk) as it was when the SHUTDOWN ABORT was issued.
Complete consistency is achieved only when transactions that were incomplete / uncommitted are Rolled-back -- by SMON and/or by Server Process's for User sessions that need to read the same rows.

.
.
.

12 comments:

Anonymous said...

Thanks for wonderful article.

I would like to add some additional information stated by Jonathan Lewis
"the process that starts the instance is SQL*Plus session (technically its shadow server) - and my session will show waits for "log file sequential read" as it reads the log file to roll the database forward."

During crash rcovery,
1. Shadown server of the session which starts the database performs Rollforward(Applying redo to make the db same as before Shutdown Abort)
2. Rollback performed by SMON or User's Shadown server for User sessions that need to read the same rows.

Anonymous said...

Your article just showed me exactly what im seeing in one of my databases. Thanks a ton mate.
Dave

JAMSHER KHAN said...

Thanks Hemant for such a wonderful post. Can you please help me in clearing some doubts here.

1)Can you please explain how the commit maker which oracle write in redo stream when transaction committed help in roll forward recovery.

2)After db gets open Oracle now wants to roll back the uncommitted transaction that happen before db was abort but how oracle or smon determine
which block it need to rollback after db is open.

3)Suppose USER A running transaction for 7 mins and after 5 mins checkpoint happen in database so all the dirty buffer will be flush down to datafile
by dbwr and before the dbwr writes lgwr will write the redo buffer of this block to redo file after 7 mins USER A commit the transaction oracle will
write the commit marker in redo stream and update the undo header slot that transaction is committed. Now this undo header slot are free to use by other transaction
as previous transaction is committed. Now USER B had overwrite some of the undo entry of USER A transaction. Now in between USER C process want
to access all block modified by USER A so it will read the blocks from datafile and when it will check the block it will see some active ITL entry in it.
as when the block written to datafile transaction was active. Now from the ITL entry it will try to access the undo header slot to determine if transaction
committed or not but as the entry is overwritten by USER B what will happen to this block.

Please help me in understanding above question.

JAMSHER KHAN said...

Thanks Hemant for such a wonderful post. Can you please help me in clearing some doubts here.

1)Can you please explain how the commit maker which oracle write in redo stream when transaction committed help in roll forward recovery.

2)After db gets open Oracle now wants to roll back the uncommitted transaction that happen before db was abort but how oracle or smon determine
which block it need to rollback after db is open.

3)Suppose USER A running transaction for 7 mins and after 5 mins checkpoint happen in database so all the dirty buffer will be flush down to datafile
by dbwr and before the dbwr writes lgwr will write the redo buffer of this block to redo file after 7 mins USER A commit the transaction oracle will
write the commit marker in redo stream and update the undo header slot that transaction is committed. Now this undo header slot are free to use by other transaction
as previous transaction is commited. Now USER B had overwrite some of the undo entry of USER A transaction. Now in between USER C process want
to acccess all block modified by USER A so it will read the blocks from datafile and when it will check the block it will see some active ITL entry in it.
as when the block written to datafile transaction was active. Now from the ITL entry it will try to access the undo header slot to determine if transacton
commited or not but as the entry is overwritten by USER B what will happen to this block.

Please help me in understanding above question.

Hemant K Chitale said...

1) Roll-forward relies on Checkpoints markers that are written to the redo stream. Commit markers are for individual transactions.

2) After completing the roll-forward, Oracle scans all the Undo segments to identify transactions that do not have commits. It uses the Undo to rollback these transactions.

c)User C may get an ORA-01555 error if the slot in the undo segment header has been overwritten or the undo entries have been overwritten by user B.

Hemant

Selvakumar Nagulan said...

Hi Hemant,

For the 3rd question from Jamsheer, it's a select statement the user C would issue (USER C process want to acccess all block modified by USER A). So why does it throw ORA-01555? The process would just fetch blocks from datafile, place in the buffer and return it to the user C. What's the necessity to go look at the undo blocks?

Thanks!
Selvakumar Nagulan
orabliss.blogspot.com

Hemant K Chitale said...

Selva,
If user C needs to see the rows as before the modification, it needs to read from the undo blocks to reconstruct the before-image.

Selvakumar Nagulan said...

Hemant,

For eg.
User A issues update stmt at 12 01 am (runs for 5 mins)
User C issues select stmt against same table at 12 02 am (will be waiting for lock to be released by User A)
User A commits after update stmt at 12 06 am
User C gets the records updated by user A at 12 06 am irrespective of what records it expects (either modified or not) to maintain data integrity.
Why would user C requires the rows before modification? Doesn't it compromise data integrity? I think I'm missing something here. Please clarify!
Thanks!

Hemant K Chitale said...

Selvakumar,
You may be confusing Oracle with some other database system.
Oracle provides multi-version read-consistency.
User C *does not* wait on the lock held by A if User C is simply selecting the rows.
User C will expect to see the rows as they existed at 12:02. This is independent of whether User A committed at 12:03 or 12:06 or 12:13 or 12:59 or anytime later.
If User C's query takes a long time to run and happens to visit the block at 12:06 or 12:07 or at 12:59, he needs to see the version of the row as it existed at 12:02. His session will read the undo information for that block to "recreate" the version of the row as at 12:02.

This implementation is Data Integrity.

What you are thinking of implementing seems to be what is called "Dirty Reads".

Please look up "Multi version read consistency" in Oracle on the Internet or in Oracle's documentation.

Selvakumar Nagulan said...

Hemant,

I just confused myself with DML stmt against select stmt. Your explanation is good and clear. Thanks for your time :)

Selvakumar Nagulan.

Anonymous said...

Hello Hemant,

Really good article to clear some oracle basics.

I have doubt related to rollback of transaction.

Consider a scenario where user A is updating large set of data without commit, occupying huge amount space in undo to provide read consistency. Now due to space crunch in undo tablespace, oracle started overwriting unexpired undo, to provide space to newly running DML operations.

Now user B try to select on same table, will get ora-01555 as previous image of data set is overwritten from undo.

Now if user A wants to rollback session, how oracle will rollback it, as previous image of data is already overwritten.

Thanking you in advance!!

Regards,
Adi

Hemant K Chitale said...

Adi,

Undo for an uncommitted transaction is never overwritten. So, user "A" will never face the risk.

Hemant