25 April, 2010

AutoTune Undo

Rollback segments (aka "Undo segments") were introduced in -- V4 (corrected, as pointed out by Noons). They provide two key implementations :
a. The ability to rollback a transaction -- whether intentionally or to handle transaction / session / process / instance / server failure
b. Multi Version Read Consistency

In 9i, Oracle introduced Automatic Undo Management whereby the database instance itself adds and drops undo segments and retains undo information for a duration ("UNDO_RETENTION") specified by the DBA.

In 10g (or was it 10gR2), Oracle [quietly] added _undo_autotune. This allows the database instance to extend and override (ignore) "UNDO_RETENTION". This particularly kicks in when the datafiles for the Undo Tablespace are set to AUTOEXTEND ON.

There is very good reason for allowing the Undo Retention duration to by "dynamic". Queries that are long running while DML is concurrently being executed against the same tables can continue to expect to be able to read a consistent image -- even if they exceed the expected Undo Retention duration. Oracle, on determining that queries are running longer, also retains Undo information longer ! Smart, isn't it ?

But there's a serious limitation to this approach when you think about it. Say you have a database with 5000 tables, 100 concurrent queries and 80 concurrent transactions at any time. How will Oracle track queries against transactions that modify data being used by the queries ? Ideally it would have to track each query's start time and the list of all the tables being referenced in the query and then maintain a list of transactions against those tables and modify undo retention for those transactions only. In fact, ideally it should modify undo retention only if those transactions update the same rows that the queries are or will be accessing ! This becomes a very tall order.

So, it's "automatic"s engine takes a short cut. It tracks query durations and periodically updates itself with information about the longest running query -- irrespective of which tables the query accesses.
For all Oracle knows, the longest running query may be against a table [or set of tables] that have not been updated in the past 6 minutes or 6 hours or 6 days or 6 months and will not be updated in the next 6 ..... Therefore, truth be told, this query does NOT need undo retention -- and does not need undo retention to be extended at all. Yet, the database engine will extend undo retention for all transactions against all the other tables in the same database instance, for the duration of that query.

With some more thought, you can then see the implications of this -- undo retention grows, undo space usage increases rapidly ! Completely unnecessarily.

AutoTuned undo does make sense where queries and transactions hit the same tables (yes, I am repeating this assertion). It does not help and actually causes more adverse issues where there is no commonality between long running queries and transactions.


In this small test, I ran a long running query


select col_1, slow_access from long_running_query_table;


concurrently with these transactions (from another session) :


insert into dummy_txn_table values (i);
update dummy_txn_table set col_1 = i+1;
commit;


even with a sleep of 15seconds between each transaction.

I then monitored Undo space usage with this query :


select to_char(begin_time, 'DD-MON HH24:MI') Begin_Time, to_char(end_time, 'DD-MON HH24:MI') End_Time,
undoblks, txncount, maxconcurrency, activeblks, expiredblks, tuned_undoretention, maxquerylen
from v$undostat order by begin_time


The query ran from 00:20 to 02:47 (at which point it was "killed").
The transactions ran upto 05:53.

The v$undostat figures were :

BEGIN_TIME END_TIME UNDOBLKS TXNCOUNT MAXCONCURRENCY ACTIVEBLKS EXPIREDBLKS TUNED_UNDORETENTION MAXQUERYLEN
------------ ------------ ---------- ---------- -------------- ---------- ----------- ------------------- -----------
23-APR 00:11 23-APR 00:21 179 118 1 160 6496 900 0
23-APR 00:21 23-APR 00:31 247 228 1 160 5216 1617 777
23-APR 00:31 23-APR 00:41 263 92 1 160 3408 2219 1377
23-APR 00:41 23-APR 00:51 283 108 1 160 3384 2821 1981
23-APR 00:51 23-APR 01:01 329 149 3 160 2720 3421 2581
23-APR 01:01 23-APR 01:11 322 272 1 160 2336 4021 3181
23-APR 01:11 23-APR 01:21 315 102 1 160 2208 4622 3781
23-APR 01:21 23-APR 01:31 355 116 1 160 2072 5225 4382
23-APR 01:31 23-APR 01:41 349 94 2 160 1648 5825 4985
23-APR 01:41 23-APR 01:51 342 104 1 160 624 6425 5585
23-APR 01:51 23-APR 02:01 430 248 3 160 368 7026 6186
23-APR 02:01 23-APR 02:11 400 97 1 160 240 7626 6786
23-APR 02:11 23-APR 02:21 465 116 1 160 240 8229 7389
23-APR 02:21 23-APR 02:31 429 94 1 160 240 8830 7990
23-APR 02:31 23-APR 02:41 486 105 1 160 240 9431 8590
23-APR 02:41 23-APR 02:51 421 94 1 160 240 10031 9191
23-APR 02:51 23-APR 03:01 446 274 3 160 5248 900 0
23-APR 03:01 23-APR 03:11 469 106 2 160 2824 900 0
23-APR 03:11 23-APR 03:21 407 168 1 160 3088 900 0
23-APR 03:21 23-APR 03:31 454 97 3 160 3472 900 0
23-APR 03:31 23-APR 03:41 452 111 1 160 3088 900 0
23-APR 03:41 23-APR 03:51 485 92 1 160 3344 900 0
23-APR 03:51 23-APR 04:01 512 175 3 160 3472 900 0
23-APR 04:01 23-APR 04:11 484 99 2 160 2576 900 0
23-APR 04:11 23-APR 04:21 544 171 2 160 3328 900 0
23-APR 04:21 23-APR 04:31 579 110 2 160 2312 900 0
23-APR 04:31 23-APR 04:41 626 97 3 160 1928 900 0
23-APR 04:41 23-APR 04:51 522 104 1 160 1928 900 0
23-APR 04:51 23-APR 05:01 613 151 3 160 1920 900 0
23-APR 05:01 23-APR 05:11 565 176 1 160 2304 900 0
23-APR 05:11 23-APR 05:21 609 95 1 160 2312 900 0
23-APR 05:21 23-APR 05:31 664 134 3 160 2056 900 0
23-APR 05:31 23-APR 05:41 572 92 1 160 2824 900 0
23-APR 05:41 23-APR 05:50 536 91 1 160 2824 900 0



You can see how TUNED_UNDORETENTION kept increasing upto 02:51. After the query was killed, a large number of Undo blocks were suddenly "Expired" and TUNED_UNDORETENTION dropped back to 900 (the "default" value set at the instance level).

The query and the transactions were against completely unrelated tables. Yet, the presence of the query caused all undo for all transactions to be retained in the undo segments much longer, requiring more undo blocks !

.
.
.

11 comments:

hmartinezlopez said...

Any news on _undo_autotune parameter in 11g?

Hemant K Chitale said...

Hector,
I haven't come across any reference about differences in 11g. I expect that it behaves the same in 11g/11gR2.

Hemant K Chitale

Noons said...

Hemant, rollback providing multi-version and undo of transactions has been around since at least version4!
Version 6 introduced row-level locking and a few other changes designed to greatly reduce I/O.

Hemant K Chitale said...

Noons,

Thanks for the correction. I was thinking of the separation of Redo and Undo out of the BI.ORA file that came about in V6.

Hemant

Sathish said...

Hem ant, good article as this relates to a undo problem I have in production. I am in 10.2.0.4 ..what are the implications of turning auto tune off?

Hemant K Chitale said...

Satish,
You have to get a good grip on the profile of DML and concurrent queries. If you know that there is significant DML and queries take a long time to run -- but that these values are will within a determined range -- then you could reset UNDO_RETENTION appropriately before disabling autotune undo.

OTOH, if query runtimes are absolutely unpredictable, you might have to "live with" autotune undo -- because that is the scenario it was built for.

Remember : If you do not have DML concurrently with such queries, the whole question is moot -- because queries won't have to be reading from Undo.


Hemant

Sathish said...

Thanks Hemant, mine is a OLTP application with very aggresive response SLAs(2 seconds). So I dont have many long running queries(some cleanup jobs which run a max of 15 minutes).

Problem is auto tune is that it keeps the rollback segments for a log time(due to this bug) and extending the rollback segments on demand causes huge latencies in my application and causes a outage

So we want to turn auto_tune off with a value of undo_retention to say 1 hr(max). I am just concerned about the growth of undo datafiles with this setting

Sathish said...

Hemant,
Thanks for the feedback. My application is a OLTP app with aggresive SLAs(2 seconds with 1500 tps). Very few queries run more than few seconds.

Leaving the current behaviour causes unpredictable undo retention(up to 96 hrs) causing the application to extend undo segments on demand causing huge latencies in the application. So we have decided to turn auotune off and set retention to 1 hr

I will post the results after the test..we also found another work around in meta link

turn on autoextensibility of the undo tablespace datafiles and set the MAXSIZE to the actual size of the all the datafiles of the undo tablespace, or set _smu_debug_mode=33554432

see BUG:9681444

Hemant K Chitale said...

Satish,
If you really don't have long running queries (e.g. the max is 15 minutes), auto_tune would *NOT* be extending the tuned_undoretention.

Have you monitored V$UNDOSTAT ? (see my example) TUNED_UNDORETENTION goes up only if MAXQUERYLEN increases.

Hemant

Damir Vadas said...

>The transactions ran upto 05:53.
how did you get this...please explain.

Hemant K Chitale said...

Damir,
Because I killed the session running the transactions loop at 05:53