04 October, 2008

Delayed Block Cleanout

A query that reads datablocks updated by a large DML (INSERT/UPDATE/DELETE) may well incur "delayed block cleanout". The COMMIT of the DML does not clear the ITL entries in each modified datablock; it only marks the transaction as "COMMITTED" in its undo segment slot (plus, possibly, a few of the modified blocks still in the buffer cache being cleaned out). Therefore, the next query (whether from the same database session OR from another database session) to read such a datablock has to verify the ITL entries against the transaction in the undo segment and then "clean out" the ITL entry and the row markers in the datablock. This update of the datablock also generates redo.
The presence of the "Penalty" of delayed block cleanout is visible in the "cleanout %" statistics and also in the "consistent gets - examination" count [which is a subset of the "consistent gets" count].

This has implications: large batch jobs that update very many datablocks and are able to execute a COMMIT quickly actually cause subsequent jobs or queries against the same table to suffer this overhead.

You may not see "delayed block cleanout" when a transaction updates only a small set of datablocks, because the commit itself can clean out blocks that are still in the buffer cache.


The set of SQL commands and outputs below shows how the *first* query against the table undergoing DML suffers this penalty: the "cleanout %", "consistent gets - examination" and "redo size" statistics are incremented (i.e. incurred) for this query, but not for the next query against the same datablocks.

I use a table with 3.24 million rows and run UPDATE, DELETE and INSERT operations against it, identifying the changes to these statistics that indicate delayed block cleanout.
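
The DDL for TEST_D_B_C is not shown here; only COL_1 (a character column) and COL_4 (a number) are referenced by the DML, so a comparable table -- and the TEST_D_B_C_BAK copy used later in the INSERT test -- could be built along these lines (a sketch only; the other columns are made up for illustration) :

REM Sketch only : column definitions other than COL_1 and COL_4 are illustrative
CREATE TABLE test_d_b_c AS
SELECT 'ROW_'||TO_CHAR(n)  col_1,
       SYSDATE             col_2,
       RPAD('X',50,'X')    col_3,
       MOD(n,100)          col_4
FROM  (SELECT ROWNUM n FROM dual CONNECT BY LEVEL <= 3242752);

CREATE TABLE test_d_b_c_bak AS SELECT * FROM test_d_b_c;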





SQL> select blocks,num_rows from user_tables where table_name = upper('test_d_b_c');

BLOCKS NUM_ROWS
---------- ----------
47197 3242752

SQL> REM Get the number of consistent gets -- the query has been run once ahead to parse it
SQL> SELECT COUNT(*) FROM TEST_D_B_C;

COUNT(*)
----------
3242752

SQL> @Report_select_Statistics
SQL> set echo off

Statistic Value
---------------------------------------------------------- ----------------
cleanout - number of ktugct calls 0
cleanouts and rollbacks - consistent read gets 0
cleanouts only - consistent read gets 0
consistent changes 0
consistent gets 48,174
consistent gets - examination 498
db block changes 4
db block gets 3
redo entries 2
redo size 672
undo change vector size 204
SQL>
SQL> REM REM ########## Test for an UPDATE ################ REM REM ####################
SQL> REM Run an update on 60% (not all) of the rows
SQL> UPDATE TEST_D_B_C SET COL_1 = 'HKC_'||SUBSTR(COL_1,1,15) WHERE MOD(COL_4,10) > 3 ;

1945344 rows updated.

SQL> COMMIT;

Commit complete.

SQL> @Report_select_Statistics
SQL> set echo off

Statistic Value
---------------------------------------------------------- ----------------
cleanout - number of ktugct calls 0
cleanouts and rollbacks - consistent read gets 0
cleanouts only - consistent read gets 0
consistent changes 0
consistent gets 95,395
consistent gets - examination 503
db block changes 1,471,008
db block gets 732,521
redo entries 726,323
redo size 340,420,260
undo change vector size 141,809,380
SQL>
SQL>
SQL> REM Now let's query the table
SQL> REM The first execution will suffer delayed block cleanout
SQL>
SQL> SELECT COUNT(*) FROM TEST_D_B_C;

COUNT(*)
----------
3242752

SQL> @Report_select_Statistics
SQL> set echo off

Statistic Value
---------------------------------------------------------- ----------------
cleanout - number of ktugct calls 46,969
cleanouts and rollbacks - consistent read gets 0
cleanouts only - consistent read gets 46,969
consistent changes 0
consistent gets 189,575
consistent gets - examination 47,472
db block changes 1,517,977
db block gets 732,521
redo entries 773,293
redo size 343,803,772
undo change vector size 141,809,380
SQL>
SQL> SELECT COUNT(*) FROM TEST_D_B_C;

COUNT(*)
----------
3242752

SQL> @Report_select_Statistics
SQL> set echo off

Statistic Value
---------------------------------------------------------- ----------------
cleanout - number of ktugct calls 46,969
cleanouts and rollbacks - consistent read gets 0
cleanouts only - consistent read gets 46,969
consistent changes 0
consistent gets 236,786
consistent gets - examination 47,472
db block changes 1,517,977
db block gets 732,521
redo entries 773,293
redo size 343,803,772
undo change vector size 141,809,380
SQL>
SQL>
SQL> REM REM ########## Test for a DELETE ################ REM REM ####################
SQL> REM Test for a DELETE
SQL> DELETE TEST_D_B_C WHERE MOD(COL_4,5) > 3;

648640 rows deleted.

SQL> COMMIT;

Commit complete.

SQL> @Report_select_Statistics
SQL> set echo off

Statistic Value
---------------------------------------------------------- ----------------
cleanout - number of ktugct calls 46,969
cleanouts and rollbacks - consistent read gets 0
cleanouts only - consistent read gets 46,969
consistent changes 0
consistent gets 284,451
consistent gets - examination 47,486
db block changes 2,870,813
db block gets 1,465,003
redo entries 1,458,221
redo size 586,670,260
undo change vector size 292,754,836
SQL>
SQL> REM Now let's query the table
SQL> REM The first execution will suffer delayed block cleanout
SQL>
SQL> SELECT COUNT(*) FROM TEST_D_B_C;

COUNT(*)
----------
2594112

SQL> @Report_select_Statistics
SQL> set echo off

Statistic Value
---------------------------------------------------------- ----------------
cleanout - number of ktugct calls 56,180
cleanouts and rollbacks - consistent read gets 0
cleanouts only - consistent read gets 56,180
consistent changes 0
consistent gets 340,873
consistent gets - examination 56,697
db block changes 2,880,024
db block gets 1,465,003
redo entries 1,467,432
redo size 587,334,024
undo change vector size 292,754,836
SQL>
SQL> SELECT COUNT(*) FROM TEST_D_B_C;

COUNT(*)
----------
2594112

SQL> @Report_select_Statistics
SQL> set echo off

Statistic Value
---------------------------------------------------------- ----------------
cleanout - number of ktugct calls 56,180
cleanouts and rollbacks - consistent read gets 0
cleanouts only - consistent read gets 56,180
consistent changes 0
consistent gets 388,084
consistent gets - examination 56,697
db block changes 2,880,024
db block gets 1,465,003
redo entries 1,467,432
redo size 587,334,024
undo change vector size 292,754,836
SQL>
SQL>
SQL> REM REM ########## Test for a INSERT ################ REM REM ####################
SQL> REM Test for an INSERT
SQL> INSERT INTO TEST_D_B_C SELECT * FROM TEST_D_B_C_BAK WHERE MOD(COL_4,5) > 3;

648640 rows created.

SQL> COMMIT;

Commit complete.

SQL> @Report_select_Statistics
SQL> set echo off

Statistic Value
---------------------------------------------------------- ----------------
cleanout - number of ktugct calls 65,255
cleanouts and rollbacks - consistent read gets 0
cleanouts only - consistent read gets 56,180
consistent changes 0
consistent gets 453,419
consistent gets - examination 65,810
db block changes 2,936,433
db block gets 1,532,712
redo entries 1,506,633
redo size 659,399,228
undo change vector size 295,378,592
SQL>
SQL> REM Now let's query the table
SQL> REM The first execution will suffer delayed block cleanout
SQL>
SQL> SELECT COUNT(*) FROM TEST_D_B_C;

COUNT(*)
----------
3242752

SQL> @Report_select_Statistics
SQL> set echo off

Statistic Value
---------------------------------------------------------- ----------------
cleanout - number of ktugct calls 72,839
cleanouts and rollbacks - consistent read gets 0
cleanouts only - consistent read gets 63,764
consistent changes 0
consistent gets 508,214
consistent gets - examination 73,394
db block changes 2,944,017
db block gets 1,532,712
redo entries 1,514,217
redo size 659,945,320
undo change vector size 295,378,592
SQL>
SQL> SELECT COUNT(*) FROM TEST_D_B_C;

COUNT(*)
----------
3242752

SQL> @Report_select_Statistics
SQL> set echo off

Statistic Value
---------------------------------------------------------- ----------------
cleanout - number of ktugct calls 72,839
cleanouts and rollbacks - consistent read gets 0
cleanouts only - consistent read gets 63,764
consistent changes 0
consistent gets 555,425
consistent gets - examination 73,394
db block changes 2,944,017
db block gets 1,532,712
redo entries 1,514,217
redo size 659,945,320
undo change vector size 295,378,592
SQL>


As you can see, "cleanout - number of ktugct calls" and "cleanouts only - consistent read gets" are incremented, as is "consistent gets - examination". [Remember that the "consistent gets - examination" count is *included* in the "consistent gets" count]. "redo entries" and "redo size" are also incremented. These increments occur only for the first query against the modified datablocks.
NOTE : This table [TEST_D_B_C] does NOT have any indexes, so "consistent gets - examination" does NOT reflect reads on index blocks [root blocks or otherwise]. {A Google search may turn up references indicating that "consistent gets - examination" occurs for index blocks, but that is not the only case for this statistic.}
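
That can be confirmed with a simple check such as

SELECT index_name FROM user_indexes WHERE table_name = 'TEST_D_B_C';

which returns no rows for this table.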


The above test was on 10.2.0.1; I have had similar results on 10.2.0.2.

However, on 10.2.0.4 I have seen a surprisingly large number of "consistent gets - examination" {exceeding 1.9 million} for the UPDATE itself [testing with both ASSM and MSSM], which I cannot yet explain. The "consistent gets - examination" count for the "delayed block cleanout" effect on the first query is still consistent. I have not published the 10.2.0.4 numbers because of the 1.9 million "consistent gets - examination" in the UPDATE; that is something I have to investigate.

35 comments:

Anonymous said...

Hi Hemant,
Maybe I'm wrong here.
When you do a LARGE DML, obviously the NEXT query, i.e. the FIRST query after the DML, will incur the overhead of DBC.
So is what you've demonstrated here anything different?

also

after 1st DML (UPDATE)
1st query
cleanout - number of ktugct calls 46,969
cleanouts only - consistent read gets 46,969

2nd query
cleanout - number of ktugct calls 46,969
cleanouts only - consistent read gets 46,969

2 DML (DELETE)
1st query
cleanout - number of ktugct calls 56,180
cleanouts only - consistent read gets 56,180

2nd query
cleanout - number of ktugct calls 56,180
cleanouts only - consistent read gets 56,180

3 DML (INSERT)
1st query
cleanout - number of ktugct calls 72,839
cleanouts only - consistent read gets 63,764

2nd query
cleanout - number of ktugct calls 72,839
cleanouts only - consistent read gets 63,764

Also, here the statistics are the same for both queries after each DML, so both queries are incurring the same overhead.
You said:
"The set of SQL commands and outputs below shows how the *first* query against the table undergoing DML suffers this penalty: the "cleanout %", "consistent gets - examination" and "redo size" statistics are incremented (i.e. incurred) for this query, but not for the next query against the same datablocks."

whereas the above demonstration shows that the FIRST as well as the SECOND query suffers.

Regards!
Nitin

Hemant K Chitale said...

Statistics (whether in V$SYSSTAT or V$SESSTAT -- where I took them from) are always cumulative.
Thus, the "ktugct" calls for the first SELECT was 46,969-0 == 46,969.
The "ktugct" calls for the second SELECT as 46,969-46,969 == 0.
and so on.

So, compute the *increment* in statistics at each execution to determine the "cost" in that execution.
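
For example, one rough way to capture such increments yourself (a sketch only -- the snapshot table name is illustrative, and the snapshot INSERTs themselves inflate the counts slightly) :

REM Sketch only : my_stat_snap is an illustrative name
CREATE GLOBAL TEMPORARY TABLE my_stat_snap
  (snap_id NUMBER, stat_name VARCHAR2(64), stat_value NUMBER)
  ON COMMIT PRESERVE ROWS;

INSERT INTO my_stat_snap
  SELECT 1, n.name, s.value FROM v$mystat s, v$statname n WHERE s.statistic# = n.statistic#;

REM ... run the SELECT being measured ...

INSERT INTO my_stat_snap
  SELECT 2, n.name, s.value FROM v$mystat s, v$statname n WHERE s.statistic# = n.statistic#;

SELECT b.stat_name, b.stat_value - a.stat_value delta
FROM   my_stat_snap a, my_stat_snap b
WHERE  a.snap_id = 1 AND b.snap_id = 2
AND    a.stat_name = b.stat_name
AND    b.stat_value > a.stat_value
ORDER BY b.stat_name;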

Scofield said...

Mr Hemant;

Thanks for your contributions to OTN forum.

When I issue a very big insert, update or delete (more than 10% of the buffer cache),
the blocks become dirty in the buffer cache.
Even after the dirty blocks are written to the datafiles, they don't become clean, right?
I have to select all rows to make them clean.

1-) In that case, if I don't select all rows of the table, dirty blocks may be written more than once into the datafiles,
which is a waste of space in the datafiles. Is that right?

2-) Also, if I never select the modified blocks of that table, the space will be wasted in the buffer cache, since
the dirty blocks will always be in the buffer cache, never cleaned.



Can you explain this issue in detail Mr Hemant.

Many thanks

Hemant K Chitale said...

1. A "dirty block" is written back to disk only to the *same* location, identified by it's DataBlockAddress. Even if the same block is modified multiple times and happens to get written back frequently, it is to the same location. So there is no "waste of space".

2. "dirty" buffers in the db_cache will be written back to disk by DBWR when there is pressure to release space for new operations -- other SQLs / other sessions that need space allocated in the buffer cache for the blocks that need to read and/or update.
Furthermore, when DBWR does a periodic for checkpoints, dirty buffers get written to disk.

Scofield said...

1-)
So, after dirty blocks are written to datafiles, do they become clean immediately, or do they wait for a commit or a select to clean them?

2-) What if I performed a very big update and never performed
select * from big_table?
In that case, will the dirty blocks always be in the buffer cache?

3-)
I have read your example, which is
(Delayed Block Cleanout -- through Instance Restart).
After a database reboot all of the SGA components should be cleared, so how come dirty blocks are still in the buffer cache?

Hemant K Chitale said...

Dirty buffers will get written by DBWR gradually even if they are not written by the commit immediately.

You keep confusing dirty blocks with buffer cache.
Buffers in the buffer cache become dirty when they are modified. These get written back to the database blocks by DBWR at any deferred time.
When an instance dies or is shut down, the buffer cache no longer exists. However, blocks would have been written by DBWR to disk (if the shutdown was normal/immediate then all modified blocks would have been written).
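
A quick way to see how many buffers in the cache are currently dirty is to query V$BH -- its DIRTY column flags modified buffers :

SELECT dirty, COUNT(*) FROM v$bh GROUP BY dirty;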

Block Cleanout is not just about dirty buffers. It is about transaction cleanout. Go back and read Jonathan Lewis's posting if you don't understand mine.

scofield said...

Hi Hemant
I have already read his blog, and I also read a book on this issue, but I still have doubts.
Could you please explain this in detail in your own words?
I need your enlightening explanation.
You helped me a lot before.

Hemant K Chitale said...

I have explained it. So has Jonathan Lewis.

Marcio Nobre said...

Hi there

When I issue DDL,
for example CREATE TABLE,

Is it first written to cache or disk?
What is the sequential order?

Hemant K Chitale said...

Marcio,
All block updates (whether to normal user table / index blocks OR to data dictionary blocks) are done in the buffer cache.
Since the data dictionary is also maintained in the row cache (in-memory in the shared pool), data dictionary updates also are reflected there immediately.
Updates from the buffer cache to disk are deferred to be written by DBWR -- Oracle does not differentiate between data dictionary (SYSTEM) updates and user table / index updates.

Marcio Nobre said...

Thanks sir,
I have one last question about backup.

Assume the size of my datafile is 50GB, but the used part is only 30GB. The rest (20GB) is free.

If I take a backup with RMAN,
does it back up all 50GB or only the used part?

Hemant K Chitale said...

Your latest question is not relevant here.

Scofield said...

Mr Hemant;

How does Oracle understand whether the block is committed or not?
Does it only look at the ITL entry in the block, or at the transaction table in the rollback segment?

Hemant K Chitale said...

Scofield,

Transactions lock at the Row Level.
Blocks are not locked, Rows are locked.
There is no such thing as "block is committed". A Transaction commits updates to Rows. These are tracked by ITL entries -- each row with active DML (or not having been cleaned out) has an ITL entry.

Scofield said...

Mr Hemant;
So Oracle only looks at ITL entries to determine if there are any locks.
There is no need to go to disk to check the transaction table which is in the rollback segments.
Is that right?

What I read is that the transaction table also stores the status of the transaction (committed or not).
I am not sure if Oracle checks it for every transaction.

Hemant K Chitale said...

If there is an ITL entry for a row it means that either
a. a transaction is still active
or
b. a transaction has completed and committed but a cleanout has not occurred.

If a session needs to generate a read consistent image of the row, it needs to go to the undo segment identified by the ITL entry. If a session needs to cleanout the block it needs to go to the undo segment's transaction table to confirm if the transaction has committed or not.
If a row is locked and not committed, the next DML has to wait on the lock.
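
If you want to see the ITL entries themselves, a block dump shows them (the datafile and block numbers below are placeholders) :

ALTER SYSTEM DUMP DATAFILE 4 BLOCK 1234;

The resulting trace file in user_dump_dest lists each ITL slot with its Xid, Uba, Flag, Lck and Scn/Fsc values.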

Scofield said...

Mr Hemant;
Does it mean that
a commit always modifies the status of the transaction in the transaction table?

Why doesn't Oracle store the status of the transaction (committed or not) inside the blocks? Why does it need to go to the transaction table?

Hemant K Chitale said...

The ITL is the Interested Transaction List for rows in the block.
It is NOT the "transaction table".

Scofield said...

Mr Hemant;
Assume I have dirty blocks, a checkpoint occurred, and they were written to the datafiles.
Now all the blocks are clean in the buffer cache.

After I issue a commit, does Oracle go to disk, bring the block into the buffer cache and clean its ITL entry in the buffer cache,
or does it clean the ITL entries directly on disk without retrieving them into the cache?

Hemant K Chitale said...

All updates (except for Direct Path INSERTs, and LOADs) are done through the Buffer Cache.
So, yes, the block has to be loaded into the buffer cache first.
(Else, how will Oracle know that the ITL entry needs updating, without reading it -- the process of reading the disk block requires that it be loaded into the buffer cache first !)

Scofield said...

Mr Hemant;
The thing is, in the above scenario, the commit retrieves blocks from disk into the cache in order to clean the ITL entries
(it is not a select, so Oracle doesn't read).

Hemant K Chitale said...

I'm sorry to say, but in my opinion you haven't started applying your mind to it.
How will Oracle update a block (an ITL cleanout is an update to a block) *without* reading it from disk ?
You think that only SELECTs cause reads but updates can be done on disk ? THINK THINK THINK and then THINK again.

Scofield said...

Mr Hemant;
I got my answer somewhere else, and now I am clear.
"The commit cleanout will only occur in cached blocks. If the block isn't there anymore, we won't go to disk clean it at that time"

Pascal said...

Hemant;

Why does Oracle store the SCN at block level and not at row level?

Say my transaction started at T1.

Someone updated a row after T1, so now an ITL entry of the block is open.
Then a commit is issued, the ITL entry is cleaned out and the block SCN is incremented.

The other rows in this block haven't changed, but since the block SCN is incremented, does Oracle have to read the before image of the unchanged rows?
I hope I'm clear.

Hemant K Chitale said...

Pascal,

SCNs are also part of the ITL entries.

Thus, if a block, containing, say 10 rows, has 3 rows "active" in different transactions, the three different ITL entries will each have an SCN corresponding to the respective transaction while the block level SCN will be the highest one.
Multiple clones of the same block can be maintained by Oracle to allow different readers to see the 3 rows as at different points in time.


Hemant K Chitale

Pascal said...

Hemant;

Why does Oracle always need to read the before image of the block instead of reading only the modified rows?

Say my transaction started at T1.

Some of the rows are modified in block1 at T2.

My transaction has to read the before image of the entire block..


However, not all the rows have changed in this block... Why is this necessary?

Hemant K Chitale said...

Generating a read consistent image of rows requires reading the undo to recreate the image.
Your transaction has to recreate an image of the rows as they existed.


However, data block operations are in terms of (guess what ? ...) blocks. How will Oracle organise data when in the buffer cache ? As buffers.

Pinal Patel said...

Hi Hemant...

Thanks for your time !!!!

The notes below are from Expert Oracle Architecture by THOMAS KYTE...

-->1)An SCN is generated for our transaction. SCN is also used to guarantee read-consistency and checkpointing
in the database. Think of the SCN as a ticker; every time someone COMMITs, the SCN is incremented by one.
-->2) LGWR writes all of our remaining buffered redo log entries to disk and records the SCN in
the online redo log files as well. This step is actually the COMMIT. If this step occurs, we
have committed. Our transaction entry is “removed” from V$TRANSACTION—this shows
that we have committed.
-->3) All locks recorded in V$LOCK held by our session are released, and everyone who was
enqueued waiting on locks we held will be woken up and allowed to proceed with their
work.
-->4) Some of the blocks our transaction modified will be visited and “cleaned out” in a
fast mode if they are still in the buffer cache. Block cleanout refers to the lock-related
information we store in the database block header. Basically, we are cleaning out our
transaction information on the block, so the next person who visits the block won’t
have to. We are doing this in a way that need not generate redo log information, saving
considerable work later (this is discussed more fully in the upcoming “Block Cleanout”
section).
-----------------------------------
My doubts are at point 1 and point 4

Point 1:- It means that every commit generates a new incremental SCN in the redo log file and afterwards sends the message "commit completed" to the end-users. After this process... DOES IT IMMEDIATELY FLUSH DIRTY BUFFERS TO THE DATAFILES... or wait for the default threshold set for CHECKPOINT PROCESSING?

Point 4:- Regarding block cleanout -- it means that after every commit it forces the Database Writer to write the remaining dirty buffers (I use the word "remaining" because it happens that some of the dirty blocks are already written to disk)... so what I understand is that after every commit... a checkpoint occurs... using the fast commit or block cleanout mechanism...

Please clarify my doubts on the same...

Regards
Pinal Patel

Pinal Patel said...

Hi Hemant,

thanks for your time !!!

My doubts are regarding block cleanout after commit -- i.e. the fast commit procedure after the commit is recorded in the online redo logfile...

Does it mean that after every commit DBWR writes the remaining dirty buffers (let's say... 0.1% dirty buffers) from the DB_CACHE to disk... using the block cleanout (FAST COMMIT) mechanism... i.e. it signals checkpointing using the block cleanout mechanism?

So what I understand from this is that after every commit, a checkpoint occurs...

Please clarify my doubts, and thanks in advance for your time.

Regards
Pinal Patel

Hemant K Chitale said...

Pinal,

Point 1 : The COMMIT may cause some dirty blocks related to that transaction to be flushed. That doesn't mean that every COMMIT flushes all the dirty blocks relating to each transaction.

Point 2 : It is a Fast Commit mechanism. The COMMIT's job is not to guarantee that every modified block is written to the datafiles. Its job is to guarantee that the redo is written to the redo log file.

Hemant K Chitale

Pratik said...

@Pinal,

Does it mean that after every commit DBWR writes the remaining dirty buffers (let's say... 0.1% dirty buffers) from the DB_CACHE to disk?


No; after every commit LGWR writes the buffered redo entries to the redo log files (on disk, so your committed data are now crash-proof).

A checkpoint is an event; when it occurs, DBWR writes dirty buffers from the database buffer cache to the datafiles. It is possible that DBWR writes uncommitted data to a datafile.

Writes by LGWR and writes by DBWR are performed continuously, regardless of commit operations.

Anonymous said...

I just read another post on delayed block cleanout concepts. I would like to know what happens when the same block is updated by two different transactions in sequence. What will be stored in the data block header in this case?

Hemant K Chitale said...

Anonymous,
When that article says "a reference is stored in the header of the data block ..." the author is talking of the ITL -- the Interested Transaction List.
Each transaction that has one or more rows locked in the block has an entry.
Therefore, the two transactions you mention would have two separate entries, with different SCNs.

Obviously, the most "current" copy of the block is that which is constructed with the highest committed SCN. (If one of the two transactions has not yet committed, it could still rollback it's row update, so Oracle has to be able to reconstruct the image of the block with reference to this transaction as well). The "cleanout", if necessary, by another query session, will occur with respect to the committed transaction. The ITL entry for the committed transaction is updated.
The ITL entry for the uncomitted transaction cannot be cleared -- so as to allow a rollback if the session initiating the transaction issues a rollback and/or errors out and Oracle has to execute a rollback of its transaction.

Anonymous said...

Hi Hemant,

I would like to know if there is any difference in response time when inserting 10 rows vs 1 million rows in Oracle. My understanding is that there is no real difference in response time during the insert, but the delay comes when a process tries to select the data, keeping delayed block cleanout in mind (clearing ITL entries).

Can you please reconfirm whether this is a correct assumption?
Thanks

Hemant K Chitale said...

INSERTING more rows requires updating more table and index blocks. This takes more time.

Some of the target blocks (e.g. those with free space, or index leaf blocks) may not be in the buffer cache and have to be read from disk. This also adds to the time.

Delayed Block Cleanout may or may not be greater.