Hemant's Oracle DBA Blog: May 2007

31 May, 2007

AUTOALLOCATE and Undo Segments

Recently I had a query from another DBA who asked why a monitoring script had reported early in the morning "Threshold : Max_Free less than Twice the largest Extent. Potential error if the DataFile cannot AutoExtend" and listing two Undo Segments with 64MB extents in the UNDOTBS1 tablespace. Apparently, although DBA_EXTENTS did show two Undo Segments with extents of 64MB, the question was whether the nextextent will be bigger than 64m -- "it should be 64K. because the unit of the tablespace "undotbs1" is 64K. parameters "next" & "min_extents" are obsolete in LMT."

My response was : "AUTOALLOCATE means that the Next ExtentSize is automatically determined. Segments start with 64K extents then automatically switch to 1MB extents and 8MB extents and finally 64MB extents as they start growing. That is why it is important to know the size of the last (largest) extent . If the largest extent is already 64MB, then the next WILL be 64MB. At 07:12 in the morning, segments _SYSSMU8$ and _SYSSMU9$ were already using 64MB extents. Since AUM periodically drops and recreates Undo segments, they may get created with 64KB extents. As a Segment grows for large transaction, its Next Extent will autoallocate to 1MB, 8MB and 64MB extents. Do NOT confuse LMT and AutoAllocate. They are two different concepts." I was able to produce a listing showing undo segments with 64KB, 1MB and 8MB extents (by the time I ran the query, there were no 64MB extents around -- because of the "Automatic" Undo Management that Oracle does). The alert had just warned that if there had been a long running transaction which had continued to grow and the undo tablespace datafile did not autoextend, the transaction might have failed -- just as would happen with a table or index segment in a normal tablespace whose datafile does not extent to allocate the next extent for the table and index. To further test this, I just ran a few SQLs on my PC : UNDO SEGMENTS BEFORE RUNNING MY TRANSACTIONS :

SQL> select segment_name, bytes/1024, blocks , count(*)
2 from dba_extents
3 where tablespace_name = 'UNDOTBS1'
4 group by segment_name, bytes/1024, blocks
5 order by 1
6 /
SEGMENT_NAME BYTES/1024 BLOCKS COUNT(*)
--------------- ---------- ---------- ----------
_SYSSMU1$ 64 8 2
1024 128 6
_SYSSMU10$ 64 8 16
1024 128 6
_SYSSMU2$ 64 8 115
1024 128 28
_SYSSMU3$ 64 8 2
1024 128 7
_SYSSMU4$ 64 8 2
1024 128 1
_SYSSMU5$ 64 8 2
1024 128 4
_SYSSMU6$ 64 8 25
1024 128 23
_SYSSMU7$ 64 8 3
1024 128 9
_SYSSMU8$ 64 8 2
1024 128 9
_SYSSMU9$ 64 8 127
1024 128 21
20 rows selected.
SQL> spool off

RUNNING MY TRANSACTIONS (and validating that Oracle does allocate a different Undo Segment for each new transaction) :

SQL>
SQL> REM Begin Transactions here ========================================================
SQL>
SQL>
SQL> drop table hemant.test_txn_table ;
Table dropped.
SQL>
SQL> create table hemant.test_txn_table as
2 select * from dba_objects
3 union
4 select * from dba_objects
5 /
Table created.
SQL>
SQL> update hemant.test_txn_table
2 set object_id=object_id+1, owner=substr(owner,1,3)'_HKC' ;
51538 rows updated.
SQL>
SQL> select t.xidusn, r.name, t.start_time, t.used_ublk, t.used_urec
2 from v$transaction t, v$rollname r
3 where t.xidusn=r.usn
4 /
XIDUSN NAME START_TIME USED_UBLK USED_UREC
---------- ------------------------------ -------------------- ---------- ----------
6 _SYSSMU6$ 05/31/07 23:01:20 635 51538
SQL> commit;
Commit complete.
SQL>
SQL> create index hemant.test_txn_tbl_ndx on hemant.test_txn_table(owner);
Index created.
SQL>
SQL> delete hemant.test_txn_table
2 where owner = 'SYS_HKC';
24692 rows deleted.
SQL>
SQL> select t.xidusn, r.name, t.start_time, t.used_ublk, t.used_urec
2 from v$transaction t, v$rollname r
3 where t.xidusn=r.usn
4 /
XIDUSN NAME START_TIME USED_UBLK USED_UREC
---------- ------------------------------ -------------------- ---------- ----------
7 _SYSSMU7$ 05/31/07 23:01:28 771 24823
SQL> rollback;
Rollback complete.
SQL>
SQL> update hemant.test_txn_table
2 set object_id=object_id+1, owner=substr(owner,1,3)'_HKC2'
3 where owner = 'SYS_HKC';
24692 rows updated.
SQL>
SQL> select t.xidusn, r.name, t.start_time, t.used_ublk, t.used_urec
2 from v$transaction t, v$rollname r
3 where t.xidusn=r.usn
4 /
XIDUSN NAME START_TIME USED_UBLK USED_UREC
---------- ------------------------------ -------------------- ---------- ----------
9 _SYSSMU9$ 05/31/07 23:01:40 833 74076
SQL> commit;
Commit complete.
SQL>
SQL> delete hemant.test_txn_table
2 where rowid in (select rowid from hemant.test_txn_table where rownum <>
SQL> select t.xidusn, r.name, t.start_time, t.used_ublk, t.used_urec
2 from v$transaction t, v$rollname r
3 where t.xidusn=r.usn
4 /
XIDUSN NAME START_TIME USED_UBLK USED_UREC
---------- ------------------------------ -------------------- ---------- ----------
5 _SYSSMU5$ 05/31/07 23:01:50 5 248
SQL> commit;
Commit complete.
SQL>
SQL> delete hemant.test_txn_table
2 where rowid in (select rowid from hemant.test_txn_table where rownum <>
SQL> select t.xidusn, r.name, t.start_time, t.used_ublk, t.used_urec
2 from v$transaction t, v$rollname r
3 where t.xidusn=r.usn
4 /
XIDUSN NAME START_TIME USED_UBLK USED_UREC
---------- ------------------------------ -------------------- ---------- ----------
1 _SYSSMU1$ 05/31/07 23:01:51 3 126
SQL> commit;
Commit complete.
SQL>
SQL> insert into hemant.test_txn_table
2 select * from dba_objects
3 union
4 select * from dba_objects
5 union
6 select * from hemant.test_txn_table ;
102888 rows created.
SQL>
SQL> select t.xidusn, r.name, t.start_time, t.used_ublk, t.used_urec
2 from v$transaction t, v$rollname r
3 where t.xidusn=r.usn
4 /
XIDUSN NAME START_TIME USED_UBLK USED_UREC
---------- ------------------------------ -------------------- ---------- ----------
6 _SYSSMU6$ 05/31/07 23:01:51 294 5186
SQL> commit;
Commit complete.
SQL> spool off

UNDO SEGMENTS *AFTER* RUNNING THOSE TRANSACTIONS :

SQL> select segment_name, bytes/1024, blocks , count(*)
2 from dba_extents
3 where tablespace_name = 'UNDOTBS1'
4 group by segment_name, bytes/1024, blocks
5 order by 1
6 /
SEGMENT_NAME BYTES/1024 BLOCKS COUNT(*)
--------------- ---------- ---------- ----------
_SYSSMU1$ 64 8 2
1024 128 6
_SYSSMU10$ 64 8 16
1024 128 6
_SYSSMU2$ 64 8 107
1024 128 26
_SYSSMU3$ 64 8 2
1024 128 7
_SYSSMU4$ 64 8 33
1024 128 7
_SYSSMU5$ 64 8 2
1024 128 1
_SYSSMU6$ 64 8 25
1024 128 23
_SYSSMU7$ 64 8 3
1024 128 9
_SYSSMU8$ 64 8 2
1024 128 9
_SYSSMU9$ 64 8 127
1024 128 21
20 rows selected.
SQL>

Thus, we can see that different Undo Segments grew at different rates and Extent sizes vary from 64KB to 1MB in the above example but can be 64MB as well.

28 May, 2007

Where's the Problem ? Not in the Database !

Here is a post by Herod T about how the problem can sometimes be somewhere far removed from the database server. In fact, reading the comment hidden in the application code, it seems as if the problem was "known" at sometime in the past but no solution implemented then. Users just lived with the problem for a few months.

22 May, 2007

RollForward from a Cold Backup

As promised earlier here is how I do a Rollforward from a Cold Backup :

FIRST :
This is the status of the Database *BEFORE* the Cold Backup :
SQL>
SQL> col name format a45
SQL> select to_char(sysdate,'DD-MON-RR HH24:MI') from dual;
TO_CHAR(SYSDATE
---------------
22-MAY-07 22:16
SQL>
SQL> select group#, status from v$log order by status;
GROUP# STATUS
---------- ----------------
2 CURRENT
3 INACTIVE
1 INACTIVE
SQL>
SQL> select recid, name, sequence#, to_char(first_time,'DD-MON-RR HH24:MI') ,blocks*block_size
2 from v$archived_log
3 where first_time > trunc(sysdate)
4 order by first_time;
RECID NAME SEQUENCE#
---------- --------------------------------------------- ----------
TO_CHAR(FIRST_T BLOCKS*BLOCK_SIZE
--------------- -----------------
21 C:\OR10G2DB\ARCH\ARC00001_0623282965.001 1
22-MAY-07 22:09 512
22 C:\OR10G2DB\ARCH\ARC00002_0623282965.001 2
22-MAY-07 22:09 175104
SQL>
SQL> select dbid, name, checkpoint_change#, current_scn from v$database;
DBID NAME CHECKPOINT_CHANGE#
---------- --------------------------------------------- ------------------
CURRENT_SCN
-----------
138573118 OR10G2DB 812793
812933
SQL>
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>
SQL> spool off
========================================================================
After the shutdown, I do a COLD BACKUP
{including "online" Redo Log files -- I'll explain why later} of the database.
Once the Cold Backup is completed, I proceed to restart the database and execute some transactions :
--------------------------------------------------------------------------------------------------------------------------
SQL>
SQL> col name format a45
SQL> select to_char(sysdate,'DD-MON-RR HH24:MI') from dual;
TO_CHAR(SYSDATE
---------------
22-MAY-07 22:22
SQL>
SQL> select group#, status from v$log order by status;
GROUP# STATUS
---------- ----------------
2 CURRENT
3 INACTIVE
1 INACTIVE
SQL>
SQL> select recid, name, sequence#, to_char(first_time,'DD-MON-RR HH24:MI') ,blocks*block_size
2 from v$archived_log
3 where first_time > trunc(sysdate)
4 order by first_time;
RECID NAME SEQUENCE#
---------- --------------------------------------------- ----------
TO_CHAR(FIRST_T BLOCKS*BLOCK_SIZE
--------------- -----------------
21 C:\OR10G2DB\ARCH\ARC00001_0623282965.001 1
22-MAY-07 22:09 512
22 C:\OR10G2DB\ARCH\ARC00002_0623282965.001 2
22-MAY-07 22:09 175104
SQL>
SQL> select dbid, name, checkpoint_change#, current_scn from v$database;
DBID NAME CHECKPOINT_CHANGE#
---------- --------------------------------------------- ------------------
CURRENT_SCN
-----------
138573118 OR10G2DB 813231
813465
SQL>
SQL> REM Begin Transactions here =================
SQL>
SQL>
SQL> drop table hemant.test_txn_table ;
Table dropped.
SQL>
SQL> create table hemant.test_txn_table as
2 select * from dba_objects
3 union
4 select * from dba_objects
5 /
Table created.
SQL>
SQL> update hemant.test_txn_table
2 set object_id=object_id+1, owner=substr(owner,1,3)'_HKC' ;
51542 rows updated.
SQL> commit;
Commit complete.
SQL>
SQL> create index hemant.test_txn_tbl_ndx on hemant.test_txn_table(owner);
Index created.
SQL>
SQL> delete hemant.test_txn_table
2 where owner = 'SYS_HKC';
24691 rows deleted.
SQL> rollback;
Rollback complete.
SQL>
SQL> update hemant.test_txn_table
2 set object_id=object_id+1, owner=substr(owner,1,3)'_HKC2'
3 where owner = 'SYS_HKC';
24691 rows updated.
SQL> commit;
Commit complete.
SQL>
SQL> create table last_txn_time (txn_time ) as select sysdate from dual;
Table created.
SQL>
SQL> alter system switch logfile;
System altered.
SQL>
SQL> select group#, status from v$log order by status;
GROUP# STATUS
---------- ----------------
2 ACTIVE
3 ACTIVE
1 CURRENT
SQL>
SQL> select recid, name, sequence#, to_char(first_time,'DD-MON-RR HH24:MI') ,blocks*block_size
2 from v$archived_log
3 where first_time > trunc(sysdate)
4 order by first_time;
RECID NAME SEQUENCE#
---------- --------------------------------------------- ----------
TO_CHAR(FIRST_T BLOCKS*BLOCK_SIZE
--------------- -----------------
21 C:\OR10G2DB\ARCH\ARC00001_0623282965.001 1
22-MAY-07 22:09 512
22 C:\OR10G2DB\ARCH\ARC00002_0623282965.001 2
22-MAY-07 22:09 175104
23 C:\OR10G2DB\ARCH\ARC00003_0623282965.001 3
22-MAY-07 22:10 8186880
RECID NAME SEQUENCE#
---------- --------------------------------------------- ----------
TO_CHAR(FIRST_T BLOCKS*BLOCK_SIZE
--------------- -----------------
24 C:\OR10G2DB\ARCH\ARC00004_0623282965.001 4
22-MAY-07 22:22 8120320
25 C:\OR10G2DB\ARCH\ARC00005_0623282965.001 5
22-MAY-07 22:22 8120320
26 C:\OR10G2DB\ARCH\ARC00006_0623282965.001 6
22-MAY-07 22:22 8120320
RECID NAME SEQUENCE#
---------- --------------------------------------------- ----------
TO_CHAR(FIRST_T BLOCKS*BLOCK_SIZE
--------------- -----------------
27 C:\OR10G2DB\ARCH\ARC00007_0623282965.001 7
22-MAY-07 22:23 8120320
28 C:\OR10G2DB\ARCH\ARC00008_0623282965.001 8
22-MAY-07 22:23 8120320
8 rows selected.
SQL>
SQL> select dbid, name, checkpoint_change#, current_scn from v$database;
DBID NAME CHECKPOINT_CHANGE#
---------- --------------------------------------------- ------------------
CURRENT_SCN
-----------
138573118 OR10G2DB 814864
815153
SQL>
SQL> spool off
============================================================================
I have deliberately created an Index and done DELETEs, INSERTs and ROLLBACKs with UPDATEs to generate noticeable Redo (ArchiveLogs)
The ArchiveLog file is at least ARC00008 (we shall see later that there were two more Archives before the shutdown).

I next restore the Cold Backup _including_ the "online" Redo Log file to verify that the Cold Backup is consistent :
-------------------------------------------------------------------------------------------------------------------------------
SQL>
SQL> col name format a45
SQL> select to_char(sysdate,'DD-MON-RR HH24:MI') from dual;
TO_CHAR(SYSDATE
---------------
22-MAY-07 22:42
SQL>
SQL> select group#, status from v$log order by status;
GROUP# STATUS
---------- ----------------
2 CURRENT
3 INACTIVE
1 INACTIVE
SQL>
SQL> select recid, name, sequence#, to_char(first_time,'DD-MON-RR HH24:MI') ,blocks*block_size
2 from v$archived_log
3 where first_time > trunc(sysdate)
4 order by first_time;
RECID NAME SEQUENCE#
---------- --------------------------------------------- ----------
TO_CHAR(FIRST_T BLOCKS*BLOCK_SIZE
--------------- -----------------
21 C:\OR10G2DB\ARCH\ARC00001_0623282965.001 1
22-MAY-07 22:09 512
22 C:\OR10G2DB\ARCH\ARC00002_0623282965.001 2
22-MAY-07 22:09 175104
SQL>
SQL> select dbid, name, checkpoint_change#, current_scn from v$database;
DBID NAME CHECKPOINT_CHANGE#
---------- --------------------------------------------- ------------------
CURRENT_SCN
-----------
138573118 OR10G2DB 813231
813379
SQL>
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>
SQL> spool off
============================================================================
As you can see, the ControlFile that is restored (from the Cold Backup) is NOT aware of ArchiveLogs beyond ARC00002. Checkpoint SCNs and "Current SCN"s are always incremented at every startup and shutdown so this listing of 813231 and 813379 is slightly ahead of the first listing of 813231 and 813465 (ie the SCN got incremented after the query on v$database and again possibly incremented during the OPEN)
As I have actually incremented SCNs and OPENed the database, I cannot really use this copy. I only did an OPEN to verify that the Restore was consistent and usable.
I will now Restore the Cold Backup *again*.
------------------------------------------------------------------------------------------------------------------------------
Whether I restore or without the "online" Redo Log files now, it doesn't matter --
because I will be doing an OPEN RESETLOGS that is necessitated by the USING BACKUP CONTROLFILE.
---------------------------------------------------------------------------------------------------------------------------------
Here is how I do the RollForward : Notice how the Rollforward is from ARC00003 onwards.
The "using backup controlfile" is important !
See my posting at http://www.freelists.org/archives/oracle-l/05-2007/msg00440.html
---------------------------------------------------------------------------------------------------------------------------------
SQL> startup mount
ORACLE instance started.
Total System Global Area 83886080 bytes
Fixed Size 1247420 bytes
Variable Size 54527812 bytes
Database Buffers 25165824 bytes
Redo Buffers 2945024 bytes
Database mounted.
SQL> recover database using backup controlfile until cancel;
ORA-00279: change 813230 generated at 05/22/2007 22:16:15 needed for thread 1
ORA-00289: suggestion : C:\OR10G2DB\ARCH\ARC00003_0623282965.001
ORA-00280: change 813230 for thread 1 is in sequence #3

Specify log: {=suggested filename AUTO CANCEL}
ORA-00279: change 813860 generated at 05/22/2007 22:22:49 needed for thread 1
ORA-00289: suggestion : C:\OR10G2DB\ARCH\ARC00004_0623282965.001
ORA-00280: change 813860 for thread 1 is in sequence #4
ORA-00278: log file 'C:\OR10G2DB\ARCH\ARC00003_0623282965.001' no longer needed
for this recovery

Specify log: {=suggested filename AUTO CANCEL}
ORA-00279: change 814287 generated at 05/22/2007 22:22:50 needed for thread 1
ORA-00289: suggestion : C:\OR10G2DB\ARCH\ARC00005_0623282965.001
ORA-00280: change 814287 for thread 1 is in sequence #5
ORA-00278: log file 'C:\OR10G2DB\ARCH\ARC00004_0623282965.001' no longer needed
for this recovery

Specify log: {=suggested filename AUTO CANCEL}
ORA-00279: change 814599 generated at 05/22/2007 22:22:59 needed for thread 1
ORA-00289: suggestion : C:\OR10G2DB\ARCH\ARC00006_0623282965.001
ORA-00280: change 814599 for thread 1 is in sequence #6
ORA-00278: log file 'C:\OR10G2DB\ARCH\ARC00005_0623282965.001' no longer needed
for this recovery

Specify log: {=suggested filename AUTO CANCEL}
ORA-00279: change 814620 generated at 05/22/2007 22:23:01 needed for thread 1
ORA-00289: suggestion : C:\OR10G2DB\ARCH\ARC00007_0623282965.001
ORA-00280: change 814620 for thread 1 is in sequence #7
ORA-00278: log file 'C:\OR10G2DB\ARCH\ARC00006_0623282965.001' no longer needed
for this recovery

Specify log: {=suggested filename AUTO CANCEL}
ORA-00279: change 814656 generated at 05/22/2007 22:23:06 needed for thread 1
ORA-00289: suggestion : C:\OR10G2DB\ARCH\ARC00008_0623282965.001
ORA-00280: change 814656 for thread 1 is in sequence #8
ORA-00278: log file 'C:\OR10G2DB\ARCH\ARC00007_0623282965.001' no longer needed
for this recovery

Specify log: {=suggested filename AUTO CANCEL}
ORA-00279: change 814864 generated at 05/22/2007 22:23:07 needed for thread 1
ORA-00289: suggestion : C:\OR10G2DB\ARCH\ARC00009_0623282965.001
ORA-00280: change 814864 for thread 1 is in sequence #9
ORA-00278: log file 'C:\OR10G2DB\ARCH\ARC00008_0623282965.001' no longer needed
for this recovery

Specify log: {=suggested filename AUTO CANCEL}
ORA-00279: change 815052 generated at 05/22/2007 22:23:12 needed for thread 1
ORA-00289: suggestion : C:\OR10G2DB\ARCH\ARC00010_0623282965.001
ORA-00280: change 815052 for thread 1 is in sequence #10
ORA-00278: log file 'C:\OR10G2DB\ARCH\ARC00009_0623282965.001' no longer needed
for this recovery

Specify log: {=suggested filename AUTO CANCEL}
ORA-00279: change 815152 generated at 05/22/2007 22:23:14 needed for thread 1
ORA-00289: suggestion : C:\OR10G2DB\ARCH\ARC00011_0623282965.001
ORA-00280: change 815152 for thread 1 is in sequence #11
ORA-00278: log file 'C:\OR10G2DB\ARCH\ARC00010_0623282965.001' no longer needed
for this recovery

Specify log: {=suggested filename AUTO CANCEL}
CANCEL
Media recovery cancelled.
SQL> alter database open resetlogs;
Database altered.
SQL> select to_char(txn_time,'DD-MON-RR HH24:MI:SS') from last_txn_time;
TO_CHAR(TXN_TIME,'
------------------
22-MAY-07 22:23:12
SQL> select group#, status from v$log order by status;
GROUP# STATUS
---------- ----------------
2 CURRENT
3 UNUSED
1 UNUSED
SQL>
SQL> select recid, name, sequence#, to_char(first_time,'DD-MON-RR HH24:MI') ,blocks*block_size
2 from v$archived_log
3 where first_time > trunc(sysdate)
4 order by first_time;
RECID
----------
NAME
--------------------------------------------------------------------------------
SEQUENCE# TO_CHAR(FIRST_T BLOCKS*BLOCK_SIZE
---------- --------------- -----------------
25
C:\OR10G2DB\ARCH\ARC00001_0623282965.001
1 22-MAY-07 22:09 512
21
C:\OR10G2DB\ARCH\ARC00001_0623282965.001
1 22-MAY-07 22:09 512
RECID
----------
NAME
--------------------------------------------------------------------------------
SEQUENCE# TO_CHAR(FIRST_T BLOCKS*BLOCK_SIZE
---------- --------------- -----------------
22
C:\OR10G2DB\ARCH\ARC00002_0623282965.001
2 22-MAY-07 22:09 175104
23
C:\OR10G2DB\ARCH\ARC00002_0623282965.001
RECID
----------
NAME
--------------------------------------------------------------------------------
SEQUENCE# TO_CHAR(FIRST_T BLOCKS*BLOCK_SIZE
---------- --------------- -----------------
2 22-MAY-07 22:09 175104
24
C:\OR10G2DB\ARCH\ARC00003_0623282965.001
3 22-MAY-07 22:10 373760
SQL> select dbid, name, checkpoint_change#, current_scn from v$database;
DBID NAME CHECKPOINT_CHANGE# CURRENT_SCN
---------- --------- ------------------ -----------
138573118 OR10G2DB 815154 815445
SQL> shutdown
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> exit
============================================================================
During the RollForward I applied ArchiveLogs upto ARC00010. I then queried my data
(select to_char(txn_time,'DD-MON-RR HH24:MI:SS') from last_txn_time or
I could have queried "test_txn_table" as well) and verified that I do
have the data created by my transactions *after* the Cold Backup.
You can also see that also Log Sequence Numbers have got Reset, the SCNs
do *NOT* get Reset. 10g's "new" feature about "roll-forward through resetlogs"
is because archivelog filenames contain a Reset_ID as well -- nothing to
do with SCNs. SCNs always are incremented (SCNs like time can only go in
one direction -- although we can actually use Server OS commands to reset
Server Clocks, Oracle relies on SCNs, not timestamps, for ordering transactions).

NOTE : This method of "RollForward" recovery is also covered in Oracle MetaLink Note#271916.1

16 May, 2007

Another Recovery from Hell story

Having read Howard Rogers's story and blogged about it earlier, I also came across Connor McDonald's Hell Story.

15 May, 2007

SQL Statement Execution Times

Daniel Fink has posted a blog entry about how Execution Statistics for long running SQLs might not be visible in StatsPack reports if the SQLs end after the StatsPack Snapshots. The general advice is to NOT run StatsPack Snapshots at long intervals ("intervals of 8 hours are meaningless" we frequently hear). However, if there was a long running SQL that began before the first StatsPack Snapshot and ended after the last Snapshot, that SQL wouldn't be seen in the StatsPack Report at all. You would be left wondering "if the server was so busy [and 'sar' did show 'busy'ness] why don't I see any significant SQLs in the Report". You might well miss the SQLs accounting for most of the buffer gets, disk reads, CPU time.

14 May, 2007

"Recovery" in Cold Backup

In a recent email discussion thread on ORACLE-L, I outlined methods
of "recovery" from loss of control files(s) and online redo logs IF
the last shutdown had been a Normal Shutdown.
(eg if ControlFiles and/or RedoLog files have been deleted after the
shutdown --- such a strategy could also be used where the database has
been cloned by copying only the datafiles from cold backup (shutdown normal/immediate).

01 May, 2007

Stress Testing

An analogy of a "stress" test that should NOT be run.

Suppose your current system is a jar containing 1million beans. The mouth of the jar is small. Generally, each user expects to fetch 100 beans. The way he does it is to put his hand in and pick up one bean at a time. Thus, each user has to put his hand into the jar 100 times. If the mouth of the jar is small, only 2 or 3 users can put in their hands simultaneously. Other users have to keep waiting. It is possible that the first user is unable to rapidly put his hand into the jar 100 times because, on occassion, someone else's hand is in the jar. Therefore, some portion of his operation's time will be spent in "waiting".

Now, suppose the new system is still a jar containing 1million beans. However, now the mouth of the jar is much bigger. This means that even if each user has to put his hand in 100times, more hands can go in simultaneously. The number of users waiting to put their hands into the jar is lesser. The probability that the first user has to wait because the mouth of the jar doesn't have enough free space is lesser. He still takes the same amount of time to put in his and and remove a bean 100 times. BUT he spends *less* time waiting for the mouth of the jar to be free. However, if your "stress test" is such that the user expects to fetch 1,000 beans [instead of 100 beans], he has to put his hand into the jar 1,000 times now ! That user will say "the system is slower".

The beans in the jar is the data.
The mouth of the jar is the system capacity [throughput in terms of I/O calls because of storage bandwidth /response time , CPU calls etc that can be handled because of the number of CPUs and the speed of the CPUs]
The hand is the CPU or I/O call that the user makes.
The "single bean" is because each call will fetch a finite amount -- eg an Indexed Read will read 1 block at a time [if it has to read 1,000 datablocks, it has to make 1,000 {actually 2,000+} I/O calls].

The "stress test" should be
a) IF normally 5 users run reports, than have 10 users run reports
b) If the user fetches 100 rows, then he must still be fetching 100 rows -- if he fetches more rows than he does normally, then whatever be the system he still has to make that many more CPU or I/O calls

Hemant's Oracle DBA Blog