30 January, 2007

Sequences : Should they be Gap Free ?

Frequently through your career as a DBA you are either asked "Why is there a gap in my Number series ?" or _told_ "the business requirement/user/auditor CANNOT accept missing numbers". Then you have to keep your cool as you try to explain how Sequences work in Oracle. Sequences are designed to generate Unique Numbers ; by default they do _NOT_ guarantee consecutive (i.e. "Gap Free") numbers. Some programmers try to beat Sequences by writing their own number generators, based on a single row in a table or the max() of a particular column in a table, and end up causing either contention or repeated physical / Full Table Scan reads.
I was just reading Tom Kyte's responses at http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:4343369880986 -- "There is a fact about sequences -- an UNDENIABLE, UNESCAPABLE fact -- they are not gap free, will never be gap free, they will have gaps!".
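A quick way to see a gap appear for yourself -- a minimal sketch in SQL*Plus, with illustrative object names :

CREATE SEQUENCE demo_seq START WITH 1 INCREMENT BY 1 CACHE 20;
CREATE TABLE demo_tab (id NUMBER);

SELECT demo_seq.NEXTVAL FROM dual;                      -- returns 1
SELECT demo_seq.NEXTVAL FROM dual;                      -- returns 2

INSERT INTO demo_tab (id) VALUES (demo_seq.NEXTVAL);    -- uses 3
ROLLBACK;                                               -- the rollback does NOT give 3 back

SELECT demo_seq.NEXTVAL FROM dual;                      -- returns 4 : there is now a gap at 3

Cached sequence values that are lost on a SHUTDOWN ABORT, or when the sequence is aged out of the shared pool, create gaps in exactly the same way.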

28 January, 2007

Using Partial Recoveries to test Backups

When testing new backup technologies on "fairly large" databases [see my previous post http://hemantoracledba.blogspot.com/2007/01/deleting-data-doesnt-reduce-size-of.html for a definition of "fairly large"], sometimes you can contrive to do Partial Recoveries to test Restoration and Recovery of Backups. This became necessary, for instance, in one case where the vendors' [multiple vendors were involved] Snap*-based backup method, while seemingly restoring the whole database, did not produce datafiles that Oracle would see as consistent whenever I issued RECOVER commands. Although the SnapShot and SnapRestore weren't taking very long, when I wanted to "prove" the Oracle BEGIN/END BACKUP scripts and the RECOVER scripts that I was using, I had to fall back on "cp" as a Restoration method. This allowed me to show that when I used "cp" to backup and restore the datafiles, there were no errors from the RECOVER command. Unfortunately, the database being "fairly large", I would have spent far more hours in the "cp" than in the actual BEGIN/END BACKUP and RECOVER commands ! So I simulated transactions and did a "cp"-based restoration of only the SYSTEM and UNDO tablespaces. That way I could "copy" the "whole" database much faster.
Essentially what I did was
a) BEGIN BACKUP for all tablespaces
b) copy out SYSTEM and UNDOTBS
c) END BACKUP
d) Restore SYSTEM and UNDOTBS to the alternate location
e) CREATE CONTROLFILE with only SYSTEM and UNDOTBS
f) RECOVER only SYSTEM and UNDOTBS
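
For illustration, a command-level sketch of steps a) to f) ; the tablespace names, datafile paths and controlfile parameters below are assumptions, not the actual values I used :

-- a) on the source database, put the tablespaces into backup mode
ALTER TABLESPACE SYSTEM BEGIN BACKUP;
ALTER TABLESPACE UNDOTBS BEGIN BACKUP;
-- ... BEGIN BACKUP for the remaining tablespaces ...

-- b) copy out only the SYSTEM and UNDOTBS datafiles at the OS level, e.g. :
--      cp /oradata/PROD/system01.dbf  /backup/partial/system01.dbf
--      cp /oradata/PROD/undotbs01.dbf /backup/partial/undotbs01.dbf

-- c) take the tablespaces out of backup mode
ALTER TABLESPACE SYSTEM END BACKUP;
ALTER TABLESPACE UNDOTBS END BACKUP;
-- ... END BACKUP for the remaining tablespaces ...

-- d) restore the two datafiles to an alternate location, again with "cp"

-- e) on the test instance (after STARTUP NOMOUNT), create a controlfile listing only these datafiles
CREATE CONTROLFILE SET DATABASE "TESTDB" RESETLOGS ARCHIVELOG
  MAXLOGFILES 16
  MAXDATAFILES 100
  LOGFILE
    GROUP 1 '/alternate/redo01.log' SIZE 50M,
    GROUP 2 '/alternate/redo02.log' SIZE 50M
  DATAFILE
    '/alternate/system01.dbf',
    '/alternate/undotbs01.dbf'
  CHARACTER SET WE8ISO8859P1;

-- f) apply archived redo to the two restored datafiles
RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL;
-- (the aim here was only to prove that the RECOVER ran cleanly ; actually OPENing
--  the database would additionally need the missing datafiles to be offline dropped)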

So, in effect, I was using a TSPITR-like method to test backup and restoration.

I could prove that when the vendors' technologies were being used, I was getting inconsistent database files [the database files seemed to be of a point-in-time _before_ the BEGIN BACKUP -- i.e., the Snap* mechanism was copying the files __before__ the BEGIN BACKUP had been run]. This WAS finally resolved by the vendors.

Deleting data doesn't reduce the size of the backup

You sometimes may have a fairly large ["fairly large" is relative to a) the average size of databases in your organisation b) the disk storage and I/O capacity it is running on c) the backup throughput and tape sizes] database that is taking many hours to backup. You are doing full image backups. You or your users may be tempted to delete data to "reduce the size of the backup" and/or to "reduce the time it takes to backup" and/or to "reduce the duration that datafiles are in backup mode".
Unfortunately, merely deleting rows [whether 10% or 99%] in table[s] doesn't really reduce the size or duration of most image backups. The delete operation only adds to the total amount of undo and causes a very large increase in redo and archivelog generation, for which you may have to scramble for disk space.
An image backup [whether by "cp" or "cpio" or "tar"] will still see the Datafiles as having the same size, because the delete doesn't shrink the datafiles.
An RMAN backup will still have to backup as many datablocks as before the delete. This is true _even_ if you are successful in deleting rows in such a manner that some or many blocks are 100% empty. Once a datablock has been formatted, it will always be backed up by RMAN, whether or not it contains any rows subsequently.
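You can verify this for yourself on a test database -- a minimal sketch, with illustrative table, column and tablespace names :

-- datafile size before the delete
SELECT file_name, bytes/1048576 AS size_mb
FROM   dba_data_files
WHERE  tablespace_name = 'USERS';

-- delete most of the rows
DELETE FROM big_table WHERE created < SYSDATE - 365;
COMMIT;

-- datafile size after the delete : unchanged
SELECT file_name, bytes/1048576 AS size_mb
FROM   dba_data_files
WHERE  tablespace_name = 'USERS';

-- the segment still holds all its formatted blocks (the High Water Mark has not moved)
SELECT segment_name, blocks
FROM   dba_segments
WHERE  segment_name = 'BIG_TABLE';

Only operations such as TRUNCATE, ALTER TABLE ... MOVE or (in 10g) SHRINK SPACE, followed by a datafile RESIZE, would actually make the datafiles -- and therefore the image backup -- smaller.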

13 January, 2007

Large (Growing) Snapshot Logs indicate that you have a problem

Some time ago, we had noticed that in a particular database, a Materialized View's refreshes were taking longer and were occasionally failing on ORA-1555s. We then redefined the MV to extract a smaller subset of data. We knew that this was supposed to be faster. Yet, it wasn't any faster and the load that the refresh was causing was still high. We realised that the snapshot logs on the base tables had grown inordinately large {as a result of the ORA-1555s in the refreshes}. We dropped and recreated the snapshot logs. Thereafter, MV refreshes were also much faster.
When you know that the Query in the MV should be fast and yet Fast Refreshes are getting slow, you should look at the size of the Snapshot Logs on the base tables.
It could also be that you had, at some time, defined additional MVs on the same base tables but had stopped refreshing the other MVs. Unfortunately, this would have resulted in the Snapshot Logs continuing to grow because they are still "waiting" for those other MVs to "come over and pick up the modified rows". See Note #1031924.6 on MetaLink.
Note that even if you do drop the non-refreshed MV as advised in the Note, it would still be preferable, if possible, to drop the Snapshot Logs on the base tables -- because you would want to rebuild the Snapshot Log afresh as a small log -- and rebuild other MVs that are still in use. (I say "if possible" because a complete rebuild of some MVs might also take too long to be acceptable "on-line").
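To see whether you are in this situation, a sketch of the checks (the base table name is illustrative) :

-- how big have the snapshot (MV) logs grown ?
SELECT segment_name, blocks, bytes/1048576 AS size_mb
FROM   dba_segments
WHERE  segment_name LIKE 'MLOG$\_%' ESCAPE '\'
ORDER  BY bytes DESC;

-- which MVs are still registered against the base tables (and so still hold up the purge) ?
SELECT owner, name, snapshot_site
FROM   dba_registered_snapshots;

-- after dropping the unused MVs, rebuild the log afresh as a small log
-- (the WITH clause must match what your remaining fast-refresh MVs need)
DROP MATERIALIZED VIEW LOG ON base_table;
CREATE MATERIALIZED VIEW LOG ON base_table WITH PRIMARY KEY;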

08 January, 2007

Sometimes you trip up on Triggers

A colleague and I once spent a couple of days on a simple DELETE statement. A Pro*C application seemed to be running slow and the developer had identified this portion :
EXEC SQL
delete
FROM
WHERE LOTID = :sqlLotId;

We looked at table statistics. We looked at the approximate number of rows per LOTID.
We looked at the index on LOTID. We used AUTOTRACE to see that we were encountering 8 consistent gets for 25 rows returned when running a SELECT for a single LOTID.

I started wondering about waits (enqueues, buffer busy, gc cache etc).
Then, and only then, did I run a 10046 trace.
Till then, I had figured "one table, one column as the driver, an EXPLAIN PLAN or AUTOTRACE should suffice".
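
For reference, the session-level 10046 trace can be switched on like this (level 12 captures binds as well as waits) ; the tracefile identifier is just a convenience to find the file later :

ALTER SESSION SET tracefile_identifier = 'lotid_delete';
ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';
-- ... run the slow DELETE here ...
ALTER SESSION SET EVENTS '10046 trace name context off';
-- then format the raw trace file with tkprof to read it easily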

The 10046 trace immediately showed me where the problem began -- a "BEFORE EACH ROW" DELETE Trigger which was copying the rows into a History table and was firing for each of the 25 rows being deleted for a LOTID. And then when I looked at the History table, I found an "AFTER STATEMENT" INSERT Trigger which was, wonder of wonders, doing a Full Table Scan of the History table because it needed to delete some row from the History table !

Every DELETE on the initial table would fire its Trigger 25 times. Each execution of that Trigger would, on inserting into the History table, cause a Full Table Scan of the History table to satisfy the Trigger there !
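
To illustrate the shape of the problem, here is a hedged sketch of such a trigger chain ; all the table, column and trigger names are made up, not the actual application objects :

-- row-level trigger on the main table : fires once for each of the 25 rows being deleted
CREATE OR REPLACE TRIGGER trg_lots_bdr
BEFORE DELETE ON lots
FOR EACH ROW
BEGIN
  INSERT INTO lots_history (lotid, deleted_on)
  VALUES (:OLD.lotid, SYSDATE);
END;
/

-- statement-level trigger on the history table : fires after every INSERT statement,
-- and its DELETE (no index on purge_flag) full-scans LOTS_HISTORY each time
CREATE OR REPLACE TRIGGER trg_lots_history_ais
AFTER INSERT ON lots_history
BEGIN
  DELETE FROM lots_history
  WHERE  purge_flag = 'Y';
END;
/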

The application team looked at the two triggers and decided that they did not need the After Insert Trigger on the History table -- the delete that it ran didn't make sense. {It had probably been added during code testing / data validation and wasn't removed soon enough !}


Problem Solved ! Nothing to do with Statistics or with rewriting the original SQL
but to do with "what happens down the line".

05 January, 2007

Views with ORDER BY

It is actually bad practice to include an ORDER BY in a View. The ORDER BY clause in the View definition will prevent you from selecting individual columns from the View.

When fetching from the View, your SQL query can be written to add the ORDER BY.
Always SELECT .... FROM the View .... ORDER BY .... -- that is, apply the ORDER BY **after** selecting from the View.

My recommendation : never include an ORDER BY in a View definition.
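
In other words -- a small sketch, with illustrative names :

-- define the View WITHOUT an ORDER BY
CREATE OR REPLACE VIEW emp_summary_v AS
SELECT deptno, empno, ename, sal
FROM   emp;

-- put the ORDER BY in the query that reads from the View
SELECT ename, sal
FROM   emp_summary_v
WHERE  deptno = 20
ORDER  BY sal DESC;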

Building Materialized Views and Indexes

There have been occasions when I have replaced a query on a remote database with a local Materialized View, thus eliminating the frequent fetches across the network -- only the MV refresh has to go across the network.
Furthermore, it also makes sense to index the MV locally. A Materialized View is, actually (physically), a Table. It can be indexed.
Response time on queries has gone down from tens of minutes to single minutes with MVs, and then to single seconds with the appropriate index(es) on the MVs.
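
A minimal sketch of the approach ; the database link, query, refresh schedule and index are illustrative assumptions :

-- local MV over a remote table : only the refresh goes across the network
CREATE MATERIALIZED VIEW orders_mv
  BUILD IMMEDIATE
  REFRESH FORCE
  START WITH SYSDATE NEXT SYSDATE + 1/24      -- refresh every hour
AS
SELECT order_id, customer_id, order_date, amount
FROM   orders@remote_db;

-- the MV is physically a table, so it can be indexed to suit the local queries
CREATE INDEX orders_mv_cust_ndx ON orders_mv (customer_id, order_date);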