21 March, 2007

Backups and Recoveries, SANs and Clones, and Murphy

Here's an excellent writeup on Backups and Recoveries, SANs and Clones, Murphy and s*t hitting the f*
Even if you are not a DBA, do read the article in its entirety.
(The article, entitled "The Only Issue", is on dizwell's blog.)
UPDATE : The blog post is no longer available.

a) Change Control by DBAs (use scripts and logs, not ad-hoc command-line work -- and how I hate GUI point-and-click !)
b) Test your backup and recovery
c) Don't "assume"
d) Murphy will be Murphy ! -- really, truly everything that can go wrong does go wrong -- read the whole article !

20 March, 2007

Optimizer Index Cost Parameters

Frequently, I come across postings rejoicing about the benefits of OPTIMIZER_INDEX_COST_ADJ and OPTIMIZER_INDEX_CACHING. However, in a recent response, Tom Kyte seems either a) unimpressed or b) intent on "putting down" these parameters. Rather than making it clear that a performance benefit MAY arise IF the execution plan actually changes because of "tuning" these parameters, he just seems to say "ho hum .. the optimizer will perceive a difference in COST". He keeps repeating that COST would differ (and, I agree, that is why COST isn't always a reliable indicator of expected response time), but he forgets to mention that Performance also _may_ differ. However, he _does_ quote from his book : "..I've seen systems go from nonfunctional to blazingly fast simply by adjusting these two knobs." -- a point which isn't emphasised through the rest of his posting.
Am I saying that we should always try to tune these parameters ? No. Although I have used these parameters a few times with noticeable improvement in query response time, they don't always provide "faster" performance. Just to say that, when all other tuning is failing, these parameters are worth considering.
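If you do want to experiment, it is safer to do so at the session level before touching the instance-wide settings. A sketch only -- the values below are illustrative, not a recommendation, and the EMP query is hypothetical :

```sql
-- Session-level experiment only; 25 and 90 are illustrative values.
ALTER SESSION SET optimizer_index_cost_adj = 25;  -- index access costed at 25% of the default
ALTER SESSION SET optimizer_index_caching  = 90;  -- assume 90% of index blocks are cached

-- Then verify whether the execution plan actually CHANGES.
-- If only the reported COST changes, performance will not.
EXPLAIN PLAN FOR
  SELECT * FROM emp WHERE deptno = 10;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);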

14 March, 2007

Understanding "Timed Events" in a StatsPack Report

Jonathan Lewis writes about the miscalculations [misinterpretations] you might make when reading the "Top 5 Timed Events" [or "Top 5 Wait Events"] section of a StatsPack report in this article.

Database Error Exposed on the Internet

How NOT to expose a Database error on the Internet. I just responded to an email/online survey and when I submitted my responses, I was presented with this screen (site name / info "erased" out) :

12 March, 2007

Throughput v Scalability

How do you measure "performance" of an Oracle Database [or "Database Server"] ? When asked to make a presentation on "performance" do you present CPU utilisation figures obtained from the OS ? Do you present I/O waits obtained from "sar" or "iostat" ?
Oracle's Wait Interface does allow us to provide a "breakdown" of Total Response Time into CPU, I/O waits and -- importantly -- Contention : Latches / Locks (Enqueues) etc. You could have a system where a single program has very high throughput -- screaming performance.
Add multiple copies of the same code running concurrently, with multiple users accessing the same data, and you run into Contention. You have to change focus from Throughput to Scalability. You have to convince users / management how Scalability can be an issue even when "only queries" are running. Latch management means that Read-Only operations (known to users as "Reports") can, at times, "not Scale".
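A crude way to get that breakdown from the Wait Interface -- a sketch only, assuming 10g, where the WAIT_CLASS column makes it easy to exclude idle events (on 9i you would maintain your own list of idle events) :

```sql
-- CPU and total DB time from the Time Model (10g); values are in microseconds.
SELECT stat_name, value/1000000 AS seconds
FROM   v$sys_time_model
WHERE  stat_name IN ('DB CPU', 'DB time');

-- Non-idle waits since instance startup; TIME_WAITED is in centiseconds.
SELECT event, wait_class, time_waited/100 AS seconds
FROM   v$system_event
WHERE  wait_class <> 'Idle'
ORDER  BY time_waited DESC;
```

Contention shows up here as time under the 'Concurrency' wait class (latches) and 'Application' / enqueue waits -- time that grows with concurrent users even when CPU and I/O look healthy.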

06 March, 2007

Using Normal Tables for Temporary Data

Many application vendors still use regular (normal) tables to hold Temporary (Transient) Data.

Recently, I came across code like this being executed by a particular section of the application which allows the user to run his own queries :
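In outline, it looked something like this -- a reconstruction from the description that follows, with the source table, column names and bind variables being hypothetical :

```sql
-- Clear this user's rows, repopulate, then read them back.
DELETE FROM temp_table WHERE tid = :tid;

INSERT INTO temp_table (tid, col1, col2)
  SELECT :tid, t.col1, t.col2
  FROM   some_large_table t
  WHERE  t.some_key = :user_input;

SELECT col1, col2
FROM   temp_table
WHERE  tid = :tid;
```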

Since the data is "temporary" (it is DELETEd and INSERTed every time the table is accessed), the vendor did not think an Index was necessary -- after all, the table was supposed to be [very ?] small.
There being no Index on TEMP_TABLE, both the DELETE and the SELECT had to do Full-Table-Scans of TEMP_TABLE. Unfortunately, at some time in the recent past, TEMP_TABLE had grown to 49,000 blocks -- the particular "key" values that users query by can retrieve tens to thousands of rows, and some INSERT operation in the past had inserted hundreds of thousands of rows. Although this part of the application was "easy to use", every user executing it thereafter suffered two Full-Table-Scans of TEMP_TABLE very frequently.
The solution comprised two steps :
a) Index TEMP_TABLE on TID
b) TRUNCATE TEMP_TABLE to reset the High Water Mark
I couldn't get permission for the Truncate in a live environment, so I just created the Index alone.
That made a significant difference for the user ! Still waiting for the TRUNCATE to help further.
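For reference, the two steps are simply (the index name is my own choice) :

```sql
-- Step a) lets the DELETE and SELECT use an index range scan on TID.
CREATE INDEX temp_table_tid_ndx ON temp_table (tid);

-- Step b) resets the High Water Mark, so any remaining Full-Table-Scans
-- read only the blocks actually in use, not the 49,000-block "high tide".
TRUNCATE TABLE temp_table;
```

A plain DELETE would not have helped here : DELETE removes rows but leaves the High Water Mark where it is, so Full-Table-Scans keep reading all 49,000 blocks.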

UPDATE : 21-Sep : Again, last week, I was able to make a significant difference to a report. I found that the report (a query running for 1 hour) was repeatedly scanning a "temp" table which was expected to, and did, contain 0 rows. Unfortunately, the table had grown to a couple of hundred MBs. A quick TRUNCATE of the table reset the High Water Mark, and the query time was down to less than 10 seconds.