08 October, 2011

The impact of ASSM on Clustering of data

Automatic Segment Space Management ("ASSM") is designed to reduce block-level contention by multiple sessions concurrently attempting INSERTs by using a bitmap to identify blocks that are on the FreeList. The FreeList is accessed by sessions attempting to insert a row in a table when they need to identify "candidate" blocks. In Manual Segment Space Management ("MSSM"), it is the block-level storage parameter PCTUSED that is a determinant for when a block "enters" and "exits" the freelist. (Note : PCTFREE plays the same role in in MSSM and ASSM -- determining the "reserved" free space in a block that may be used for UPDATEs to rows in the block.

Here is demonstration of the impact of ASSM on the clustering of data. I use a Unique Index based on a Sequence to ensure that every new row has an incremented value.



SQL>
SQL> -- demo behaviour of ASSM with multiple sessions inserting concurrently
SQL> -- data gets "distributed" to reduce block-contention
SQL> -- but this affects the Clustering Factor !
SQL>
SQL>

I start with two tablespaces : ASSM and MSSM and the target tables and indexes :

SQL> drop tablespace ASSM_TBS including contents and datafiles;
drop tablespace ASSM_TBS including contents and datafiles
*
ERROR at line 1:
ORA-00959: tablespace 'ASSM_TBS' does not exist


SQL> create tablespace ASSM_TBS extent management local autoallocate segment space management auto;

Tablespace created.

SQL> drop tablespace MSSM_TBS including contents and datafiles;
drop tablespace MSSM_TBS including contents and datafiles
*
ERROR at line 1:
ORA-00959: tablespace 'MSSM_TBS' does not exist


SQL> create tablespace MSSM_TBS extent management local autoallocate segment space management manual;

Tablespace created.

SQL>
SQL>

SQL> -- grants required for stored procedures to access v$process, v$session, v$mystat
SQL> grant select on v_$process to hemant;

Grant succeeded.

SQL> grant select on v_$session to hemant;

Grant succeeded.

SQL> grant select on v_$mystat to hemant;

Grant succeeded.

SQL>
SQL>
SQL> -- Reconnect as HEMANT
SQL> connect hemant/hemant
Connected.
SQL>

SQL> -- setup the Invoices table and PK index and sequence
SQL> drop table ASSM_Table;
drop table ASSM_Table
*
ERROR at line 1:
ORA-00942: table or view does not exist


SQL> create table ASSM_Table (
2 invoice_id number not null,
3 invoice_date date,
4 invoice_details varchar2(50),
5 inserting_pid number)
6 tablespace ASSM_TBS
7 /

Table created.

SQL> create unique index ASSM_Table_PK on ASSM_Table(invoice_id) tablespace ASSM_TBS;

Index created.

SQL> alter table ASSM_Table add constraint ASSM_Table_PK primary key (invoice_id);

Table altered.

SQL> drop sequence ASSM_invoice_Seq ;
drop sequence ASSM_invoice_Seq
*
ERROR at line 1:
ORA-02289: sequence does not exist


SQL> create sequence ASSM_invoice_Seq start with 1 increment by 1;

Sequence created.

SQL>
SQL>
SQL> drop table MSSM_Table;
drop table MSSM_Table
*
ERROR at line 1:
ORA-00942: table or view does not exist


SQL> create table MSSM_Table (
2 invoice_id number not null,
3 invoice_date date,
4 invoice_details varchar2(50),
5 inserting_pid number)
6 tablespace MSSM_TBS
7 /

Table created.

SQL> create unique index MSSM_Table_PK on MSSM_Table(invoice_id) tablespace MSSM_TBS;

Index created.

SQL> alter table MSSM_Table add constraint MSSM_Table_PK primary key (invoice_id);

Table altered.

SQL> drop sequence MSSM_invoice_Seq ;
drop sequence MSSM_invoice_Seq
*
ERROR at line 1:
ORA-02289: sequence does not exist


SQL> create sequence MSSM_invoice_Seq start with 1 increment by 1;

Sequence created.

SQL>
SQL>

I then create two procedures that I will use to simulate an application server / ETL engine that inserts 50,000 rows but uses the Sequence to generate Primary Key.

SQL> -- create procedures that will simulate an application server
SQL> -- or ETL engine that inserts 50000 rows
SQL> create or replace procedure populate_ASSM_Table
2 as
3 i number;
4 begin
5 for i in 1..50001
6 loop
7 insert into ASSM_Table
8 select ASSM_invoice_seq.nextval, sysdate+rownum/10000, rpad('A',40,'X'), pid
9 from
10 v$process p, v$session s
11 where p.addr=s.paddr
12 and s.sid=(select sid from v$mystat where rownum=1)
13 ;
14 end loop;
15 commit;
16 end;
17 /

Procedure created.

SQL>

SQL> create or replace procedure populate_MSSM_Table
2 as
3 i number;
4 begin
5 for i in 1..50001
6 loop
7 insert into MSSM_Table
8 select MSSM_invoice_seq.nextval, sysdate+rownum/10000, rpad('A',40,'X'), pid
9 from
10 v$process p, v$session s
11 where p.addr=s.paddr
12 and s.sid=(select sid from v$mystat where rownum=1)
13 ;
14 end loop;
15 commit;
16 end;
17 /

Procedure created.

SQL>

I then wait for the procedures to be executed by 3 sessions to simulate 3 concurrent sessions inserting 50,000 rows each :

SQL>
SQL> REM now run the procedures from three separate sessions
SQL> pause .............. wait for the procedures to be executed
.............. wait for the procedures to be executed

SQL>

Note : The two procedures are executed from 3 sessions (i.e. the three sessions first execute Populate_ASSM_Table, followed by an execution of Populate_MSSM_Table). The invocations of the procedures are almost concurrent as I execute them from SQL sessions in 3 terminal windows.

Once the executions are completed, I gather and review statistics for the two tables : the one in the ASSM Tablespace and the other in the MSSM Tablespace.
SQL> -- gather statistics on the tables and indexes
SQL> exec dbms_Stats.gather_table_stats('','ASSM_TABLE',estimate_percent=>100,cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL> exec dbms_Stats.gather_table_stats('','MSSM_TABLE',estimate_percent=>100,cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> -- view statistics on the tables and indexes
SQL> select table_name, num_rows, blocks
2 from user_tables
3 where table_name in ('ASSM_TABLE','MSSM_TABLE')
4 order by 1
5 /

TABLE_NAME NUM_ROWS BLOCKS
------------------------------ ---------- ----------
ASSM_TABLE 150003 1378
MSSM_TABLE 150003 1294

SQL>
SQL> select index_name, num_rows, blevel, leaf_blocks, clustering_factor
2 from user_indexes
3 where index_name in ('ASSM_TABLE_PK','MSSM_TABLE_PK')
4 order by 1
5 /

INDEX_NAME NUM_ROWS BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ ---------- ---------- ----------- -----------------
ASSM_TABLE_PK 150003 1 341 5729
MSSM_TABLE_PK 150003 1 334 1588

SQL>

Since I have run GATHER_TABLE_STATS, cascade=>TRUE, the CLUSTERING_FACTOR truly represents the distribution of the rows. Thus, after reading the row for INVOICE_ID 'n', the likelihood of INVOICE_ID 'n+1' being in the same block is much better in the MSSM_Table than in the ASSM_Table.

Now, here's a quiz question. Note the count of distinct blocks for each inserting Process. Why is the count of distinct blocks actually much higher for the MSSM Table ?
SQL> -- view the number of distinct blocks
SQL> select inserting_pid,count(distinct(dbms_rowid.rowid_block_number(rowid)))
2 from ASSM_TABLE
3 group by inserting_pid
4 order by 1
5 /

INSERTING_PID COUNT(DISTINCT(DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)))
------------- -----------------------------------------------------
28 622
29 711
30 666

SQL>
SQL> select inserting_pid,count(distinct(dbms_rowid.rowid_block_number(rowid)))
2 from MSSM_TABLE
3 group by inserting_pid
4 order by 1
5 /

INSERTING_PID COUNT(DISTINCT(DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)))
------------- -----------------------------------------------------
28 1269
29 1290
30 1273

SQL>


(Note : Each tablespace is only 1 datafile, so we need not consider FILE_ID when evaluating the BLOCK_NUMBERs).

.
.
UPDATE : See the test repeated with the distribution of INVOICE_IDs across multiple blocks (i.e. poorly clustered) in ASSM on 20-Oct.

.
.
.

2 comments:

Narendra said...

When multiple sessions insert data in a table created in ASSM tablespace, each session will be served (blocks) by a different freelist (in order to improve concurrency, which is the objective of ASSM). This will result in less number of blocks sharing data from more than one process.
In case of table created in MSSM tablespace, there exists only a single freelist (by default as none is specified in CREATE TABLE), which needs to serve 3 processes inserting data. This results in more number of blocks sharing data from more than one process (but affecting the concurrency).
Hope this makes sense :)

Hemant K Chitale said...

Narendra,
Unfortunately it isn't so easy to discern. Because of the speed at which each of the inserting processes read the sequence, true interleaving of values didn't occur.

In MSSM, as you rightly point out, a freelist is used. So new inserts will go into the same block until it is "full" (100% minus pctfree). However, in ASSM, when the table is empty, blocks are allocated in "groups" and each inserting process was given it's own blocks.
So, inserting_pid inserted rows into blocks 131, 134, 135, 132 ... apparently "jumping around"
Had the table being partially loaded with inserts and deletes before this operation, the picture would have been different if some pre-existing (partially loaded) blocks were "visible".