30 September, 2018

Partitioning -- 6 : Hash Partitioning

Unlike Range or List Partitioning where you define the rule which identifies which Partition a row will be inserted into (based on the value in the Partition Key Column(s)),  Hash Partitioning relies on Oracle applying a "hashing formula (algorithm)" to "randomly" distribute incoming rows across the available Partitions in the table.
This would be useful when you want to break up a table into smaller physical segments (maybe into even separate Tablespaces on different disks) without considering grouping of data.  In Date based Range Partitioning data is grouped into different Partitions -- i.e. physical segments on disk --  based on the Date value (e.g. by Month or Year).  In List Partitioning, data is grouped based on the value in the Partition Key Column.

Here is a small example on Hash Partitioning :

SQL> create table iot_incoming_data
  2  (data_item_number number,
  3   data_item_key varchar2(32),
  4   data_item_value varchar2(64),
  5   data_item_timestamp timestamp)
  6  partition by hash (data_item_number)
  7  (partition p1 tablespace hash_ptn_1,
  8   partition p2 tablespace hash_ptn_2,
  9   partition p3 tablespace hash_ptn_3,
 10   partition p4 tablespace hash_ptn_4)
 11  /

Table created.

SQL> 


In this definition of the table, I have "randomly" distributed incoming rows across 4 Partitions in 4 different Tablespaces.  Given the incoming "data_item_number" values (either machine generated or from a sequence), each of the 4 Partitions would be equally loaded.
(In contrast, in Date based Range Partitioning of, say, a SALES table, you might have fewer rows in older Partitions and an increasing number of rows in new Partitions as your business and Sales Volume grow over time !).

Unlike Range Partitioning, Hash Partitioning will not perform well for a "range based query"  (e.g. a range of sales dates or a range of data item numbers).  It is suitable for "equality" or "in-list" predicates.  If you do need a range based query, you would need a Global Index.

Note that it is advised that you should use a Power of 2 for the number of Hash Partitions.

Let me demonstrate insertion of data :

SQL> insert into iot_incoming_data
  2  select rownum,                               
  3  dbms_random.string('X',16),
  4  dbms_random.string('X',32),
  5  systimestamp
  6  from dual
  7  connect by level < 10001;

10000 rows created.

SQL> commit;

Commit complete.

SQL> 
SQL> exec dbms_stats.gather_table_stats('HEMANT','IOT_INCOMING_DATA');

PL/SQL procedure successfully completed.

SQL> select partition_name, num_rows
  2  from user_tab_partitions
  3  where table_name = 'IOT_INCOMING_DATA'
  4  order by partition_position
  5  /

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
P1                                   2471
P2                                   2527
P3                                   2521
P4                                   2481

SQL> 


Note that I have inserted the 10,000 rows from a single session.  In the real world, you would have multiple sessions concurrently inserting rows into the table.
Based on the Hashing algorithm that Oracle used (note : this is internal to Oracle and we cannot use any custom algorithm), Oracle has more or less evenly distributed the incoming rows across the 4 Partitions.

Let me select some random rows from the Partitions :

SQL> select data_item_number  
  2  from iot_incoming_data partition (P1)
  3  where rownum < 6 
  4  order by 1;

DATA_ITEM_NUMBER
----------------
            8361
            8362
            8369
            8379
            8380

SQL> 
SQL> select data_item_number
  2  from iot_incoming_data partition (P2)
  3  where rownum < 6
  4  order by 1
  5  /

DATA_ITEM_NUMBER
----------------
            8087
            8099
            8101
            8105
            8109

SQL> 
SQL> select data_item_number
  2  from iot_incoming_data partition (P3)
  3  where rownum < 6
  4  and data_item_number < 100
  5  order by 1
  6  /

DATA_ITEM_NUMBER
----------------
               2
               5
               8
              18
              20

SQL> 
SQL> select data_item_number
  2  from iot_incoming_data partition (P4)
  3  where rownum < 6
  4  and data_item_number between 1000 and 1100
  5  order by 1
  6  /

DATA_ITEM_NUMBER
----------------
            1001
            1002
            1005
            1008
            1009

SQL> 


(The first two queries returned rows with values greater than 8000 simply because I didn't specify a range of values as a filter and those rows came from the first few blocks that Oracle read from the buffer cache).
Note how the DATA_ITEM_NUMBER values indicate "near-random" distribution of rows across the Partitions.  It is likely that if I had created multiple sessions concurrently running inserts into the table, distribution of the rows would have been even more "random".



16 September, 2018

Partitioning -- 5 : List Partitioning

List Partitioning allows you to specify a value (or a set of values) for the Partition Key to map to each Partition.

This example shows List Partitioning.

SQL> create table request_queue
  2  (request_id  number primary key,
  3   request_submision_time timestamp,
  4   requestor  number,
  5   request_arg_1 varchar2(255),
  6   request_arg_2 varchar2(255),
  7   request_arg_3 varchar2(255),
  8   request_status varchar2(10),
  9   request_completion_time timestamp)
 10  partition by list (request_status)
 11  (partition p_submitted values ('SUBMITTED'),
 12   partition p_running values ('RUNNING'),
 13   partition p_errored values ('ERRORED'),
 14   partition p_completed values ('COMPLETED'),
 15   partition p_miscell values ('RECHECK','FLAGGED','UNKNOWN'),
 16   partition p_default values (DEFAULT)
 17  )
 18  /

Table created.

SQL> 


Note how the P_MISCELL Partition can host multiple values for the REQUEST_STATUS column.
The last Partition, has is specified as a DEFAULT Partition (note that DEFAULT is a keyword, not a value like the others) to hold rows for REQUEST_STATUS for values not mapped to any of the other Partitions.  With List Partitioning, you should always have a DEFAULT Partition (it can have any name, e.g. P_UNKNOWN) so that unmapped rows can be captured.

If you go back to my previous post on Row Movement, you should realise the danger of capturing changing values (e.g. from "SUBMITTED" to "RUNNING" to "COMPLETED") in different Partitions.  What is the impact of updating a Request from the "SUBMITTED" status to the "RUNNING" status and then to the "COMPLETED" status ?  It is not simply an update of the REQUEST_STATUS column alone but a physical reinsertion of the entire row (with the consequent update to all indexes) at each change of status.

SQL> insert into request_queue
  2  values (request_id_seq.nextval,systimestamp,101,   
  3  'FAC1','NOTE',null,'SUBMITTED',null)
  4  /

1 row created.

SQL>
SQL> commit;

Commit complete.

.... sometime later ....

SQL> update request_queue
  2  set request_status = 'RUNNING'
  3  where request_id=1001
  4  /
update request_queue
       *
ERROR at line 1:
ORA-14402: updating partition key column would cause a partition change


SQL> 


So, although now we know that we must ENABLE ROW MOVEMENT, we must suffer the impact of the physical reinsertion of the entire row into a new Partition.

SQL> alter table request_queue enable row movement;

Table altered.

SQL> update request_queue
  2  set request_status = 'RUNNING'
  3  where request_id=1001
  4  /

1 row updated.

SQL> commit;

Commit complete.

SQL> 
.... sometime later ....

SQL> update request_queue
  2  set request_status = 'COMPLETED',
  3  request_completion_time=systimestamp
  4  where request_id=1001
  5  /

1 row updated.

SQL> commit;

Commit complete.

SQL> 


(Note that all the previous "Partitioning 3a to 3d" posts about Indexing apply to List Partitioning as well)



09 September, 2018

Partitioning -- 4 : Row Movement

Do you expect Primary Keys to be updatable ?  Some argue that Primary Key values should be immutable.  The argument is that a Primary Key should not be modified.

What about Partition Keys ?  Would you allow a Partition Key to be updated ?

Let me take the SALES_DATA table again :

SQL> desc sales_data
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 SALE_ID                                   NOT NULL NUMBER
 SALE_DATE                                          DATE
 INVOICE_NUMBER                                     VARCHAR2(21)
 CUSTOMER_ID                                        NUMBER
 PRODUCT_ID                                         NUMBER
 SALE_VALUE                                         NUMBER

SQL> insert into sales_data
  2  values (sales_data_seq.nextval, 
  3          to_date('09-SEP-2019','DD-MON-YYYY'),
  4          'INV320001X',
  5          45,
  6          52,
  7          10000)
  8  /

1 row created.

SQL> commit;

Commit complete.

SQL> 


After the INSERT, I realise that the year in the SALE_DATE is wrong -- it is 2019 instead of 2018.  I need to update the row to set the year to 2018.
(Since the SALES_DATA table is partitioned to have a separate Partition for each year, this row has gone into the P_2019 Partition).

SQL> select * from sales_data
  2  where invoice_number='INV320001X' and customer_id=45;

   SALE_ID SALE_DATE INVOICE_NUMBER        CUSTOMER_ID PRODUCT_ID SALE_VALUE
---------- --------- --------------------- ----------- ---------- ----------
    320001 09-SEP-19 INV320001X                     45         52      10000

SQL> select * from sales_data partition (P_2019);

   SALE_ID SALE_DATE INVOICE_NUMBER        CUSTOMER_ID PRODUCT_ID SALE_VALUE
---------- --------- --------------------- ----------- ---------- ----------
    320001 09-SEP-19 INV320001X                     45         52      10000

SQL> 
SQL> update sales_data
  2  set sale_date = to_date('09-SEP-2018','DD-MON-YYYY')
  3  where sale_id=320001
  4  /
update sales_data
       *
ERROR at line 1:
ORA-14402: updating partition key column would cause a partition change


SQL> 


I encounter an error.  Oracle does not like updating a Partition Key value such that the row would have to move to a different Partition --- from the P_2019 Partition to the P_2018 Partition.

How would I allow updates that result in a row moving to a different Partition ?

SQL> alter table sales_data enable row movement;

Table altered.

SQL> update sales_data
  2  set sale_date = to_date('09-SEP-2018','DD-MON-YYYY')
  3  where sale_id=320001
  4  /

1 row updated.

SQL> commit;

Commit complete.

SQL> select * from sales_data partition (P_2019);

no rows selected

SQL> select * from sales_data partition (P_2018)
  2  where sale_id=320001
  3  /

   SALE_ID SALE_DATE INVOICE_NUMBER        CUSTOMER_ID PRODUCT_ID SALE_VALUE
---------- --------- --------------------- ----------- ---------- ----------
    320001 09-SEP-18 INV320001X                     45         52      10000

SQL> 


The ALTER TABLE ... ENABLE ROW MOVEMENT is a DDL command (needs to be issued only once to allow any number of subsequent updates to the tables rows) that allows a row to move from one Partition to another Partition.  In this case, the row moved from P_2019 to P_2018.

Moving rows from one Partition to another Partition is expensive.  Each row moved in such a manner results in
(a) marking deletion of the row from the original Partition
(b) physically inserting the *entire* rows (irrespective of length of the row) into the new Partition -- not just the SALE_DATE value but every column has to be written into a block in the new Partition
(c) updating *every* index (Global or Local) on the Table

Edit  14-Sep-18:
Also see two earlier posts :
"Enable Row Movement"
and
"Enable Row Movement with MSSM"

That is why it is not a good design to have frequently updated Partition Keys resulting in a row moving from one Partition to another.  You may have to reconsider the Partitioning definition or data and transaction flow in the application.

(Do you know where else ENABLE ROW MOVEMENT is required ?  There are other cases, not related to Partitioning, where you may have to ENABLE ROW MOVEMENT for a table.  By default when you CREATE a Table, ROW MOVEMENT is not enabled unless you explicitly enable it).



08 September, 2018

Partitioning -- 3d : Partial Indexing (in 11g)

Oracle 12c has introduced a new feature called "Partial Index" whereby selective partitions of a Table are indexed.  This is useful, for example, where you have a large historical table and you know that older Partitions are infrequently accessed and no longer need to be indexed.  For such tables, you can afford to "lose" the index for these older Partitions.

How would you do this in 11.2 ?

Let me go back to the SALES_DATA table with data from 2016 to 2018 populated.  This is the status of the index partition segments :

SQL> l
  1  select segment_name, partition_name, bytes/1024
  2  from user_segments
  3  where segment_type = 'INDEX PARTITION'
  4  and segment_name in
  5  (select index_name
  6   from user_indexes
  7   where table_name = 'SALES_DATA')
  8* order by 1,2
SQL> /

SEGMENT_NAME                   PARTITION_NA BYTES/1024
------------------------------ ------------ ----------
SALES_DATA_LCL_NDX_1           P_2016             3072
SALES_DATA_LCL_NDX_1           P_2017             3072
SALES_DATA_LCL_NDX_1           P_2018             3072
SALES_DATA_LCL_NDX_2           P_2016               64
SALES_DATA_LCL_NDX_2           P_2017               64
SALES_DATA_LCL_NDX_2           P_2018               64

6 rows selected.

SQL> 


So, if I now want to "unindex" the year 2016 partition (P_2016) of the SALES_DATA table, I can :

SQL> show parameter deferred_segment_creation

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
deferred_segment_creation            boolean     TRUE

SQL> alter index SALES_DATA_LCL_NDX_1 modify partition P_2016 unusable;

Index altered.

SQL> alter index SALES_DATA_LCL_NDX_2 modify partition P_2016 unusable;

Index altered.

SQL> 
SQL> l
  1  select segment_name, partition_name, bytes/1024
  2  from user_segments
  3  where segment_type = 'INDEX PARTITION'
  4  and segment_name in
  5  (select index_name
  6   from user_indexes
  7   where table_name = 'SALES_DATA')
  8* order by 1,2
SQL> /

SEGMENT_NAME                   PARTITION_NA BYTES/1024
------------------------------ ------------ ----------
SALES_DATA_LCL_NDX_1           P_2017             3072
SALES_DATA_LCL_NDX_1           P_2018             3072
SALES_DATA_LCL_NDX_2           P_2017               64
SALES_DATA_LCL_NDX_2           P_2018               64

SQL> 
SQL> select  count(*) from sales_data partition (P_2016);

  COUNT(*)
----------
    100000

SQL>               



You will notice that although the P_2016 Partition in the Table has data, the corresponding Index Partition no longer has a segment -- no space is allocated to it  (although the logical definition of the index exists).  This is possible with the "deferred_segment_creation" parameter set to TRUE in 11g.

In fact, you will notice that although the table has Partitions for 2019 and 2020 and MAXVALUE, corresponding Index Partition Segments do not exist (because no data has been inserted into those Table Partitions yet) !

SQL> select partition_name           
  2  from user_tab_partitions
  3  where table_name = 'SALES_DATA'
  4  order by partition_position
  5  /

PARTITION_NA
------------
P_2016
P_2017
P_2018
P_2019
P_2020
P_MAXVALUE

6 rows selected.

SQL> 
SQL> select index_name, partition_name, status
  2  from user_ind_partitions
  3  where index_name in
  4  (select index_name
  5   from user_indexes
  6   where table_name = 'SALES_DATA')
  7  order by index_name, partition_position
  8  /

INDEX_NAME                     PARTITION_NA STATUS
------------------------------ ------------ --------
SALES_DATA_LCL_NDX_1           P_2016       UNUSABLE
SALES_DATA_LCL_NDX_1           P_2017       USABLE
SALES_DATA_LCL_NDX_1           P_2018       USABLE
SALES_DATA_LCL_NDX_1           P_2019       USABLE
SALES_DATA_LCL_NDX_1           P_2020       USABLE
SALES_DATA_LCL_NDX_1           P_MAXVALUE   USABLE
SALES_DATA_LCL_NDX_2           P_2016       UNUSABLE
SALES_DATA_LCL_NDX_2           P_2017       USABLE
SALES_DATA_LCL_NDX_2           P_2018       USABLE
SALES_DATA_LCL_NDX_2           P_2019       USABLE
SALES_DATA_LCL_NDX_2           P_2020       USABLE
SALES_DATA_LCL_NDX_2           P_MAXVALUE   USABLE

12 rows selected.

SQL> 


This behaviour is a consequence of "deferred_segment_creation".

Note : If a Partitioned Index is a Unique / Primary Key Index, do NOT attempt to set an Index Partition to UNUSABLE.  UNUSABLE status would prevent INSERTs into the table.