30 September, 2018

Partitioning -- 6 : Hash Partitioning

Unlike Range or List Partitioning where you define the rule which identifies which Partition a row will be inserted into (based on the value in the Partition Key Column(s)),  Hash Partitioning relies on Oracle applying a "hashing formula (algorithm)" to "randomly" distribute incoming rows across the available Partitions in the table.
This would be useful when you want to break up a table into smaller physical segments (maybe into even separate Tablespaces on different disks) without considering grouping of data.  In Date based Range Partitioning data is grouped into different Partitions -- i.e. physical segments on disk --  based on the Date value (e.g. by Month or Year).  In List Partitioning, data is grouped based on the value in the Partition Key Column.

Here is a small example on Hash Partitioning :

SQL> create table iot_incoming_data
  2  (data_item_number number,
  3   data_item_key varchar2(32),
  4   data_item_value varchar2(64),
  5   data_item_timestamp timestamp)
  6  partition by hash (data_item_number)
  7  (partition p1 tablespace hash_ptn_1,
  8   partition p2 tablespace hash_ptn_2,
  9   partition p3 tablespace hash_ptn_3,
 10   partition p4 tablespace hash_ptn_4)
 11  /

Table created.

SQL> 


In this definition of the table, I have "randomly" distributed incoming rows across 4 Partitions in 4 different Tablespaces.  Given the incoming "data_item_number" values (either machine generated or from a sequence), each of the 4 Partitions would be equally loaded.
(In contrast, in Date based Range Partitioning of, say, a SALES table, you might have fewer rows in older Partitions and an increasing number of rows in new Partitions as your business and Sales Volume grow over time !).

Unlike Range Partitioning, Hash Partitioning will not perform well for a "range based query"  (e.g. a range of sales dates or a range of data item numbers).  It is suitable for "equality" or "in-list" predicates.  If you do need a range based query, you would need a Global Index.

Note that it is advised that you should use a Power of 2 for the number of Hash Partitions.

Let me demonstrate insertion of data :

SQL> insert into iot_incoming_data
  2  select rownum,                               
  3  dbms_random.string('X',16),
  4  dbms_random.string('X',32),
  5  systimestamp
  6  from dual
  7  connect by level < 10001;

10000 rows created.

SQL> commit;

Commit complete.

SQL> 
SQL> exec dbms_stats.gather_table_stats('HEMANT','IOT_INCOMING_DATA');

PL/SQL procedure successfully completed.

SQL> select partition_name, num_rows
  2  from user_tab_partitions
  3  where table_name = 'IOT_INCOMING_DATA'
  4  order by partition_position
  5  /

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
P1                                   2471
P2                                   2527
P3                                   2521
P4                                   2481

SQL> 


Note that I have inserted the 10,000 rows from a single session.  In the real world, you would have multiple sessions concurrently inserting rows into the table.
Based on the Hashing algorithm that Oracle used (note : this is internal to Oracle and we cannot use any custom algorithm), Oracle has more or less evenly distributed the incoming rows across the 4 Partitions.

Let me select some random rows from the Partitions :

SQL> select data_item_number  
  2  from iot_incoming_data partition (P1)
  3  where rownum < 6 
  4  order by 1;

DATA_ITEM_NUMBER
----------------
            8361
            8362
            8369
            8379
            8380

SQL> 
SQL> select data_item_number
  2  from iot_incoming_data partition (P2)
  3  where rownum < 6
  4  order by 1
  5  /

DATA_ITEM_NUMBER
----------------
            8087
            8099
            8101
            8105
            8109

SQL> 
SQL> select data_item_number
  2  from iot_incoming_data partition (P3)
  3  where rownum < 6
  4  and data_item_number < 100
  5  order by 1
  6  /

DATA_ITEM_NUMBER
----------------
               2
               5
               8
              18
              20

SQL> 
SQL> select data_item_number
  2  from iot_incoming_data partition (P4)
  3  where rownum < 6
  4  and data_item_number between 1000 and 1100
  5  order by 1
  6  /

DATA_ITEM_NUMBER
----------------
            1001
            1002
            1005
            1008
            1009

SQL> 


(The first two queries returned rows with values greater than 8000 simply because I didn't specify a range of values as a filter and those rows came from the first few blocks that Oracle read from the buffer cache).
Note how the DATA_ITEM_NUMBER values indicate "near-random" distribution of rows across the Partitions.  It is likely that if I had created multiple sessions concurrently running inserts into the table, distribution of the rows would have been even more "random".



No comments: