13 November, 2018

Partitioning -- 9 : System Partitioning

System Partitioning, introduced in 11g, unlike all the traditional Partitioning methods, requires that all DML specify the Target Partition.  For a System Partitioned Table, the RDBMS does not use a "high value" rule to determine the Target Partition but leaves it to (actually requires) the application code (user) to specify the Partition.

In my opinion, this seems like the precursor to Oracle Database Sharding.

SQL> create table sys_part_table
  2  (id_column number,
  3  data_element_1 varchar2(50),
  4  data_element_2 varchar2(50),
  5  entry_date date)
  6  partition by SYSTEM
  7  (partition PART_A tablespace PART_TBS_A,
  8   partition PART_B tablespace PART_TBS_B,
  9   partition PART_C tablespace PART_TBS_C)
 10  /

Table created.

SQL> 


Notice that I did not specify a Partition Key (column).  The Partitions are not mapped to specific values / range of values in a Key column.

Any DML must specify the Target Partition.

SQL> insert into sys_part_table
  2  values (1, 'First Row','A New Beginning',sysdate)
  3  /
insert into sys_part_table
            *
ERROR at line 1:
ORA-14701: partition-extended name or bind variable must be used for DMLs on
tables partitioned by the System method


SQL> 
SQL> !oerr ora 14701
14701, 00000, "partition-extended name or bind variable must be used for DMLs on tables partitioned by the System method"
// *Cause:  User attempted not to use partition-extended syntax
//          for a table partitioned by the System method
// *Action: Must use of partition-extended syntax in contexts mentioned above.

SQL> 
SQL> insert into sys_part_table partition (PART_A)
  2  values (1, 'First Row','A New Beginning',sysdate)
  3  /

1 row created.

SQL> insert into sys_part_table partition (PART_B)
  2  values (2,'Second Row','And So It Continues',sysdate)
  3  /

1 row created.

SQL> 


I have to specify the Target Partition for my INSERT statement. This, obviously, also applies to DELETE and UPDATE statements.   However, I can run a SELECT statement without filtering (pruning) to any Target Partition(s) -- i.e. a SELECT statement that does not use the PARTITION clause will span across all the Partitions.

SQL> select * from sys_part_table;

 ID_COLUMN DATA_ELEMENT_1
---------- --------------------------------------------------
DATA_ELEMENT_2                                     ENTRY_DAT
-------------------------------------------------- ---------
         1 First Row
A New Beginning                                    13-NOV-18

         2 Second Row
And So It Continues                                13-NOV-18


SQL> 


With Tablespaces assigned to the Partitions (see the CREATE table statement above),  I  can have each Partition mapped to a different underlying Disk / Disk Group.
.
.
.

12 November, 2018

Partitioning -- 8 : Reference Partitioning

Like Interval Partitioning, another enhancement in 11g is Reference Partitioning.

Reference Partitioning allows you to use a Referential Integrity Constraint to equi-partition a "Child" Table with a "Parent" Table.

Here is a quick demonstration :

SQL> l
  1  create table orders
  2  (order_id  number primary key,
  3   order_date date not null,
  4   customer_id number)
  5  partition by range (order_date)
  6  (partition P_2017 values less than (to_date('01-JAN-2018','DD-MON-YYYY')),
  7   partition P_2018 values less than (to_date('01-JAN-2019','DD-MON-YYYY'))
  8* )
SQL> /

Table created.

SQL> 
SQL> l
  1  create table order_lines
  2  (line_unique_id  number primary key,
  3   order_id number not null,
  4   order_line_id  number,
  5   product_id  number,
  6   product_quantity number,
  7   constraint order_lines_fk foreign key (order_id)
  8    references orders(order_id)
  9  )
 10* partition by reference (order_lines_fk)
SQL> /

Table created.

SQL> 
SQL> col high_value format a28 trunc
SQL> col table_name format a16
SQL> col partition_name format a8
SQL> select table_name, partition_name, high_value
  2  from user_tab_partitions
  3  where table_name in ('ORDERS','ORDER_LINES')
  4  order by table_name, partition_position
  5  /

TABLE_NAME       PARTITIO HIGH_VALUE
---------------- -------- ----------------------------
ORDERS           P_2017   TO_DATE(' 2018-01-01 00:00:0
ORDERS           P_2018   TO_DATE(' 2019-01-01 00:00:0
ORDER_LINES      P_2017
ORDER_LINES      P_2018

SQL> 


Notice the "automatically" created Partitions for the ORDER_LINES ("Child") Table that match those for the ORDERS ("Parent") Table.

.
.
.

27 October, 2018

Partitioning - 7 : Interval Partitioning

Interval Partitioning was introduced in 11g as an enhancement to Range Partitioning, but supporting only DATE and NUMBER datatypes.  This allows you to define the interval for each Partition and leave it to the database engine to automatically create new Partitions as required when data is inserted.  Thus, you do not have to pre-create Partitions for future data.

Here is a demo with Monthly Date Intervals.

 2  (manufacture_date date,
  3   item_code varchar2(32),
  4   item_quantity number(8,0))
  5  partition by range (manufacture_date)
  6  interval (numtoyminterval(1,'MONTH'))
  7  (partition P_1 values less than (to_date('01-JUL-2018','DD-MON-YYYY')))
  8  /

Table created.

SQL> set long 32           
SQL> select partition_name, high_value
  2  from user_tab_partitions
  3  where table_name = 'MANUFACTURING_SUMMARY'
  4  /

PARTITION_NAME                 HIGH_VALUE
------------------------------ --------------------------------
P_1                            TO_DATE(' 2018-07-01 00:00:00',

SQL> 


The INTERVAL clause specifies how the upper bounds for new Partitions are to be defined.  I only need to name the boundary for the first (lowest) Partition and name the Partition.  All subsequent Partitions are automatically created with names assigned by Oracle and high values based on the INTERVAL clause.

Let me insert a few rows.

SQL> insert into manufacturing_summary
  2  (manufacture_date, item_code, item_quantity)
  3  values
  4  (to_date('29-JUN-2018','DD-MON-YYYY'), 'ABC123',4000) 
  5  /

1 row created.

SQL> insert into manufacturing_summary
  2  values (to_date('01-JUL-2018','DD-MON-YYYY'),'ABC123',3000)
  3  /

1 row created.

SQL> insert into manufacturing_summary
  2  values (to_date('01-JUL-2018','DD-MON-YYYY'),'FGH422',1000)
  3  /

1 row created.

SQL> commit;

Commit complete.

SQL> 
SQL> select partition_name, high_value
  2  from user_tab_partitions
  3  where table_name = 'MANUFACTURING_SUMMARY'
  4  order by partition_position
  5  /

PARTITION_NAME                  HIGH_VALUE
------------------------------ --------------------------------
P_1                            TO_DATE(' 2018-07-01 00:00:00',
SYS_P519                       TO_DATE(' 2018-08-01 00:00:00',

SQL> 


Oracle automatically created Partition S_P519 for July data.

What happens if there manufactuing daa is not available from 02-Jul-2018 to, say, 04-Sep-2018 ?  And availability of data resumes only on 05-Sep-2018 ?

SQL> insert into manufacturing_summary
  2  values (to_date('05-SEP-2018','DD-MON-YYYY'),'ABC123',3000) 
  3  /

1 row created.

SQL> commit;

Commit complete.

SQL> select partition_position, partition_name, high_value
  2  from user_tab_partitions
  3  where table_name = 'MANUFACTURING_SUMMARY'
  4  order by partition_position
  5  /

PARTITION_POSITION PARTITION_NAME
------------------ ------------------------------
HIGH_VALUE
--------------------------------
                 1 P_1
TO_DATE(' 2018-07-01 00:00:00',

                 2 SYS_P519
TO_DATE(' 2018-08-01 00:00:00',

                 3 SYS_P520
TO_DATE(' 2018-10-01 00:00:00',


SQL> 


The third Partition, SYS_P520 is created with the Upper Bound (HIGH_VALUE) of 01-Oct for the September data.

What if August data becomes available subsequently and is inserted ?

SQL> insert into manufacturing_summary
  2  values (to_date('10-AUG-2018','DD-MON-YYYY'),'ABC123',1500)
  3  /

1 row created.

SQL> commit;

Commit complete.

SQL> select partition_position, partition_name, high_value
  2  from user_tab_partitions
  3  where table_name = 'MANUFACTURING_SUMMARY'
  4  order by partition_position
  5  /

PARTITION_POSITION PARTITION_NAME
------------------ ------------------------------
HIGH_VALUE
--------------------------------
                 1 P_1
TO_DATE(' 2018-07-01 00:00:00',

                 2 SYS_P519
TO_DATE(' 2018-08-01 00:00:00',

                 3 SYS_P521
TO_DATE(' 2018-09-01 00:00:00',

                 4 SYS_P520
TO_DATE(' 2018-10-01 00:00:00',


SQL> 


A new Partition with the HIGH_VALUE of 01-Sept did get created as SYS_P521 and inserted into the ordered position 3.  While the previously created Partition S_P520 (HIGH_VALUE 01-Oct) got renumbered to 4.  We can verify this by actually querying the Partitions.

SQL> select * from manufacturing_summary partition (SYS_P521);

MANUFACTU ITEM_CODE                        ITEM_QUANTITY
--------- -------------------------------- -------------
10-AUG-18 ABC123                                    1500

SQL>
SQL> select * from manufacturing_summary partition (SYS_P520);

MANUFACTU ITEM_CODE                        ITEM_QUANTITY
--------- -------------------------------- -------------
05-SEP-18 ABC123                                    3000

SQL> 


SYS_P520 was created first for September data although no August data existed.  SYS_P521 was created subsequently for August data which was inserted later.

Remember this : NEVER rely on Partition Names to attempt to identify what data is in a Partition.  Always use PARTITION_POSITION and HIGH_VALUE to identify the logical position (rank) and the data that is present in the Partition.

Where do the Partition names SYS_P519, SYS_P520, SYS_P521 come from ?  They are from a system defined sequence, self-managed by Oracle.

Let me demonstrate this with another example.

SQL> l
  1  create table dummy_intvl_tbl
  2  (id_col number,
  3   data_col varchar2(15))
  4  partition by range(id_col)
  5  interval (100)
  6* (partition P_1 values less than (101))
SQL> /

Table created.

SQL> insert into dummy_intvl_tbl
  2  values (50,'data1');

1 row created.

SQL>
SQL> insert into dummy_intvl_tbl
  2  values (150,'data3');

1 row created.

SQL> 
SQL> insert into manufacturing_summary
  2  values (to_date('25-OCT-2018','DD-MON-YYYY'),'FGH422',500);

1 row created.

SQL> commit;

Commit complete.

SQL> 
SQL> select table_name, partition_position, partition_name, high_value
  2  from user_tab_partitions
  3  where table_name in ('MANUFACTURING_SUMMARY','DUMMY_INTVL_TBL')
  4  order by 1,2
  5  /

TABLE_NAME                     PARTITION_POSITION PARTITION_NAME
------------------------------ ------------------ ------------------------------
HIGH_VALUE
--------------------------------
DUMMY_INTVL_TBL                                 1 P_1
101

DUMMY_INTVL_TBL                                 2 SYS_P525
201

MANUFACTURING_SUMMARY                           1 P_1
TO_DATE(' 2018-07-01 00:00:00',

MANUFACTURING_SUMMARY                           2 SYS_P519
TO_DATE(' 2018-08-01 00:00:00',

MANUFACTURING_SUMMARY                           3 SYS_P521
TO_DATE(' 2018-09-01 00:00:00',

MANUFACTURING_SUMMARY                           4 SYS_P520
TO_DATE(' 2018-10-01 00:00:00',

MANUFACTURING_SUMMARY                           5 SYS_P526
TO_DATE(' 2018-11-01 00:00:00',


7 rows selected.

SQL> 


Note how Partition Name SYS_P525 was allocated to DUMMY_INTVL_TBL and then P_526 to MANUFACTURING_SUMMARY.
These System Defined Partition names use a *global* sequence, not tied to a specific table.

Can you rename the System Defined Partition after it has been automatically created ?

SQL> alter table manufacturing_summary
  2  rename partition SYS_P519 to Y18M07
  3  /

Table altered.

SQL> alter table manufacturing_summary
  2  rename partition SYS_P520 to Y18M09
  3  /

Table altered.

SQL> alter table manufacturing_summary
  2  rename partition SYS_P521 to Y18M08
  3  /

Table altered.

SQL> alter table manufacturing_summary
  2  rename partition SYS_P526 to Y18M10 
  3  /

Table altered.

SQL> 
SQL> select partition_name, high_value
  2  from user_tab_partitions
  3  where table_name = 'MANUFACTURING_SUMMARY'
  4  order by partition_position
  5  /

PARTITION_NAME                 HIGH_VALUE
------------------------------ --------------------------------
P_1                            TO_DATE(' 2018-07-01 00:00:00',
Y18M07                         TO_DATE(' 2018-08-01 00:00:00',
Y18M08                         TO_DATE(' 2018-09-01 00:00:00',
Y18M09                         TO_DATE(' 2018-10-01 00:00:00',
Y18M10                         TO_DATE(' 2018-11-01 00:00:00',

SQL> 


Yes, fortunately, you CAN rename the Partitions *after* they are automatically created.



30 September, 2018

Partitioning -- 6 : Hash Partitioning

Unlike Range or List Partitioning where you define the rule which identifies which Partition a row will be inserted into (based on the value in the Partition Key Column(s)),  Hash Partitioning relies on Oracle applying a "hashing formula (algorithm)" to "randomly" distribute incoming rows across the available Partitions in the table.
This would be useful when you want to break up a table into smaller physical segments (maybe into even separate Tablespaces on different disks) without considering grouping of data.  In Date based Range Partitioning data is grouped into different Partitions -- i.e. physical segments on disk --  based on the Date value (e.g. by Month or Year).  In List Partitioning, data is grouped based on the value in the Partition Key Column.

Here is a small example on Hash Partitioning :

SQL> create table iot_incoming_data
  2  (data_item_number number,
  3   data_item_key varchar2(32),
  4   data_item_value varchar2(64),
  5   data_item_timestamp timestamp)
  6  partition by hash (data_item_number)
  7  (partition p1 tablespace hash_ptn_1,
  8   partition p2 tablespace hash_ptn_2,
  9   partition p3 tablespace hash_ptn_3,
 10   partition p4 tablespace hash_ptn_4)
 11  /

Table created.

SQL> 


In this definition of the table, I have "randomly" distributed incoming rows across 4 Partitions in 4 different Tablespaces.  Given the incoming "data_item_number" values (either machine generated or from a sequence), each of the 4 Partitions would be equally loaded.
(In contrast, in Date based Range Partitioning of, say, a SALES table, you might have fewer rows in older Partitions and an increasing number of rows in new Partitions as your business and Sales Volume grow over time !).

Unlike Range Partitioning, Hash Partitioning will not perform well for a "range based query"  (e.g. a range of sales dates or a range of data item numbers).  It is suitable for "equality" or "in-list" predicates.  If you do need a range based query, you would need a Global Index.

Note that it is advised that you should use a Power of 2 for the number of Hash Partitions.

Let me demonstrate insertion of data :

SQL> insert into iot_incoming_data
  2  select rownum,                               
  3  dbms_random.string('X',16),
  4  dbms_random.string('X',32),
  5  systimestamp
  6  from dual
  7  connect by level < 10001;

10000 rows created.

SQL> commit;

Commit complete.

SQL> 
SQL> exec dbms_stats.gather_table_stats('HEMANT','IOT_INCOMING_DATA');

PL/SQL procedure successfully completed.

SQL> select partition_name, num_rows
  2  from user_tab_partitions
  3  where table_name = 'IOT_INCOMING_DATA'
  4  order by partition_position
  5  /

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
P1                                   2471
P2                                   2527
P3                                   2521
P4                                   2481

SQL> 


Note that I have inserted the 10,000 rows from a single session.  In the real world, you would have multiple sessions concurrently inserting rows into the table.
Based on the Hashing algorithm that Oracle used (note : this is internal to Oracle and we cannot use any custom algorithm), Oracle has more or less evenly distributed the incoming rows across the 4 Partitions.

Let me select some random rows from the Partitions :

SQL> select data_item_number  
  2  from iot_incoming_data partition (P1)
  3  where rownum < 6 
  4  order by 1;

DATA_ITEM_NUMBER
----------------
            8361
            8362
            8369
            8379
            8380

SQL> 
SQL> select data_item_number
  2  from iot_incoming_data partition (P2)
  3  where rownum < 6
  4  order by 1
  5  /

DATA_ITEM_NUMBER
----------------
            8087
            8099
            8101
            8105
            8109

SQL> 
SQL> select data_item_number
  2  from iot_incoming_data partition (P3)
  3  where rownum < 6
  4  and data_item_number < 100
  5  order by 1
  6  /

DATA_ITEM_NUMBER
----------------
               2
               5
               8
              18
              20

SQL> 
SQL> select data_item_number
  2  from iot_incoming_data partition (P4)
  3  where rownum < 6
  4  and data_item_number between 1000 and 1100
  5  order by 1
  6  /

DATA_ITEM_NUMBER
----------------
            1001
            1002
            1005
            1008
            1009

SQL> 


(The first two queries returned rows with values greater than 8000 simply because I didn't specify a range of values as a filter and those rows came from the first few blocks that Oracle read from the buffer cache).
Note how the DATA_ITEM_NUMBER values indicate "near-random" distribution of rows across the Partitions.  It is likely that if I had created multiple sessions concurrently running inserts into the table, distribution of the rows would have been even more "random".



Featured Post :

An "Awesome" List of Resources

Here's an "Awesome" List of Resources  https://github.com/sindresorhus/awesome . . .  

Preview Popular Posts