Partitioning Degrades Insert Performance

Oracle offers the partitioning feature to help manage and optimize today’s huge volumes of data. Partitioning lets users store data in separate physical segments but treats the table as a single object at the logical level. When filtering on the partition key, partitioning allows the database to ignore all segments except the one containing the data specifically needed by the query (a.k.a. partition pruning). This is a huge benefit when querying partitioned tables, but recent experience with a client demonstrated to me that partitioning brings its own costs as well: partitioning slows down inserts.

Toward the end I discuss how to reduce the insert penalty for partitioned tables (hint: sort your inserts by the partition key), but I begin by demonstrating the penalty itself.  The first example compares inserts into a heap (unpartitioned) table with inserts into a partitioned one. Initially I gather insert timings against a heap table and then run the same operations using a partitioned table. Following that, I run extended SQL traces (a.k.a. 10046 traces) to record the activity behind the timing differences. (Note that all of the tests below were run against a 64-bit Oracle 10.2.0.4 database on RHEL5.)

Consider the scripts below. heap.sql accepts two parameters:

  1. The number of times to insert the contents of DBA_OBJECTS into the target table
  2. Whether or not a trace should be initiated (a value of 1 initiates the trace)

heap.sql

-- define some variables
define iterations = &1;
define trace = &2

set echo on

-- clean up from any previous runs
DROP TABLE target_heap;

whenever sqlerror exit failure

-- create the insert target
CREATE TABLE target_heap
AS SELECT * FROM dba_objects
WHERE 1 = 2;

-- initiate trace if requested
BEGIN
  IF &trace = 1 THEN
    EXECUTE IMMEDIATE 'ALTER SESSION SET tracefile_identifier = ''heap''';
    dbms_support.start_trace;
  END IF;
END;
/

set timing on

-- insert dba_objects into target as many times as requested
BEGIN
  FOR i IN 1 .. &iterations LOOP
    INSERT INTO target_heap
    SELECT * FROM dba_objects;

    COMMIT;
  END LOOP;
END;
/

set timing off

exec dbms_support.stop_trace

exit

part.sql accepts similar arguments:

  1. The number of times to insert the contents of DBA_OBJECTS into the target table
  2. The number of hash partitions the table should be created with
  3. Whether or not a trace should be initiated (a value of 1 initiates the trace

part.sql

-- define some variables
define iterations = &1;
define partCount  = &2;
define trace      = &3;

set echo on

-- clean up from an previous runs
DROP TABLE target_part_&partCount;

whenever sqlerror exit failure

-- create the insert target
CREATE TABLE target_part_&partCount
PARTITION BY HASH (owner, object_name, object_id) PARTITIONS &partCount
AS SELECT * FROM dba_objects
WHERE 1 = 2;

-- initiate trace if requested
BEGIN
  IF &trace = 1 THEN
    EXECUTE IMMEDIATE 'ALTER SESSION SET tracefile_identifier = ''part_&partCount''';
    dbms_support.start_trace;
  END IF;
END;
/

set timing on

-- insert dba_objects into target as many times as requested
BEGIN
  FOR i IN 1 .. &iterations LOOP
    INSERT INTO target_part_&partCount
    SELECT * FROM dba_objects;

    COMMIT;
  END LOOP;
END;
/

set timing off

exec dbms_support.stop_trace

exit

Note that I “set timing on” exclusively for the PL/SQL blocks that do the inserts. That way we can “grep” out the elapsed time for the operation of interest. We can even do that inside a bash for-loop to get a sample of timings, as shown below. This example runs the heap.sql script ten times. For each shell iteration, the TARGET_HEAP table is dropped, recreated, and populated with the contents of DBA_OBJECTS ten times per shell iteration:

> for((i=1; i<=10; i++)); do
>   sqlplus / @heap 10 0 | grep Elapsed
> done
Elapsed: 00:00:08.49
Elapsed: 00:00:08.58
Elapsed: 00:00:08.84
Elapsed: 00:00:08.53
Elapsed: 00:00:08.45
Elapsed: 00:00:08.43
Elapsed: 00:00:08.56
Elapsed: 00:00:08.32
Elapsed: 00:00:08.31
Elapsed: 00:00:07.95

As shown, the inserts take roughly 8 seconds to complete. This does not hold true for a partitioned target however. The example below uses the same parameters as the previous heap.sql runs, except with 8 partitions:

> for((i=1; i<=10; i++)); do
>   sqlplus / @part 10 8 0 | grep Elapsed
> done
Elapsed: 00:00:32.55
Elapsed: 00:00:29.62
Elapsed: 00:00:28.60
Elapsed: 00:00:31.24
Elapsed: 00:00:33.39
Elapsed: 00:00:30.83
Elapsed: 00:00:28.17
Elapsed: 00:00:28.56
Elapsed: 00:00:30.35
Elapsed: 00:00:29.36

In this case, the time required for the same operation took roughly 3 to 4 times longer to run compared to the unpartitioned target. More partitions add more time, though this effect seems to plateau:

> for((i=1; i<=10; i++)); do
  sqlplus / @part 10 512 0 | grep Elapsed
done
Elapsed: 00:00:41.00
Elapsed: 00:00:42.05
Elapsed: 00:00:41.68
Elapsed: 00:00:41.79
Elapsed: 00:00:40.79
Elapsed: 00:00:41.25
Elapsed: 00:00:41.08
Elapsed: 00:00:41.11
Elapsed: 00:00:41.22
Elapsed: 00:00:41.14

So what’s going on here? By enabling an extended SQL trace, we can find out. Rather than iteratively executing the script from the shell, we only execute each script once, but this time setting the last argument to “1″, which enables the trace:

> sqlplus / @heap 10 1 | grep Elapsed
Elapsed: 00:00:08.57
> sqlplus / @part 10 512 1 | grep Elapsed
Elapsed: 00:00:46.39

Then we can run the resulting trace files through TKPROF. As shown below, the partitioned table inflicted many more consistent gets on the database, each of which being a read of block buffered in memory. So why did the partitioned table require so many more reads to insert the same amount of data?

heap.sql TKPROF Output:

INSERT INTO TARGET_HEAP SELECT * FROM DBA_OBJECTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute     10      9.28       9.13          0      33590      34506     1515060
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       11      9.29       9.13          0      33590      34506     1515060

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 47     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
1515060  VIEW  DBA_OBJECTS (cr=26950 pr=0 pw=0 time=9096433 us)
1515060   UNION-ALL  (cr=26950 pr=0 pw=0 time=7581338 us)
1515010    FILTER  (cr=26740 pr=0 pw=0 time=3035930 us)
1520120     HASH JOIN  (cr=4850 pr=0 pw=0 time=3046125 us)
    520      TABLE ACCESS FULL USER$ (cr=30 pr=0 pw=0 time=855 us)
1520120      TABLE ACCESS FULL OBJ$ (cr=4820 pr=0 pw=0 time=1525629 us)
  19620     TABLE ACCESS BY INDEX ROWID IND$ (cr=21890 pr=0 pw=0 time=181776 us)
  21810      INDEX UNIQUE SCAN I_IND1 (cr=20 pr=0 pw=0 time=59074 us)(object id 39)
     50    NESTED LOOPS  (cr=210 pr=0 pw=0 time=5904 us)
     50     TABLE ACCESS FULL LINK$ (cr=30 pr=0 pw=0 time=404 us)
     50     TABLE ACCESS CLUSTER USER$ (cr=180 pr=0 pw=0 time=5326 us)
     50      INDEX UNIQUE SCAN I_USER# (cr=20 pr=0 pw=0 time=288 us)(object id 11)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  log file switch completion                      3        0.03          0.04

part.sql TKPROF Output:

INSERT INTO TARGET_PART_512 SELECT * FROM DBA_OBJECTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute     10     41.47      40.61          0    1547775    1592317     1515060
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       11     41.48      40.62          0    1547775    1592317     1515060

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 47     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
1515060  VIEW  DBA_OBJECTS (cr=29776 pr=0 pw=0 time=10611293 us)
1515060   UNION-ALL  (cr=29776 pr=0 pw=0 time=9550737 us)
1515010    FILTER  (cr=26740 pr=0 pw=0 time=4550759 us)
1520120     HASH JOIN  (cr=4850 pr=0 pw=0 time=4566061 us)
    520      TABLE ACCESS FULL USER$ (cr=30 pr=0 pw=0 time=978 us)
1520120      TABLE ACCESS FULL OBJ$ (cr=4820 pr=0 pw=0 time=1525755 us)
  19620     TABLE ACCESS BY INDEX ROWID IND$ (cr=21890 pr=0 pw=0 time=194173 us)
  21810      INDEX UNIQUE SCAN I_IND1 (cr=20 pr=0 pw=0 time=64993 us)(object id 39)
     50    NESTED LOOPS  (cr=3036 pr=0 pw=0 time=211682 us)
     50     TABLE ACCESS FULL LINK$ (cr=30 pr=0 pw=0 time=403 us)
     50     TABLE ACCESS CLUSTER USER$ (cr=3006 pr=0 pw=0 time=211075 us)
     50      INDEX UNIQUE SCAN I_USER# (cr=20 pr=0 pw=0 time=259 us)(object id 11)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  log file switch completion                      3        0.03          0.05

Note how the consistent gets in the heap execution plan (cr=26950) closely matches the consistent gets displayed for the overall operation as shown in the “Query” column (33,590). And the consistent gets from the partitioned execution plan indicate a similar amount with 29,776 buffer reads. But the consistent reads for the overall partitioned operation is 1,547,775!

As I interpret these results, the query portion of the insert is roughly the same for both insert statements (26,950 vs. 29,776 consistent gets). The only remaining part of the operation is the insert itself, indicating that far more buffer gets were needed to insert records into a partitioned table. The high value for the “Current” column indicates that the buffer gets were done so that the blocks could be modified rather than simply read. But if the same amount of data is being inserted into each table, why would more reads be needed for a partitioned target?

Partitioning imposes its own structure on a table. With a regular heap table, any free block can be used to hold any given record. But with a partitioned table, records are assigned to segments according to the partition key. So the database cannot simply reuse any block in memory; it has to use a block from the appropriate partition. If a block from the needed segment isn’t available, additional reads are necessary.

Does this mean we should avoid partitioning? Absolutely not. The benefits partitioning offers are well known and it’s unlikely that an end-user would notice this additional overhead for an individual insert. And while large inserts that span across a variety of partitions may be forced to do some additional reads, that penalty can be reduced.  Here’s how.

Consider the range.sql script below. In this script we use range partitioning instead of hash. Ten partitions are created by default on a partition ID generated by the MOD(object_id, 10) function. In this way, every record is assigned one of ten partitions. range.sql accepts the second parameter as either ‘obj’ to sort inserts by the object_id, or ‘mod’ to sort on the same function above used to generate the partition_id. Theoretically, the ‘mod’ option should allow the database to focus on a single partition at a time, where the ‘obj’ option will span inserts across all partitions.

range.sql

-- define some variables
define iterations = &1;
define orderBy    = &2;
define trace      = &3;

set echo on

-- clean up from an previous runs
DROP TABLE target_part_range;

whenever sqlerror exit failure

BEGIN
  IF '&orderby' NOT IN ('obj', 'mod') THEN
    RAISE_APPLICATION_ERROR(-20001, 'orderBy must be either obj or mod');
  END IF;
END;
/

-- create the insert target
CREATE TABLE target_part_range
PARTITION BY RANGE (partition_id) (
  PARTITION p0 VALUES LESS THAN ( 1),
  PARTITION p1 VALUES LESS THAN ( 2),
  PARTITION p2 VALUES LESS THAN ( 3),
  PARTITION p3 VALUES LESS THAN ( 4),
  PARTITION p4 VALUES LESS THAN ( 5),
  PARTITION p5 VALUES LESS THAN ( 6),
  PARTITION p6 VALUES LESS THAN ( 7),
  PARTITION p7 VALUES LESS THAN ( 8),
  PARTITION p8 VALUES LESS THAN ( 9),
  PARTITION p9 VALUES LESS THAN (10))
AS SELECT MOD(object_id, 10) partition_id, t.*
FROM dba_objects t
WHERE 1 = 2;

-- initiate trace if requested
BEGIN
  IF &trace = 1 THEN
    EXECUTE IMMEDIATE 'ALTER SESSION SET tracefile_identifier = ''part_range_&orderBy''';
    dbms_support.start_trace;
  END IF;
END;
/

set timing on

-- insert dba_objects into target as many times as requested
BEGIN
  FOR i IN 1 .. &iterations LOOP
    INSERT INTO target_part_range
    SELECT MOD(object_id, 10), t.* FROM dba_objects t
    WHERE object_id IS NOT NULL
    ORDER BY DECODE('&orderBy', 'obj', object_id, MOD(object_id, 10));

    COMMIT;
  END LOOP;
END;
/

set timing off

exec dbms_support.stop_trace

exit

Then we can run the following to display the elapsed time for each sort option:

> for orderBy in obj mod; do
>   echo $orderBy
>   for((i=1;i<=10;i++)); do
>     sqlplus / @range 10 $orderBy 0 | grep Elapsed
>   done
>   echo; echo
> done
obj
Elapsed: 00:00:44.30
Elapsed: 00:00:42.14
Elapsed: 00:00:42.26
Elapsed: 00:00:44.54
Elapsed: 00:00:43.68
Elapsed: 00:00:42.33
Elapsed: 00:00:43.88
Elapsed: 00:00:45.60
Elapsed: 00:00:43.91
Elapsed: 00:00:43.68

mod
Elapsed: 00:00:14.01
Elapsed: 00:00:13.31
Elapsed: 00:00:13.79
Elapsed: 00:00:12.97
Elapsed: 00:00:13.19
Elapsed: 00:00:13.90
Elapsed: 00:00:12.75
Elapsed: 00:00:13.80
Elapsed: 00:00:16.65
Elapsed: 00:00:15.64

As shown, ordering the inserted select statement along the same lines as the partition key greatly decreases the time required to complete the insert. An analysis of the trace files produced using the ‘obj’ and ‘mod’ options reveals the source of this improvement:

TKPROF Output For range.sql Sorted By object_id

INSERT INTO TARGET_PART_RANGE SELECT MOD(OBJECT_ID, 10), T.* FROM DBA_OBJECTS
  T WHERE OBJECT_ID IS NOT NULL ORDER BY DECODE('obj', 'obj', OBJECT_ID,
  MOD(OBJECT_ID, 10))

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute     10     49.61      50.11        627    1750050    1759464     1715840
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       11     49.63      50.12        627    1750050    1759464     1715840

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 47     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
1715840  SORT ORDER BY (cr=27460 pr=626 pw=0 time=13355410 us)
1715840   VIEW  DBA_OBJECTS (cr=27460 pr=626 pw=0 time=12032275 us)
1715840    UNION-ALL  (cr=27460 pr=626 pw=0 time=10831147 us)
1715840     FILTER  (cr=27440 pr=624 pw=0 time=4310714 us)
1720960      HASH JOIN  (cr=5450 pr=525 pw=0 time=4495567 us)
    540       TABLE ACCESS FULL USER$ (cr=30 pr=1 pw=0 time=11651 us)
1720960       TABLE ACCESS FULL OBJ$ (cr=5420 pr=524 pw=0 time=1727547 us)
  19710      TABLE ACCESS BY INDEX ROWID IND$ (cr=21990 pr=99 pw=0 time=724920 us)
  21910       INDEX UNIQUE SCAN I_IND1 (cr=20 pr=1 pw=0 time=76907 us)(object id 39)
      0     FILTER  (cr=0 pr=0 pw=0 time=28 us)
      0      NESTED LOOPS  (cr=0 pr=0 pw=0 time=0 us)
      0       TABLE ACCESS FULL LINK$ (cr=0 pr=0 pw=0 time=0 us)
      0       TABLE ACCESS CLUSTER USER$ (cr=0 pr=0 pw=0 time=0 us)
      0        INDEX UNIQUE SCAN I_USER# (cr=0 pr=0 pw=0 time=0 us)(object id 11)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                       111        0.08          0.58
  db file scattered read                         39        0.06          0.47
  latch: cache buffers chains                     2        0.00          0.00
  log file switch completion                      3        0.04          0.06
  latch free                                      1        0.00          0.00

TKPROF Output for range.sql Sorted By The Partition Key, MOD(object_id, 10)

INSERT INTO TARGET_PART_RANGE SELECT MOD(OBJECT_ID, 10), T.* FROM DBA_OBJECTS
  T WHERE OBJECT_ID IS NOT NULL ORDER BY DECODE('mod', 'obj', OBJECT_ID,
  MOD(OBJECT_ID, 10))

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute     10     15.18      14.99          1      40829      43865     1715840
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       11     15.19      15.00          1      40829      43865     1715840

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 47     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
1715840  SORT ORDER BY (cr=27460 pr=0 pw=0 time=12406659 us)
1715840   VIEW  DBA_OBJECTS (cr=27460 pr=0 pw=0 time=11504996 us)
1715840    UNION-ALL  (cr=27460 pr=0 pw=0 time=9960711 us)
1715840     FILTER  (cr=27440 pr=0 pw=0 time=3440285 us)
1720960      HASH JOIN  (cr=5450 pr=0 pw=0 time=4138855 us)
    540       TABLE ACCESS FULL USER$ (cr=30 pr=0 pw=0 time=1298 us)
1720960       TABLE ACCESS FULL OBJ$ (cr=5420 pr=0 pw=0 time=1727293 us)
  19710      TABLE ACCESS BY INDEX ROWID IND$ (cr=21990 pr=0 pw=0 time=217004 us)
  21910       INDEX UNIQUE SCAN I_IND1 (cr=20 pr=0 pw=0 time=64354 us)(object id 39)
      0     FILTER  (cr=0 pr=0 pw=0 time=29 us)
      0      NESTED LOOPS  (cr=0 pr=0 pw=0 time=0 us)
      0       TABLE ACCESS FULL LINK$ (cr=0 pr=0 pw=0 time=0 us)
      0       TABLE ACCESS CLUSTER USER$ (cr=0 pr=0 pw=0 time=0 us)
      0        INDEX UNIQUE SCAN I_USER# (cr=0 pr=0 pw=0 time=0 us)(object id 11)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                         1        0.04          0.04
  latch: cache buffers chains                     1        0.00          0.00

Again we see the consistent gets from the select statements themselves are comparable. But when sorting the insert by the partition key, the buffer reads required to insert the new rows goes down drastically; working on a single partition at a time minimizes the partition insert penalty. Ordering the insert clearly brings the inserts in-line with the same operation against a heap table. And if future queries can utilize partition pruning, and the sort remains relatively inexpensive, then paying the partition insert penalty will be worth the cost. This is especially true in write-once/read-often environments like data warehouses.

0 Responses to “Partitioning Degrades Insert Performance”



  1. No Comments Yet

Leave a Reply