New White Paper on 10053 Tracing

I presented my latest white paper at the 2009 Rocky Mountain Oracle Users Group Training Days Conference yesterday. Called “Using the 10053 Event To Understand Index Behavior”, I discussed how the 10053 event dumps out the statistics and calculations used by the Cost-Based Optimizer (CBO) to develop execution plans.  We can use the resulting trace file to gauge what effect specific initialization parameters and optimizer statistics have on the CBO’s decisions.  We can even diagnose those decisions when they result in suboptimal plans.

Follow these links for copies of the presentation and paper.

The Danger of Stale Statistics on Pre-Created Partitions

DBAs often pre-create partitions, anticipating their need in the near (or sometimes not-so-near) future. Mostly driven out of fear of an end-user receiving the dreaded “ORA-14400: inserted partition key does not map to any partition” error, this practice appears proactive and conscientious. But when a pre-created partition gets suddenly populated, Oracle can develop seriously flawed execution plans based on the expectation that the partition is still empty.

Gathering optimizer statistics after large numbers of inserts, deletes, or updates is always the best practice.  Doing so gives the cost-based optimizer (CBO) the best chance for developing the right execution plan given the recent DML activity.  But many IT shops rely solely on scheduled jobs to gather optimizer statistics.  While generally adequate, this practice leaves databases vulnerable to developing inappropriate execution plans after large changes to the data but before the stats-gathering job runs.

This effect is especially problematic for tables with pre-created partitions since they’re identified as empty in the data dictionary.  Compare this with newly created partitions or tables: default dynamic sampling kicks in for objects like these and gives the optimizer a fighting chance to come up with the right plan.  But pre-created partitions already have statistics on them, statistics that indicate the partition is empty.

This exempts pre-created partitions from the benefits of dynamic sampling.  And the partition’s low (that is, non-existent) volume means that access paths for that partition typically favor nested loop joins and index-driven access paths.  For low-volume segments, this sort of plan makes sense since the optimizer expects very few records to be returned.  But using this sort of plan for tables with high volumes is actually very costly.  This can occur during those periods after the empty partition is loaded but before its analyzed.

As an example of how situations like these can go wrong, consider the plight of a former client.  They had a weekly partitioning scheme, with partitions beginning at 12:00 AM each Friday morning.  What’s more, they decided to pre-create all the partitions they’d need for following year.  Their daily stats gathering job kicked off at 11:00 AM but their highest volume loads occurred between 10:00 PM and 2:00 AM.  In fact, because of the nature of their business, about 60% of the data contained in each partition was inserted during this window on Friday and Saturday nights.

Executives logged into the system every Saturday morning to check reports, only to find their client applications unresponsive.  The poor performance continued until roughly 11:15 AM, when the stats gathering job completed.  What’s more, the executives all logged in around 8:00 AM and ran the same report.  Individually, each session performed badly.  But en masse the latch contention and consequent load on the system skyrocketed, starving the already struggling sessions for their desperately needed CPUs.

In this case, the optimizer treated the newly populated partition as if it were empty, which it was when stats were most recently gathered.  Without realizing the new volume in the recently loaded partition, the CBO ignored the full-scan/hash join option and went down the more costly index scan/nested loop path.  Consider the example below, which simulates a pre-created partition getting populated and queried before gathering fresh stats.  The example begins by creating a partitioned data table along with a table contataining reference data.

create.sql

whenever sqlerror exit failure
set echo on

-- clean up from previous runs
DROP TABLE part_data;
DROP TABLE owner_lookup;

-- create reference table
CREATE TABLE owner_lookup AS
SELECT object_id, owner
FROM dba_objects
WHERE 1 = 2;

ALTER TABLE owner_lookup
ADD CONSTRAINT owner_lookup_pk
PRIMARY KEY (object_id);

-- create partitioned data table
CREATE TABLE part_data
PARTITION BY RANGE (part_key) (
  PARTITION p0 VALUES LESS THAN (1),
  PARTITION p1 VALUES LESS THAN (2),
  PARTITION p2 VALUES LESS THAN (3)) AS
SELECT
  MOD(ROWNUM, 1)  part_key, t.*
FROM dba_objects t
WHERE 1 = 2;

CREATE INDEX part_data_idx01
ON part_data(object_id);

ALTER TABLE part_data
ADD CONSTRAINT part_data_fk01
FOREIGN KEY (object_id) REFERENCES owner_lookup(object_id);

BEGIN
  -- populate reference table
  INSERT INTO owner_lookup
  SELECT object_id, owner
  FROM dba_objects
  WHERE object_id IS NOT NULL;

  -- populate p1 and p2, but leave p3 empty
  -- p3 serves as the pre-created partition
  FOR i IN 1 .. 10 LOOP
    INSERT INTO part_data
    SELECT
      MOD(ROWNUM, 2)  part_key, t.*
    FROM dba_objects t
    WHERE object_id IS NOT NULL;

    COMMIT;
  END LOOP;
END;
/

-- gather stats
BEGIN
  dbms_stats.gather_table_stats(
    user,
    'part_data',
    cascade => TRUE,
    estimate_percent => 100,
    no_invalidate => FALSE,
    method_opt => 'FOR ALL COLUMNS SIZE 1');

  dbms_stats.gather_table_stats(
    user,
    'owner_lookup',
    cascade => TRUE,
    estimate_percent => 100,
    no_invalidate => FALSE,
    method_opt => 'FOR ALL COLUMNS SIZE 1');
END;
/

set echo off

Note that the script gathers stats for both tables at the end, so the optimizer “knows” p2 is empty.  When we query against p1 and p2 by filtering for their partition keys we see very different execution plans.  The query script below accepts the partition key as its first argument:

query.sql

set lines 128 pages 128
set autotrace on

SELECT COUNT(*)
FROM part_data p, owner_lookup o
WHERE p.object_id = o.object_id
  AND p.part_key = &1
  AND o.owner = 'SYSTEM';

set autotrace off

With autotrace enabled, we can see the different execution plans and execution statistics from querying p1 and p2:

Querying p1:

SQL> @query 1
SQL>
SQL> SELECT COUNT(*)
  2  FROM part_data p, owner_lookup o
  3  WHERE p.object_id = o.object_id
  4    AND p.part_key = &1
  5    AND o.owner = 'SYSTEM';
old   4:   AND p.part_key = &1
new   4:   AND p.part_key = 1

  COUNT(*)
----------
      2260

Execution Plan
----------------------------------------------------------
Plan hash value: 554393664

---------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |              |     1 |    15 |   148   (3)| 00:00:02 |       |       |
|   1 |  SORT AGGREGATE          |              |     1 |    15 |            |          |       |       |
|*  2 |   HASH JOIN              |              | 13451 |   197K|   148   (3)| 00:00:02 |       |       |
|*  3 |    TABLE ACCESS FULL     | OWNER_LOOKUP |  1345 | 10760 |     6   (0)| 00:00:01 |       |       |
|   4 |    PARTITION RANGE SINGLE|              | 47080 |   321K|   141   (3)| 00:00:02 |     2 |     2 |
|*  5 |     TABLE ACCESS FULL    | PART_DATA    | 47080 |   321K|   141   (3)| 00:00:02 |     2 |     2 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("P"."OBJECT_ID"="O"."OBJECT_ID")
   3 - filter("O"."OWNER"='SYSTEM')
   5 - filter("P"."PART_KEY"=1)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        649  consistent gets
          0  physical reads
          0  redo size
        412  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Querying p2:

SQL> @query 2
SQL> set lines 128 pages 128
SQL> set autotrace on
SQL> set echo on
SQL>
SQL> SELECT COUNT(*)
  2  FROM part_data p, owner_lookup o
  3  WHERE p.object_id = o.object_id
  4    AND p.part_key = &1
  5    AND o.owner = 'SYSTEM';
old   4:   AND p.part_key = &1
new   4:   AND p.part_key = 2

  COUNT(*)
----------
         0

Execution Plan
----------------------------------------------------------
Plan hash value: 1085628044

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |     1 |    34 |     2   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE               |                 |     1 |    34 |            |          |       |       |
|   2 |   NESTED LOOPS                |                 |     1 |    34 |     2   (0)| 00:00:01 |       |       |
|   3 |    PARTITION RANGE SINGLE     |                 |     1 |    26 |     2   (0)| 00:00:01 |     3 |     3 |
|*  4 |     TABLE ACCESS FULL         | PART_DATA       |     1 |    26 |     2   (0)| 00:00:01 |     3 |     3 |
|*  5 |    TABLE ACCESS BY INDEX ROWID| OWNER_LOOKUP    |     1 |     8 |     0   (0)| 00:00:01 |       |       |
|*  6 |     INDEX UNIQUE SCAN         | OWNER_LOOKUP_PK |     1 |       |     0   (0)| 00:00:01 |       |       |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("P"."PART_KEY"=2)
   5 - filter("O"."OWNER"='SYSTEM')
   6 - access("P"."OBJECT_ID"="O"."OBJECT_ID")

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        410  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

As shown, Oracle took the nested loop approach for the empty partition but decided to full scan and hash join the populated one.  The populated partition required 649 consistent gets.  But when we populate p2 with the same data in p1 using the insert.sql script below, look what happens to the plan and consistent gets when re-querying p2.

insert.sql

INSERT INTO part_data
SELECT
  2, /* remaps data to p2 */
  owner,
  object_name,
  subobject_name,
  object_id,
  data_object_id,
  object_type,
  created,
  last_ddl_time,
  timestamp,
  status,
  temporary,
  generated,
  secondary
FROM part_data
WHERE part_key = 1 /* limits data to contents of p1 */
/

COMMIT;

Re-querying p2 with stale statistics:

SQL> @insert

47080 rows created.

Commit complete.

SQL> @query 2
old   4:   AND p.part_key = &1
new   4:   AND p.part_key = 2

  COUNT(*)
----------
      2260

Execution Plan
----------------------------------------------------------
Plan hash value: 1085628044

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |     1 |    34 |     2   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE               |                 |     1 |    34 |            |          |       |       |
|   2 |   NESTED LOOPS                |                 |     1 |    34 |     2   (0)| 00:00:01 |       |       |
|   3 |    PARTITION RANGE SINGLE     |                 |     1 |    26 |     2   (0)| 00:00:01 |     3 |     3 |
|*  4 |     TABLE ACCESS FULL         | PART_DATA       |     1 |    26 |     2   (0)| 00:00:01 |     3 |     3 |
|*  5 |    TABLE ACCESS BY INDEX ROWID| OWNER_LOOKUP    |     1 |     8 |     0   (0)| 00:00:01 |       |       |
|*  6 |     INDEX UNIQUE SCAN         | OWNER_LOOKUP_PK |     1 |       |     0   (0)| 00:00:01 |       |       |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("P"."PART_KEY"=2)
   5 - filter("O"."OWNER"='SYSTEM')
   6 - access("P"."OBJECT_ID"="O"."OBJECT_ID")

Statistics
----------------------------------------------------------
          2  recursive calls
          1  db block gets
      95338  consistent gets
          0  physical reads
      39056  redo size
        412  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

The nested loop plan gets reused but generates 144 times more consistent gets (95338 vs. 649) than the full scan/hash join plan! No wonder the executives were mad! On top of the excessive CPU required to access the buffer cache, the users also queued up behind each other while waiting for the cache buffer chain latch, the latch that allows a user session to examine a data block in memory.  All this led to absurdly high load averages on the machine and abysmal response times.  Remember that that the stats-gathering job ran after users were already querying the data?  Rather than taking my recommendation to gather stats immediately after the load, the client elected to simply move stats job from 11 AM to 2 AM.  This narrowed the window but did not close it.  Nevertheless, the reports ran well when they needed to because the new stats let Oracle make the right decision.

All this points to the vulnerabilty that precreated partitions present.  The risk of bad execution plans always exists when optimizer statistics fail to represent the data that tables contain.  This condition prevents the CBO from making the right choice.  But queries against pre-created partitions are especially sensitive to this effect since their stats represent them as empty.  Once loaded, the partition statistics are wildly wrong (as opposed to a just little off) and the example above demonstrates the potential consequences.  Large DML changes demand new optimizer statistics in all cases, but especially for pre-created partitions.

Next Page »