A client recently downgraded from Oracle’s 11.1.0.6 RDBMS to 10.2.0.4 because of performance issues. Their application inserted hundreds of millions of records extracted from a legacy mainframe system inside a PL/SQL cursor for-loop. The client took this approach since each record required a number of transformations and that logic could be easily applied within the loop. The two databases where we tested their code differed only by their version, yet the 11g instance took 33% longer to perform the same operations.
Both databases ran on the same 64-bit Linux (RHEL5) machine and had comparable memory resources allocated to them. (I’d be very interested to hear if you can reproduce the results below, especially if you’re on a different operating system.) Yet 10g consistently outperformed 11g, at least for the operations in question.
The script I used to demonstrate the issue follows. It creates a very simple test case and activates Oracle’s extended SQL trace, a diagnostic feature which records the database session’s activity and how much time each action took.
Performance Test Script:
whenever sqlerror exit failure
-- get the version to identify trace files
col file_id for a13 new_value file_id
SELECT
'perf_test_' || SUBSTR(version, 1, 2) || 'g' file_id
FROM v$instance;
spool &file_id..log
-- create a test user (note that we designated a
-- default tablespace for this database when it
-- was created so that's not specified below)
CREATE USER perf_test IDENTIFIED BY perf_test;
GRANT
ALTER SESSION,
CREATE SESSION,
CREATE TABLE,
SELECT ANY DICTIONARY,
UNLIMITED TABLESPACE
TO perf_test;
connect perf_test/perf_test
-- create a source table
CREATE TABLE source AS
SELECT *
FROM dba_objects
WHERE ROWNUM <= 5000;
-- create a target table
CREATE TABLE target AS
SELECT *
FROM dba_objects
WHERE 1 = 2;
-- initiate trace and timing
ALTER SESSION SET statistics_level = ALL;
ALTER SESSION SET "_rowsource_execution_statistics" = TRUE;
ALTER SESSION SET tracefile_identifier = '&file_id';
ALTER SESSION SET events =
'10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';
set timing on
set echo on
-- populate target from source
BEGIN
-- create 1000000 records in the target
-- from the 5000 already in the source
FOR i IN 1 .. 1000000 / 5000 LOOP
FOR rec IN (
SELECT * FROM source) LOOP
INSERT INTO target VALUES rec;
END LOOP;
END LOOP;
END;
/
set echo off
set timing off
connect /
DROP USER perf_test CASCADE;
exit
As shown, the script
- creates a user to run the tests,
- logs in,
- creates the source and target tables,
- enables an extended SQL trace (using the 10046 event),
- loops through the source, and
- inserts into the target.
Initially I gathered the insert timings without the 10046 trace. In that case, 10g ran the PL/SQL block above in 30 seconds; 11g took 40 seconds. After enabling the trace, 10g took 40 seconds and 11g took 70 seconds! Why the difference?
We might forgive some of the tracing overhead since the same insert is being executed a million times. This emits a million “EXEC #d+” records into the trace file, which grew to 70mb during the 10g test. The 11g test, however, created a trace file almost 5 times larger at 330mb. Again, why the difference?
Trace files of any size are difficult to interpret, so Oracle supplies a utility called TKPROF to provide a summary of the activity contained within them. Several clues leaped out when comparing the output. First review these snippets from the 10g TKPROF output:
10g TKPROF Output:
BEGIN
-- create 1000000 records in the target
-- from the 5000 already in the source
FOR i IN 1 .. 1000000 / 5000 LOOP
FOR rec IN (
SELECT * FROM source) LOOP
INSERT INTO target VALUES rec;
END LOOP;
END LOOP;
END;
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 18.15 18.76 0 0 0 1
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 18.15 18.77 0 0 0 1
.
.
.
INSERT INTO TARGET
VALUES
(:B1 ,:B2 ,:B3 ,:B4 ,:B5 ,:B6 ,:B7 ,:B8 ,:B9 ,:B10 ,:B11 ,:B12 ,:B13 )
call count cpu elapsed disk query current rows
------- ------- -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1000000 18.18 16.99 0 3050 1020489 1000000
Fetch 0 0.00 0.00 0 0 0 0
------- ------- -------- ---------- ---------- ---------- ---------- ----------
total 1000001 18.18 16.99 0 3050 1020489 1000000
.
.
.
1 session in tracefile.
4 user SQL statements in trace file.
369 internal SQL statements in trace file.
373 SQL statements in trace file.
11 unique SQL statements in trace file.
1013912 lines in trace file.
38 elapsed seconds in trace file.
But the same sections from the 11g trace file indicates this:
11g TKPROF Output:
BEGIN
-- create 1000000 records in the target
-- from the 5000 already in the source
FOR i IN 1 .. 1000000 / 5000 LOOP
FOR rec IN (
SELECT * FROM source) LOOP
INSERT INTO target VALUES rec;
END LOOP;
END LOOP;
END;
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 45.98 46.46 0 0 0 1
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 45.98 46.46 0 0 0 1
.
.
.
INSERT INTO TARGET
VALUES
(:B1 ,:B2 ,:B3 ,:B4 ,:B5 ,:B6 ,:B7 ,:B8 ,:B9 ,:B10 ,:B11 ,:B12 ,:B13 ,:B14 ,
:B15 )
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 999996 18.53 21.39 0 3247 1021169 999996
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 999997 18.53 21.39 0 3247 1021169 999996
.
.
.
1 session in tracefile.
1000202 user SQL statements in trace file.
328 internal SQL statements in trace file.
1000530 SQL statements in trace file.
13 unique SQL statements in trace file.
5016224 lines in trace file.
70 elapsed seconds in trace file.
What?!?! Several oddities can be see when comparing this output. First of all, the same code was executed on both systems, yet the 11g trace recorded 1,000,000 more user SQL statements than 10g. Also, the insert statement in 11g is missing four executions, showing only 999,996 instead of 1,000,000. Curiously, the other four can be found elsewhere in the TKPROF output, as singleton queries that should have been aggregated together with the others. And if we dig into 11g’s raw trace file, we find 1,000,000 statements like this:
PARSING IN CURSOR #3 len=109 dep=1 uid=105 oct=2 lid=105 tim=1222272480471544 hv=2621435192 ad='cb846118' sqlid='25ts65qf3zv9s' INSERT INTO TARGET VALUES (:B1 ,:B2 ,:B3 ,:B4 ,:B5 ,:B6 ,:B7 ,:B8 ,:B9 ,:B10 ,:B11 ,:B12 ,:B13 ,:B14 ,:B15 ) END OF STMT
The comparable statement in 10g appears only once. Note that the parse count for the insert statement in both traces is one. But it seems that each insert gets treated as a separate statement for the “user SQL statements” metric because of the repeated messages like the one above. What’s more, twice as much time is spent on CPU for 11g during the execution of the PL/SQL block itself as compared with 10g. So while the actual parse count is sensible (as measured by “PARSE #d+” records in the trace), each insert appears as if it’s still getting re-evaluated in some way by 11g.
I considered that this behavior might be an artifact of bind variable peeking or cursor sharing gone astray. To that end, I tried toggling each of the following session parameters:
- CURSOR_SHARING: I tried FORCE, EXACT, and SIMILAR, none of which had an effect
- SESSION_CACHED_CURSORS: I ran the test using values of 50 and 0, the latter doubled the runtime for both 10g and 11g
- _OPTIM_PEEK_USER_BINDS: I tried both TRUE and FALSE for this parameter; no effect
- _OPTIMIZER_ADAPTIVE_CURSOR_SHARING: Also ran with both TRUE and FALSE; also no effect
- OPTIMIZER_FEATURES_ENABLE: I even tried setting this one back to 10.2.0.4 for the 11g database; again, no effect
In the end, nothing I tried could dissuade 11g from emitting the “PARSING IN CURSOR #d+” message for each insert statement. I filed a service request with Oracle Support on this issue and they ultimately referred the issue to development as a bug. Note that Support couldn’t reproduce the slowness I was seeing, but their trace files did reflect the parsing messages I observed. (I’ll post a follow-up if they provide a solution.) Please post a comment if you have another suggestion on how to address this issue. While I can’t pretend to know what’s going on with the internals here, I can say that my client needed the best throughput they could get and the downgrade bought them an extra 33%.
Great timely article. I was hunting around to see if there were positive 11g experiences particularly with respect to bind variable peeking improvements. It is interesting to see ORACLE have just added a new set of bugs with a new set of features. Unbelievable given the revenue of this company and the potential. There’s clear reason as to why ORACLE’s competitors are making ground. We’ve run into a couple of bugs from 10.2.0.3 and looking at the fixes in 10.2.0.4 is incredible.
I am encountering what appears to be a similar issue. I am noticing a slowdown in inserts without index, index creation, inserts with an index, and in deletes.
I have opened an SR with Oracle.
SR 7373948.994: INSERTS AND INDEX CREATION SLOWER IN 11G THAN10G
I have been trying to refer them to your SR and Bug but they do not seem able to find it. Would you mind sharing your SR number and bug number?
Here is my script that I have been using:
alter session disable parallel query;
set termout on;
prompt ** Nested loop speed test
set timing on;
prompt ———————–
prompt ** Drop table if it exists
drop table a;
prompt ———————–
prompt ** Create table
create table a
(id1 number not null,
id2 number not null,
id3 number not null)
;
prompt ———————–
prompt ** Put data in the table
begin
for i in 1..10000000 loop
insert into a(id1, id2, id3)
values (i, i, i);
end loop;
commit;
end;
/
prompt ———————–
prompt ** Create an index
create unique index a on a(id1, id3);
prompt ———————–
prompt ** Put statistics on the table and index
begin
dbms_stats.gather_table_stats(
user,
‘a’,
cascade=>true,
estimate_percent=>100,
method_opt=>’FOR ALL COLUMNS SIZE 254′,
no_invalidate=> false);
end;
/
prompt ———————–
prompt ** Run the test query three times
prompt ** First run
select /*+ use_nl(A B) */count(a.id3)
from a a, a b
where a.id1=b.id2;
prompt ———————–
prompt ** Second run
select /*+ use_nl(A B) */count(a.id3)
from a a, a b
where a.id1=b.id2;
prompt ———————–
prompt ** Third run
select /*+ use_nl(A B) */count(a.id3)
from a a, a b
where a.id1=b.id2;
prompt ———————–
prompt ** Delete from plan table
delete from plan_table where statement_id = ‘Nested Loop Speed Test’;
prompt ———————–
prompt ** explain plan
explain plan set statement_id = ‘Nested Loop Speed Test’ for
select /*+ use_nl(A B) */count(a.id3)
from a a, a b
where a.id1=b.id2;
prompt ———————–
prompt ** Query plan table
select operation
from plan_table p
where p.statement_id = ‘Nested Loop Speed Test’
and p.operation in (‘NESTED LOOPS’, ‘HASH JOIN’);
prompt ———————–
prompt ** Delete data from the table
delete from a;
commit;
prompt ———————–
prompt ** Put data in the table with index
begin
for i in 1..10000000 loop
insert into a(id1, id2, id3)
values (i, i, i);
end loop;
commit;
end;
/
prompt ———————–
prompt ** Update table
update a set id1 = id1;
commit;
Hi Joe,
The SR was 7013191.994 (10046 LEVEL 8 TRACE EMITS “PARSING IN CURSOR #” RECORDS FOR EVERY SQL EXECUTION) and it references bug number 7309111. The client was happy with the downgrade solution so I didn’t pursue the SR much after that. I’d love to hear if you get any further with the issue.
Best Regards,
Dave
I have installed 11g on windows server and facing same issue, will post in detail in my next message.
Rgds
Abhay.