Thursday, May 24, 2012

Oracle Scratchpad » Logical tuning

Logical tuning:
Here’s a model of a problem I solved quite recently at a client site. The actual query involved was much more complex and the volume of data much larger, but this tiny, two table, example was enough to demonstrate the key principle. (Originally I thought I’d have to use three tables to model the problem, which is why you my be puzzled by my numerically odd choice of table names). I ran this example on 11.2.0.2 – the client version:
create table t2
as
select
rownum id2,
trunc(dbms_random.value(0,1000)) n2a,
trunc(dbms_random.value(0,1000)) n2b,
lpad(rownum,6,'0') vc2,
lpad('x',100,'x') padding
from
all_objects
where
rownum ;

alter table t2 add constraint t2_pk primary key(id2, n2a);

create table t3
as
select
rownum id3,
trunc(dbms_random.value(0,1000)) n3a,
trunc(dbms_random.value(0,1000)) n3b,
lpad(rownum,6,'0') vc3,
lpad('x',100,'x') padding
from
all_objects
where
rownum ;

alter table t3 add constraint t3_pk primary key(n3a, n3b, id3);

-- now collect stats on the table and execute this query (with autotrace enabled)

select
*
from
t2
where
not exists (
select /*+ unnest */
null
from t3
where n3a = n2a
and n3b = n2b
and (id3 = id2 or id3 = id2 + 1000)
)
;

select * from table(dbms_xplan.display);


You’ll note that I’ve included the /*+ unnest */ hint in the subquery because I want Oracle to run this as a hash anti-join; and it would appear to be legal (and simple) to do this given the various not null contraints and primary key information. In fact the plan uses a filter subquery:
------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5000 | 585K| 5015 |
|* 1 | FILTER | | | | |
| 2 | TABLE ACCESS FULL| T2 | 5000 | 585K| 15 |
|* 3 | INDEX RANGE SCAN | T3_PK | 1 | 12 | 2 |
------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( NOT EXISTS (SELECT /*+ UNNEST */ 0 FROM "T3" "T3" WHERE
"N3B"=:B1 AND "N3A"=:B2 AND ("ID3"=:B3 OR "ID3"=:B4+1000)))
3 - access("N3A"=:B1 AND "N3B"=:B2)
filter("ID3"=:B1 OR "ID3"=:B2+1000)


I suspect that the optimizer code bypasses the anti-join because of the (carefully bracketed) disjunct (OR) predicate. On the client site this resulted in the subquery being executed 9 million times, reducing an intermediate data set from 9M rows to 2M rows at a cost of 27 million buffer visits and about 60 CPU seconds. Fortunately I was able to dredge up a little bit of propositional calculus and quote the following equivalence:
not( A or B )  (not A and not B)

Equally fortunately I didn’t have to worry about three-valued logic (all relevant columns were declared not null), so I was able to rewrite the query in the form:
select
*
from
t2
where
not exists (
select null
from t3
where n3a = n2a
and n3b = n2b
and id3 = id2
)
and not exists (
select null
from t3
where n3a = n2a
and n3b = n2b
and id3 = id2 + 1000
)
;


With this code Oracle did two unnests and converted to hash anti-joinsin both cases (at least, that’s what happened on the client site – my small sample switched to nested loop anti-joins):
-------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50 | 7200 | 15 |
| 1 | NESTED LOOPS ANTI | | 50 | 7200 | 15 |
| 2 | NESTED LOOPS ANTI | | 4999 | 644K| 15 |
| 3 | TABLE ACCESS FULL| T2 | 5000 | 585K| 15 |
|* 4 | INDEX UNIQUE SCAN| T3_PK | 1 | 12 | |
|* 5 | INDEX UNIQUE SCAN | T3_PK | 5000 | 60000 | |
-------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("N3A"="N2A" AND "N3B"="N2B" AND "ID3"="ID2"+1000)
5 - access("N3A"="N2A" AND "N3B"="N2B" AND "ID3"="ID2")


The change to hash anti-joins was a huge benefit (the nested loop anti-join would have improved things for the client to a degree, but there’s not really an enormouse difference in some cases between a filter subquery and an equivalent nested loop anti/semi-join). In this case the query run time dropped from 95 seconds to 27 seconds – all of it CPU time.


DIGITAL JUICE

No comments:

Post a Comment

Thank's!