Friday, July 13, 2012

PK Problem

I have been saying for a very long time (probably since some time in the last century) that if you want to add a primary key (or unique) constraint to a table that might contain some duplicate data, one of the best strategies is to create a non-unique index (with the online option), add the constraint in the state enable novalidate, and then validate the constraint. For example:
create table t1
as
select
*
from
all_objects
;

create index t1_i1 on t1(object_id) online;
-- collect stats
begin
dbms_stats.gather_table_stats(user, 't1');
end;
/
alter table t1 add constraint t1_pk primary key(object_id) enable novalidate;
alter table t1 modify constraint t1_pk validate;


The theory was simple – the online build would require only a brief period of locking, the addition of the constraint would require only a brief period of locking, after which all new (or modified) data would be checked for uniqueness; and then the validation would simply walk the index in order checking for duplicates. If you wanted you could always choose the option to list the rowids of duplicates into an exceptions table.
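If you do want the duplicates listed, a sketch of the exceptions-table option looks something like this – the default EXCEPTIONS table is normally created by running the utlexpt1.sql script from $ORACLE_HOME/rdbms/admin, and its rowid column is called row_id:

```sql
-- create the default EXCEPTIONS table (row_id, owner, table_name, constraint)
@?/rdbms/admin/utlexpt1.sql

-- validate, directing the rowids of offending rows into the exceptions table
alter table t1 modify constraint t1_pk
validate exceptions into exceptions;

-- if the validate fails, the duplicate rows can then be inspected
select * from t1
where rowid in (select row_id from exceptions);
```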
In the last 48 hours it has been brought to my attention that that last assumption is, and perhaps has always been, wrong. I’ve just been running some tests of the mechanism and checked what Oracle does about the validation step all the way back to 8.1.7.4.
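If you want to repeat the check on your own system, one way to capture the recursive SQL that the validate step runs is simply to wrap the call in sql_trace and examine (or tkprof) the resulting trace file:

```sql
-- trace the recursive SQL generated by the validation
alter session set sql_trace = true;
alter table t1 modify constraint t1_pk validate;
alter session set sql_trace = false;
```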
Depending on whether the constraint is single column or multi-column, depending on whether you’re adding a unique or primary key constraint, depending on whether the column(s) are pre-declared as NOT NULL, and depending on exact version of Oracle, there are several slightly different scenarios to review if you want to do an exhaustive analysis; but here’s the SQL, with execution plan, that Oracle ran for the validate step in the example above on 11.2.0.3:
select /*+ all_rows ordered */
A.rowid, :1, :2, :3
from
TEST_USER.T1 A,
(
select /*+ all_rows */
OBJECT_ID
from
TEST_USER.T1 A
where (OBJECT_ID is not null)
group by
OBJECT_ID
having
count(1) > 1
) B
where
(A.OBJECT_ID = B.OBJECT_ID)
union all
select
/*+ all_rows ordered */
A.rowid, :1, :2, :3
from
TEST_USER.T1 A
where
(OBJECT_ID is null)
;

-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2706 | 81167 | | 149 (3)| 00:00:02 |
| 1 | UNION-ALL | | | | | | |
|* 2 | HASH JOIN | | 2705 | 81150 | 1536K| 149 (3)| 00:00:02 |
| 3 | INDEX FAST FULL SCAN | T1_I1 | 54081 | 897K| | 34 (0)| 00:00:01 |
| 4 | VIEW | | 2705 | 35165 | | 36 (6)| 00:00:01 |
|* 5 | FILTER | | | | | | |
| 6 | HASH GROUP BY | | 2705 | 13525 | | 36 (6)| 00:00:01 |
| 7 | INDEX FAST FULL SCAN| T1_I1 | 54081 | 264K| | 34 (0)| 00:00:01 |
|* 8 | FILTER | | | | | | |
| 9 | INDEX FAST FULL SCAN | T1_I1 | 54081 | 897K| | 34 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
5 - filter(COUNT(*)>1)
8 - filter(NULL IS NOT NULL)


A couple of background points to note: this execution plan came from “explain plan” and a call to dbms_xplan.display, but it does match the run-time plan as reported in the trace file. The “intent” of the code is to identify all the rowids for rows where the same object_id has been used more than once (if there are none then the PK is valid). Because of the standard optimizer arithmetic for “having count(*) > {const}”, the optimizer assumes that duplicates account for 5% of the data.
The second query block in the union all allows Oracle to capture rows where the primary key can’t be valid because the column is null – a redundant check in my example because the column is declared not null; if you check line 8 of the execution plan you can see the FILTER operation (“null is not null”) that short-circuits that part of the query so that it does no work. The three bind variables in the main select hold the table owner, table name, and constraint name – as required by the exceptions table (not that we’ve asked Oracle to capture exceptions in this example).
I think we can easily explain the redundancy in the code by assuming that the developer who wrote it decided to maximise code reuse; but why is there an ordered hint in it that (surely) maximises the inefficiency of the code? The resulting execution plan for any large index is almost fixed by that hint: Oracle is going to build an in-memory (if it fits) hash table of every entry in the index, then do a massive aggregation job to generate the data to probe the hash table. As a consequence, it’s quite likely that we will end up doing the equivalent of dumping the whole index into the temporary tablespace twice – once for the hash table, once for the aggregation – and this is something we’d like to avoid.
But if we’re hoping to enable a primary key constraint, aren’t we expecting a very small volume of data (i.e. a small number of duplicates) in the aggregate, and wouldn’t it make sense to do a nested loop driven by that small number rather than a hash join? And since we have a suitable index in place, could we generate the aggregate without doing any sorting? In other words, can we avoid doing any I/O to the temporary tablespace and produce a (pseudo-)plan like:
NESTED LOOP
Aggregated object_id having count(*) > 1
Index range scan t1_i1

Since the code is embedded in a package we can’t change it but, for very special (extremely large) cases, we might decide that we need to change the execution plan for the query – so we might look at the option for creating an SQL Plan Baseline for a modified version of the query, and then attaching that plan to this query. Let’s work on that idea in steps – first we’ll just try aggregating the object ids by walking the index in order:
select * from (
select /*+ index(t1(object_id)) */
object_id
from
t1
where
object_id is not null
group by
object_id
having
count(1) > 1
)
;
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2705 | 35165 | 121 (0)| 00:00:02 |
| 1 | VIEW | | 2705 | 35165 | 121 (0)| 00:00:02 |
|* 2 | FILTER | | | | | |
| 3 | SORT GROUP BY NOSORT| | 2705 | 13525 | 121 (0)| 00:00:02 |
| 4 | INDEX FULL SCAN | T1_I1 | 54081 | 264K| 121 (0)| 00:00:02 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(COUNT(*)>1)

In this plan we walk the index in order, keeping a running count for each object_id and discarding any ids where the count is just 1. Note particularly that the aggregation step is “sort group by nosort” – we DON’T sort the data. In the absence of the index() hint my system chose to do an “index fast full scan” with a “hash group by” – it’s up to you to decide which of those two plans is likely to be the more efficient way to generate the small number of exceptions in your case.
Once we have this starting step we can try to use it as the first step of a nested loop (since we believe it will return a handful of rows rather than the 5% indicated by the optimizer):
select
--+ leading(b a) use_nl(a) index(a(object_id)) no_merge(b)
a.rowid
from
(
select /*+ index(t1 (object_id)) */
object_id
from
t1
where
object_id is not null
group by
object_id
having
count(1) > 1
)
b ,
t1 a
where
a.object_id = b.object_id
;

---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2705 | 81150 | 121 (0)| 00:00:02 |
| 1 | NESTED LOOPS | | 2705 | 81150 | 121 (0)| 00:00:02 |
| 2 | VIEW | | 2705 | 35165 | 121 (0)| 00:00:02 |
|* 3 | FILTER | | | | | |
| 4 | SORT GROUP BY NOSORT| | 2705 | 13525 | 121 (0)| 00:00:02 |
| 5 | INDEX FULL SCAN | T1_I1 | 54081 | 264K| 121 (0)| 00:00:02 |
|* 6 | INDEX RANGE SCAN | T1_I1 | 1 | 17 | 0 (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(COUNT(*)>1)
6 - access("A"."OBJECT_ID"="B"."OBJECT_ID")


This is exactly what we want – as each row comes out of the aggregate rowsource at line 2, we access the index to get the details that we need of the multiple rows (just the rowids) – and the relevant index leaf block is the one we’ve just walked to generate the aggregate – so no further random I/O needed. This HAS to be pretty efficient.
There’s just one problem – I lied.
I’ve hacked the plan by hand; when you embed the previous query as an in-line view the nosort option disappears. Line 4 should read “SORT GROUP BY” – I edited in the nosort by hand – Oracle actually sorts the entire result set before starting the nested loop. In fact it’s slightly worse than that, because by default the optimizer chooses a “HASH GROUP BY”, which I blocked by setting the hidden parameter _gby_hash_aggregation_enabled to false. Looking at the costing, by the way, it struck me that it was just possible that Oracle was doing a nosort sort – the incremental cost was zero – so I enabled the 10046 and 10032 traces to show that Oracle wrote the result set to the temporary tablespace, then read it back, and actually recorded the expected number of records sorted.
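For reference, this is how the hash aggregation was disabled at the session level. It’s a hidden parameter (hence the double quotes), so treat this as a test-system experiment only, not something to set in production:

```sql
-- block "hash group by" so the optimizer falls back to "sort group by";
-- hidden parameters need double quotes and Oracle Support's blessing
alter session set "_gby_hash_aggregation_enabled" = false;
```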
I couldn’t find any way to avoid this undesirable sort by hinting – so that seems to have scuppered our plans for using an SQL Plan Baseline to eliminate the large volume of writes to the temporary tablespace (note, we might still be happy to do an index fast full scan at this point with a hash group by – it should be better than the default plan because it will only dump the content of the index to the temporary tablespace once.)
But our plans might not be completely wrecked. It is possible to get the intermediate result set with a nosort and then use it as follows:
with b as (
select /*+ materialize index(t1(object_id)) */
object_id
from
t1
where
object_id is not null
group by
object_id
having
count(1) > 1
)
select
--+ leading(b a) use_nl(a) index(a(object_id))
a.rowid
from
b ,
t1 a
where
a.object_id = b.object_id
;

----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2705 | 81150 | 123 (0)| 00:00:02 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6608_4017FEF5 | | | | |
|* 3 | FILTER | | | | | |
| 4 | SORT GROUP BY NOSORT | | 2705 | 13525 | 121 (0)| 00:00:02 |
| 5 | INDEX FULL SCAN | T1_I1 | 54081 | 264K| 121 (0)| 00:00:02 |
| 6 | NESTED LOOPS | | 2705 | 81150 | 2 (0)| 00:00:01 |
| 7 | VIEW | | 2705 | 35165 | 2 (0)| 00:00:01 |
| 8 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6608_4017FEF5 | 2705 | 13525 | 2 (0)| 00:00:01 |
|* 9 | INDEX RANGE SCAN | T1_I1 | 1 | 17 | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(COUNT(*)>1)
9 - access("A"."OBJECT_ID"="B"."OBJECT_ID")

If we can rewrite the SQL to use subquery factoring and then materialize the subquery (which should produce only a small temporary table) we can get the nosort back and get the nested loop – and I didn’t even have to play with a hidden parameter to do it. But how do we rewrite the SQL that’s generated by code embedded in a package? First of all, here’s the SQL (with plan) I would like to use – I’ve eliminated the original hints, introduced a subquery factoring clause with materialization, and added an index hint:
with b as (
select /*+ materialize index(t1 (object_id)) */
object_id
from
t1
where
object_id is not null
group by
object_id
having
count(1) > 1
)
select
A.rowid, :1, :2, :3
from
t1 A,
B
where
(a.object_id = b.object_id)
union all
select
a.rowid, :1, :2, :3
from
t1 A
where
object_id is null
;

----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2706 | 81167 | 2 (0)| 00:00:01 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D660D_4017FEF5 | | | | |
|* 3 | FILTER | | | | | |
| 4 | SORT GROUP BY NOSORT | | 2705 | 13525 | 121 (0)| 00:00:02 |
| 5 | INDEX FULL SCAN | T1_I1 | 54081 | 264K| 121 (0)| 00:00:02 |
| 6 | UNION-ALL | | | | | |
| 7 | NESTED LOOPS | | 2705 | 81150 | 2 (0)| 00:00:01 |
| 8 | VIEW | | 2705 | 35165 | 2 (0)| 00:00:01 |
| 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9D660D_4017FEF5 | 2705 | 13525 | 2 (0)| 00:00:01 |
|* 10 | INDEX RANGE SCAN | T1_I1 | 1 | 17 | 0 (0)| 00:00:01 |
|* 11 | FILTER | | | | | |
| 12 | INDEX FAST FULL SCAN | T1_I1 | 54081 | 897K| 34 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(COUNT(*)>1)
10 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
11 - filter(NULL IS NOT NULL)

As you can see, we load a small (we hope) temporary table by doing an index walk with running count; we use that result set in a nested loop back into the index, and then we do a union all with a result set that we don’t attempt to create because the “null is not null” filter terminates that branch of the plan. All we have to do now is make Oracle run OUR query instead of its own.


Plan B – dbms_advanced_rewrite


The package dbms_advanced_rewrite basically allows you to tell Oracle that when it sees a particular piece of SQL it should run some other piece of SQL instead. You do have to be a little careful to make it work, of course – the texts have to be “sufficiently similar”, and the number and types of the values used and returned have to be the same. Here’s an example of setting up a “rewrite equivalence”:
begin
sys.dbms_advanced_rewrite.declare_rewrite_equivalence(
name => 'TEST_HINT',
source_stmt => 'select * from t1 where n1 = 15',
destination_stmt =>
'select /*+ index(v1.t1) */ * from v1 where n1 = 15',
validate => false,
rewrite_mode => 'general'
);
end;
/

In this example, whenever the statement “select * from t1 where n1 = 15” reaches the database, Oracle will execute the hinted SQL query against the view instead.
Following this example, all we have to do (and this is a technique that became available in 10g) is let Oracle generate the SQL that’s giving us the bad plan, then set up an equivalence so that what actually runs is our version with the subquery factoring clause instead – except we run into another problem when we try to declare the equivalence:
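A sketch of the call we would attempt – the statement texts are abbreviated to placeholders here (in practice source_stmt would have to be the full text that the package generates, and destination_stmt our rewritten version; the equivalence name is just an arbitrary label):

```sql
begin
        sys.dbms_advanced_rewrite.declare_rewrite_equivalence(
                name             => 'PK_VALIDATE_REWRITE',
                source_stmt      => '...the SQL generated by the validate step...',
                destination_stmt => '...our version with the "with b as (...)" clause...',
                validate         => false,
                rewrite_mode     => 'general'
        );
end;
/
```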
ERROR at line 1:
ORA-30389: the source statement is not compatible with the destination statement
ORA-32034: unsupported use of WITH clause
ORA-06512: at "SYS.DBMS_ADVANCED_REWRITE", line 29
ORA-06512: at "SYS.DBMS_ADVANCED_REWRITE", line 185
ORA-06512: at line 11


Brilliant idea number 2 isn’t going to work – subquery factoring isn’t supported with advanced rewrite. So I guess we’re just going to have to wait for Oracle to improve the code.


Summary


After many years of believing that Oracle did something sensible when validating a primary key or unique constraint against an existing index, I finally discovered that it was doing something weirdly inefficient that will (almost invariably) result in the content of a very large index being dumped to the temporary tablespace twice as the validation takes place.
I tried to find a way of making the existing code avoid any large joins and aggregations by working out how to construct an SQL Plan Baseline for the query that would make it use a different join order and walk the index in order to produce an optimal nested loop approach. A limitation in the optimizer made the plan unreachable. (A variation on the plan that did an index fast full scan and hash aggregation followed by a nested loop is still viable, and could be a better strategy than the default.)
Because I couldn’t use an SQL Plan Baseline to get the plan I wanted, I looked at the possibility of a rewrite that would allow me to aggregate without using any temporary space, and it turned out that materializing a subquery factoring clause would work. The only way to impose this on Oracle, though, would be through the package dbms_advanced_rewrite – and it is a current limitation of that package that it won’t allow you to use subquery factoring in the rewrite process.
It is worth noting that the work I’ve done was triggered by a question relating to a table of 1.8 billion (with a B) rows and the way that the default validation process needed more temporary space than was available. Although the general principle of what to do isn’t going to change, the actual code you need would have to be tailored to the specific table and index every time you tried to apply the method. In general it’s the sort of task you don’t take on unless you’ve got a very good reason. I went to the trouble of working out a few possibilities (a) because I was curious and (b) because I thought it might make an interesting little study for the blog.


