Wednesday, September 12, 2012

FBI Delete

A recent post on Oracle-l complained about an oddity when deleting through a function-based index.

I have a function based index but the CBO is not using it. The DML that I expect to have a plan with index range scan is doing a FTS. Its a simple DML that deletes 1000 rows at a time in a loop and is based on the column on which the FBI is created.

Although execution plans are mentioned, we don’t get to see the statement or the plan – and it’s always possible that there will be some clue in the (full) plan that tells us something about the code that the OP has forgotten to mention. However, function-based indexes have a little history of not doing quite what you expect, so I thought I’d take a quick look at the problem, starting with the simplest possible step – do function-based indexes and “normal” b-tree indexes behave differently on a delete. Here’s the data set I created for my test:

create table t1
nologging
as
with generator as (
    select  --+ materialize
        rownum  id
    from    dual
    connect by
        level <= 1e4
)
select
    rownum                          id,
    trunc(dbms_random.value(1,1e6)) n1,
    lpad(rownum,10,'0')             small_vc,
    rpad('x',100)                   padding
from
    generator   v1,
    generator   v2
where
    rownum <= 1e6
;

create index t1_f1 on t1(n1+1) nologging;
create index t1_i1 on t1(n1) nologging;
execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')



I ran this on 11.2.0.3 and it took about 40 seconds to create the data, indexes and stats. Now all I have to do is run two “identical” delete statements and check the execution plans:

set autotrace traceonly explain

delete from t1 where n1 <= 100000 and rownum < 1000;

delete /*+ index(t1) */ from t1 where n1 + 1 <= 100001 and rownum < 1000;

set autotrace off


These two statements would delete the same 999 rows from the table (assuming they both do ascending range scans of the indexes). I’ve hinted the query for the function-based index because the OP said that Oracle always wanted to do a tablescan, so I’ve eliminated that option to save myself a few seconds. Here’s the plan for the first statement (normal index):

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 999 | 4995 | 229 (2)| 00:00:02 |
| 1 | DELETE | T1 | | | | |
|* 2 | COUNT STOPKEY | | | | | |
|* 3 | INDEX RANGE SCAN| T1_I1 | 100K| 488K| 229 (2)| 00:00:02 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(ROWNUM<1000)
3 - access("N1"<=100000)


This looked pretty much as I had expected. The cost of deleting from a table is basically calculated as the cost of selecting the rowids of the rows to be deleted. The cost of 229 in this query is roughly 10% of the number of leaf blocks in the index (the optimizer has calculated the cost of the index range scan for rows where n1 <= 100000 although, strangely, it hasn’t allowed for the fact that it will be able to stop one tenth of the way through that range scan).
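As a sanity check on that arithmetic you can pull the relevant figures straight from user_indexes – blevel plus about 10% of leaf_blocks should land close to the 229 once a little CPU costing is added:

select
    index_name, blevel, leaf_blocks
from
    user_indexes
where
    table_name = 'T1'
order by
    index_name
;

How does this compare with the plan for the corresponding delete through the function-based index: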

---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 999 | 9990 | 100K (1)| 00:10:03 |
| 1 | DELETE | T1 | | | | |
|* 2 | COUNT STOPKEY | | | | | |
| 3 | TABLE ACCESS BY INDEX ROWID| T1 | 100K| 976K| 100K (1)| 00:10:03 |
|* 4 | INDEX RANGE SCAN | T1_F1 | 100K| | 229 (2)| 00:00:02 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(ROWNUM<1000)
4 - access("N1"+1<=100001)


Spot the critical difference: the plan includes a redundant table access operation. There is no need to visit the table to find the rowids, but Oracle includes the operation and then costs for it; and since the n1 data was generated randomly the index has a very large clustering factor, so the optimizer assumes we will have to visit 100,000 table blocks to get the 100,000 rows (and that same oddity of the stopkey appearing late in the calculation shows up again).
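In fact you can reconstruct the 100K figure (approximately) from the traditional I/O costing formula – blevel, plus selectivity times leaf_blocks for the range scan, plus selectivity times clustering_factor for the table visits – because the random data makes the clustering_factor close to the number of rows in the table:

select
    round(
        blevel
    +   leaf_blocks       * 0.10    -- index selectivity of n1 + 1 <= 100001
    +   clustering_factor * 0.10    -- expected table block visits, one per row
    )   estimated_cost
from
    user_indexes
where
    index_name = 'T1_F1'
;

The figure won’t be exact because of CPU costing, but it should be in the right ballpark.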
So here’s one possible source of the OP’s problem – there’s a bug in the optimizer’s handling of function-based indexes with delete. For comparative purposes, of course, I had to check the cost of selecting just the rowids through the function-based index.
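A query along these lines is enough – selecting nothing but the rowids through the hinted index:

select
    /*+ index(t1) */
    rowid
from
    t1
where
    n1 + 1 <= 100001
and rownum < 1000
;

This is the plan that appeared: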

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 999 | 16983 | 5 (0)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
|* 2 | INDEX RANGE SCAN| T1_F1 | 1000 | 17000 | 5 (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<1000)
2 - access("N1"+1<=100001)


This produces a clue to the wider problem. Not only have we lost the redundant table access operation, but the row estimate is down to 1,000 rows – the “rownum” predicate has been applied very early in the plan to produce a cost of 5. (Note – even the delete using the normal index failed to push the rownum predicate down into the index range scan, making that cost rather higher than you might expect.) Whether by accident or design, it looks as if the optimizer hasn’t applied the “fkr” (first K rows) optimisation to the delete statement in the way that it does to the select statement.
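One way to see where a cost of 5 could come from – a back-of-envelope reconstruction, assuming the function-based index is about the same size as the ordinary b-tree – is blevel to descend the index plus just enough leaf blocks to supply the first 999 rowids:

select
    blevel,
    leaf_blocks,
    num_rows,
    blevel + ceil(999 * leaf_blocks / num_rows)     fkr_cost
from
    user_indexes
where
    index_name = 'T1_F1'
;

With roughly 2,400 leaf blocks holding one million entries, 999 entries fit in two or three leaf blocks, so a blevel of 2 plus 3 leaf blocks would give the 5 we see in the plan.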
Once I’d got started it was a little hard to stop: here’s what I saw when I dropped the normal b-tree index and re-created it as a descending index (n1 desc) – admittedly not a sensible idea on a production system, but I was curious about the costing.
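The steps would look something like this (t1_d1 matching the index name that appears in the plan; regathering stats gives the hidden virtual column created for the descending index some statistics):

drop index t1_i1;

create index t1_d1 on t1(n1 desc) nologging;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all hidden columns size 1')

And this was the resulting plan for the delete: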

---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 999 | 4995 | 2402 (2)| 00:00:15 |
| 1 | DELETE | T1 | | | | |
|* 2 | COUNT STOPKEY | | | | | |
| 3 | TABLE ACCESS BY INDEX ROWID| T1 | 97718 | 477K| 2402 (2)| 00:00:15 |
|* 4 | INDEX RANGE SCAN | T1_D1 | 97718 | | 2402 (2)| 00:00:15 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(ROWNUM<1000)
4 - access(SYS_OP_DESCEND("N1")>=HEXTORAW('3CF4FF') )
    filter(SYS_OP_UNDESCEND(SYS_OP_DESCEND("N1"))<=100000)


Again we get the redundant visit to the table – but this time it isn’t costed. However, the cost of the index range scan is 2,402 – which is basically the cost of scanning the WHOLE index (which consisted of 2,356 leaf blocks) and that looks like another bug.
Summary: Although the cost of a delete normally seems to be the cost of selecting the rowids of the rows to be deleted, I’ve highlighted some odd behaviour in the optimizer’s arithmetic that shows this assumption going wrong in the presence of (a) function-based indexes and/or (b) rownum predicates.
Footnote: I haven’t looked at any 10053 trace files to see if I can work out exactly what Oracle is doing.
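For anyone who wants to follow up, setting the event at session level and re-optimizing the statement is enough to generate the trace:

alter session set events '10053 trace name context forever, level 1';

explain plan for
delete /*+ index(t1) */ from t1 where n1 + 1 <= 100001 and rownum < 1000;

alter session set events '10053 trace name context off';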
Warning: These results are NOT CONSISTENT across different versions of Oracle – 10.2.0.3, for example, showed the redundant table access in the execution plan but didn’t cost for it, so the delete used the index access path without needing a hint.
