Actually it’s probably not the NOT IN that’s nasty, it’s the thing you get if you don’t use NOT IN that’s more likely to be nasty. Just another odd little quirk of the optimizer, which I’ll demonstrate with a simple example (running under 11.2.0.3 in this case):
create table t1
as
select  *
from    all_objects
where   rownum <= 10000
;

update t1 set object_type = null where object_type = 'TABLE';
commit;

begin
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'T1',
                method_opt => 'for all columns size 1'
        );
end;
/

select  sample_size, num_nulls, num_distinct, histogram
from    user_tab_columns
where   table_name  = 'T1'
and     column_name = 'OBJECT_TYPE'
;

This code will give you 10,000 rows, but the number of distinct values for object_type and the number of nulls will depend on the version and options installed; you should, however, get about 15 distinct values for object_type and about 1,000 nulls in that column.
If you want to run a query to count the rows where the object_type is not one of ‘INDEX’, ‘SYNONYM’, or ‘VIEW’, there are two obvious ways of writing it, so let’s put them into a single query with a UNION ALL, and see what Oracle predicts as the cardinality (cut-n-pasted from an SQL*Plus session):
SQL> set autotrace traceonly explain
SQL> select
  2          count(*)
  3  from    t1
  4  where   object_type != 'INDEX'
  5  and     object_type != 'SYNONYM'
  6  and     object_type != 'VIEW'
  7  union all
  8  select
  9          count(*)
 10  from    t1
 11  where   object_type not in ('INDEX', 'SYNONYM', 'VIEW')
 12  ;

Execution Plan
----------------------------------------------------------
Plan hash value: 575959041

-----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     2 |    12 |    38  (53)| 00:00:01 |
|   1 |  UNION-ALL          |      |       |       |            |          |
|   2 |   SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  3 |    TABLE ACCESS FULL| T1   |  5903 | 35418 |    19   (6)| 00:00:01 |
|   4 |   SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  5 |    TABLE ACCESS FULL| T1   |  7308 | 43848 |    19   (6)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("OBJECT_TYPE"<>'INDEX' AND "OBJECT_TYPE"<>'SYNONYM' AND
              "OBJECT_TYPE"<>'VIEW')
   5 - filter("OBJECT_TYPE"<>'INDEX' AND "OBJECT_TYPE"<>'SYNONYM' AND
              "OBJECT_TYPE"<>'VIEW')

Look at operations 3 and 5: the way you write the query makes a significant difference to the cardinality predicted by the optimizer – despite the fact that the NOT IN predicate is transformed internally so that the predicate sections of the two parts of the UNION ALL are exactly the same.
If you’re wondering, the NOT IN option is the one that does the rational arithmetic – it has calculated the number of rows where object_type is not null, then allowed for filtering out 3/15ths of those rows because the predicate rejects 3 of the 15 distinct values reported by the statistics. The chain of inequalities applies arithmetic that, in effect, allows for the nulls three times – once per inequality – so the more values you reject, the worse the estimate gets.
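You can sketch the two calculations with a few lines of arithmetic (Python here, just as a calculator). The statistics are the round figures from the test case – 10,000 rows, roughly 1,000 nulls, 15 distinct values – so the results will be close to, but not exactly, the 7,308 and 5,903 in the plan above. The NOT IN line follows the description in the previous paragraph; the inequality line is only an illustrative model of “allowing for the nulls once per predicate”, not the optimizer’s published formula:

```python
# Assumed round-figure statistics (your user_tab_columns values will differ)
num_rows     = 10000
num_nulls    = 1000
num_distinct = 15
rejected     = 3      # 'INDEX', 'SYNONYM', 'VIEW'

# NOT IN -- the "rational" arithmetic: start from the non-null rows,
# then filter out rejected/num_distinct of them.
not_in_est = (num_rows - num_nulls) * (num_distinct - rejected) / num_distinct
print(round(not_in_est))    # 7200 with these figures (plan showed 7308)

# Chain of inequalities -- illustrative model: each "!=" predicate
# independently allows for the null fraction again, so the nulls are
# (in effect) counted once per predicate, three times here.
sel      = 1 - 1/num_distinct - num_nulls/num_rows   # per-predicate selectivity
ineq_est = num_rows * sel ** rejected
print(round(ineq_est))      # 5787 with these figures -- and it keeps
                            # shrinking with every extra inequality
```

Note how the gap between the two estimates widens as either the null fraction or the number of rejected values grows, which is exactly the “worse the estimate gets” behaviour described above.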
It’s quite likely that you won’t notice the effect in many cases – but if you run queries against a column with a large percentage of nulls, the differences can be very large, and the knock-on effects of this error can be dramatic.