Friday, September 14, 2012

Histogram Generation

Histogram Generation:
This really could be published in the Errata and Addenda of “Cost Based Oracle – Fundamentals”, but it’s more convenient to publish the whole thing here and just add a link to the Errata pages.
In chapter 7, on page 156, I gave an example of the type of SQL that Oracle runs (in the dbms_stats package) to generate a histogram. A sample of the code, and the plan from the 9.2.0.8 tkprof output, is listed below:

select
min(minbkt),maxbkt,
substrb(dump(min(val),16,0,32),1,120) minval,
substrb(dump(max(val),16,0,32),1,120) maxval,
sum(rep) sumrep, sum(repsq) sumrepsq,
max(rep) maxrep, count(*) bktndv,
sum(case when rep=1 then 1 else 0 end) unqrep
from
(
select
val,min(bkt) minbkt, max(bkt) maxbkt,
count(val) rep, count(val) *count(val) repsq
from (
select
/*+
noparallel(t) noparallel_index(t)
cursor_sharing_exact dynamic_sampling(0)
no_monitoring
*/
"LN100" val,
ntile(200) over (order by "LN100") bkt
from
sys.ora_temp_1_ds_616 t
where
"LN100" is not null
)
group by val
)
group by
maxbkt
order by
maxbkt
;

Rows Row Source Operation
------- ---------------------------------------------------
170 SORT GROUP BY
825 VIEW
825 SORT GROUP BY
200028 VIEW
200028 WINDOW SORT
200028 TABLE ACCESS FULL ORA_TEMP_1_DS_616



The code in this case references a table called ora_temp_1_DS_616, which is a temporary table dynamically created by the package to hold a sample of just the columns that I was interested in. I had specified a histogram of 200 buckets, and you can see the ntile(200) analytic function in the code that Oracle has used to generate this – and “window sort”  in the execution plan that supports it; it’s a window sort of the entire sampled data set which could be quite large and, as we’ve seen elsewhere, could be very resource-intensive.
In a demonstratin of how important it is (and how nearly impossible it is) to keep revisiting the things you think you know, I’ve recently discovered – while constructing a demonstration of a problem in 11.2.0.3 – that the implementation had changed by  10.2.0.3 (and it may have changed even earlier, of course – but that’s the earliest version of 10g I have acces to). Here’s the generated code for the same data set, with plan, from 10.2.0.3:

select
substrb(dump(val,16,0,32),1,120) ep, cnt
from
(
select
/*+
no_parallel(t) no_parallel_index(t) dbms_stats
cursor_sharing_exact use_weak_name_resl
dynamic_sampling(0) no_monitoring
*/
"LN100" val, count(*) cnt
from
sys.ora_temp_1_ds_1470 t
where
"LN100" is not null
group by "LN100"
)
order by
val

Rows Row Source Operation
------- ---------------------------------------------------
824 SORT GROUP BY (cr=499 pr=0 pw=0 time=42484 us)
199451 TABLE ACCESS FULL ORA_TEMP_1_DS_1470 (cr=499 pr=0 pw=0 time=199490 us)



As you can see it doesn’t have any functionality built into it that could possibly provide the 200 buckets that I finally want to see. You’ll also notice that the sample size is slightly different from the 9i sample size – that’s one of the threats of sampled histogram creation, of course, you could get a slightly (but dangerously) different histogram every time you call dbms_stats even if the data hasn’t changed.
Based on the differences in the query, I think we can safely assume that Oracle is using some procedural code to count its way through the results from this query (ordered by column value) to construct the histogram details that we are used to seeing. I think it’s worth noting that this demonstrates the fact that using a simpler SQL statement with some surrounding PL/SQL can sometimes be more efficient than writing a devilishly clever piece of SQL that gets you the answer directly.



DIGITAL JUICE

No comments:

Post a Comment

Thank's!