Saturday, July 28, 2012

Compression Units – 3

Compression Units – 3:
For those who have read the previous posting of how I engineered an Exadata disaster and want to reproduce it, here’s the script I used to generate the data.
create table t1
as
with generator as (
select --+ materialize
rownum id
from dual
connect by
level <= 1e4
)
select
rownum id,
lpad(rownum,10,'0') v1,
mod(rownum,10000) n1,
trunc(rownum/100) n2,
trunc(dbms_random.value(0,262144)) n_256K,
trunc(dbms_random.value(0,131072)) n_128K,
trunc(dbms_random.value(0,8192)) n_8k,
rpad(dbms_random.string('U',3),60) padding
from
generator v1,
generator v2
where
rownum <= 1048576
;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

create table t1_qh
compress for query high
as
select * from t1
;

create table t1_ah
compress for archive high
as
select * from t1
;

begin
dbms_stats.gather_table_stats(
ownname => user,
tabname =>'T1',
method_opt => 'for all columns size 1'
);

dbms_stats.gather_table_stats(
ownname => user,
tabname =>'T1_QH',
method_opt => 'for all columns size 1'
);

dbms_stats.gather_table_stats(
ownname => user,
tabname =>'T1_AH',
method_opt => 'for all columns size 1'
);
end;
/

create bitmap index t1_bi on t1(n_128K);
create bitmap index t1_qh_bi on t1_qh(n_128K);
create bitmap index t1_ah_bi on t1_ah(n_128K);

select
max(padding)
from
t1_ah
where
n_128k between 1000 and 1999
;


You’ll need about 5 GB of space to create all these objects – but that shouldn’t be too difficult on the typical Exadata system. As you can see, I’ve created the data to cover a variety of tests and experiments, including comparing the side effects of different levels of compression (or not).
When you run the query, you may find that the optimizer arithmetic picks the tablescan option automatically – the choice will depend basically on your setting for the db_file_multiblock_read_count and/or the settings for the various system stats (sreadtim, mreadtim, MBRC and CPUspeed, or the equivalent no-workload settings); so, to see the performance impact, you may have to hint the query. (It’s slightly worrying that when you set up a (datawarehouse) system on Exadata it might be a good idea – at least in the short term – to set the db_file_multiblock_read_count to 128 because you really do want Oracle to think that tablescans are, in general, a pretty good fast option – we’ve only just got ver persuading people that you shouldn’t set this parameter in any version of 10g or above !
After running the query twice on each table (to get all the data cached somewhere, if possible) the most recent timings I got were as follows for the indexed access path (the 10% difference between this test and the time I reported in my last test could possibly be explained by the fact that this machine had seen some serious volume testing while the other machine was so new that I had probably been using the outer edge of every disc):

  • No compression – 0.7 seconds.

  • Query high – 77.24 seconds

  • Archive high -  3022.83 seconds


The number of different ways you can test against just this set of data is quite surprising – so don’t feel you have to stop with just the one demonstration.



DIGITAL JUICE

No comments:

Post a Comment

Thank's!