When I introduced Connor McDonald’s blog a few days ago, it was because we had exchanged a couple of email messages (through the Oak Table Network) about how to minimise the resource consumption when copying a load of data from one IOT to another of the same structure. His problem was the way in which the obvious way of copying the data resulted in a massive sort even though, in principle, it should not have been necessary to sort anything since the data could have been extracted in order by walking the existing IOT.
As a suggestion I referenced a comment I had made in the Addenda to Practical Oracle 8i about 12 years ago when I had first solved the problem of loading an IOT with minimal logging and no sorting. At the time I had been loading data from a sorted file into an empty table that was then going to be exchnaged into a partitioned IOT – but it crossed my mind that load from a flat file and loading from a UNIX pipe were pretty much the same thing, so perhaps Connor could workaround his problem by making one session spool to a pipe while another session was reading it. In the end, he simply created a massive temporary tablespace, but I thought I’d modify a test script I wrote a few years ago to demonstrate my idea – and here it is:
rem rem create an empty IOT rem Note the columns used for the PK rem create table iot_load( id number, name varchar2(32), padding varchar2(100), constraint iot_pk primary key(name, id) ) organization index ; rem rem Create a load file rem But since we are running UNIX use the mknod command to rem create a pipe before spooling to that pipe. rem set pagesize 0 set feedback off set newpage 0 set trimspool on set linesize 180 set timing off set termout off column object_name format a38 host mknod /tmp/iot_load.dat p spool /tmp/iot_load.dat select r, object_name, padding from ( select rownum r, object_name, rpad('x',100,'x') padding from all_Objects where rownum <= 10000 ) order by object_name, r ; spool off
This script creates a target IOT, and uses a simple select statement (you get a clue about how old this code is by the fact that I’ve referenced all_objects, and not a “connect by” with dual) to generate a set of date that is sorted in an order that is a suitable match for the definition of the IOT. In Connor’s case he could have used a simple ‘select from source IOT order by primary key’. I’ve set up a few SQL*Plus environment detals to make the output completely flat and undecorated. A key point, though, is that I’m spooling to a file called /tmp/iot_load.dat and that file has been pre-created as a UNIX pipe. When you run this script the select statement will hang almost immediately because the pipe will become full, at which point you need another process to start emptying it. So from another UNIX session run the following (changing the userid and password as necessary):
sqlldr userid=test_user/test control=iot_load data=/tmp/iot_load.dat
Here’s the content of the control file iot_load.ctl
OPTIONS (direct = true) UNRECOVERABLE LOAD DATA TRUNCATE INTO table iot_load sorted indexes (iot_pk) ( id position(01:10) char, name position(12:49) char, padding position(51:150) char )
We use direct path load (which means virtually no undo) in unrecoverable mode (which means virtually no redo). But the unrecoverable bit only works because we start by truncating the IOT, and promising that the incoming data is sorted in order of the IOT’s primary key index. And since the data is appearing in order, Oracle doesn’t have to sort it before inserting it. Net result: no undo, no redo (apart from metadata) and no sorting – i.e. minimum overhead for loading the IOT – we simply walk the source in order and copy it into the target.
As this session reads from the pipe, the first session can resume writing into the pipe, so the data flows from source to target through a small piece of shared memory (techically a bit of the file-system buffer, I suppose) until the source session closes the pipe with its “spool off”.
DIGITAL JUICE
No comments:
Post a Comment
Thank's!