VizualBusiness: Running Total

Running Total:
Here’s a question on OTN from a SQL Server user that should prompt a few interesting ideas. Re-arranged and paraphrased it goes something like this:

In SQL Server I can write the following code:
DECLARE @Counter INT
SELECT @Counter = 0

UPDATE TempDB.dbo.TransactionDetail
SET @Counter = AccountRunningCount = @Counter + 1
FROM TempDB.dbo.TransactionDetail WITH (TABLOCKX)
What I want to do is more like this:
DECLARE @Total INT = 0
UPDATE StringOutput
set @Total = SumOfLength = @Total + ColLength
OPTION (MAXDOP 1)
How do I do something similar in Oracle ?

I suspect that there may be an error in the second piece of code if it is to run on SQL Server since it doesn’t have a “FROM StringOutput” to match the “FROM TempDB.dbo.TransactionDetail” that appears in the first piece – but perhaps it’s implied by the reference to StringOutput in the UPDATE clause.
To an Oracle user, the set clause is particularly strange – but this is a feature of how T-SQL (roughly the equivalent of PL/SQL) can work for SQL Server. This set clause allows each row to modify, and be modified by, a variable. So, for each row, this double assignment first sets the column sumoflength to the value of the variable total plus the value of the column collength, then it sets the variable total to this new value of sumoflength. The net effect is that the column sumoflength acquires a running total of collength as you read through the table. (I am basing my explanation on “reasonable guesswork” combined with the title of the thread which was: “Update Failed for Sum of previous row and current row”.
So how do you do this in Oracle ? The simple answer is “you shouldn’t, but if you do the strategy has to be completely different because the tools are different”.
The reason for the “shouldn’t” is that there is no implicit order in a select statement unless it includes an “order by” clause – so any thoughts of “previous” or “next” for data in a table are erroneous. Assume then that we define an order through a unique, non-null column – we would then look at the SQL Server code and wonder whether its internal implementation was effectively a row by row mechanism, or whether there was some cunning array based processing taking place (perhaps like the trick that PL/SQL uses to turn an implicit cursor for loop into array fetches of 100 rows at a time).
My first thoughts about the task (assuming we actually do want to do it) point me to analytic functions as a way of producing the data – but I fear I may have to use PL/SQL because of the strong flavour of row-based activity that the code suggests. Let’s start with the first stab at an analtyic approach:

create table t1
as
with generator as (
 select --+ materialize
  rownum id 
 from dual 
 connect by 
  level <= 1e4
)
select
 rownum   id,
 rownum   n1,
 cast(null as number(9)) running_tot
from
 generator v1
where
 rownum <= 1e3
;

alter table t1 add constraint t1_pk primary key(id);

-- collect stats

This gives us a primary key for ordering, a column to generate a running total for, and a column to store the running total – currently null. Let’s write a query that shows us the data we ultimately want to store in the table:

select
 id,
 n1,
-- running_tot,
 sum(n1) over (order by id) running_tot
from
 t1
;

        ID         N1 RUNNING_TOT
---------- ---------- -----------
         1          1           1
         2          2           3
         3          3           6
         4          4          10
         5          5          15
         6          6          21
         7          7          28
         8          8          36
         9          9          45
        10         10          55

Since we can calculate the right information in a select statement it would be nice if we could turn that statement into an updatable view; unfortunatley it doesn’t work.

update
 (
 select
  id,
  n1,
  running_tot,
  sum(n1) over (order by id) running_tot_calc
 from
  t1
 )
set
 running_tot = running_tot_calc
;

        (
        *
ERROR at line 2:
ORA-01732: data manipulation operation not legal on this view

That doesn’t seem entirely reasonable, but we can take the code and wrap it in pl/sql as a cursor for loop and (left as an exercise to the reader) we could even make the code more complex to do array updates rather than using the simplest code to do row by row updates:

declare
 cursor c1 is
  select
   id,
   n1,
   running_tot,
   sum(n1) over (order by id) running_tot_calc
  from
   t1
  for update of 
   running_tot
 ;
begin
 for r1 in c1 loop
  update t1 set running_tot = r1.running_tot_calc where current of c1;
 end loop;
 commit;
end;
/

For the small data set I had generated, this runs pretty quickly, and gets the right answer. I find it slightly surprising, though, that I’m allowed to use the “select for update”/”current of” approach when a straight update isn’t legal.
Of course we might consider going for the full, non-procedural, array update – and luckily we’ve got a primary key that will help us to generate an updatable join view, or a simple, update-only, merge.

merge
 into t1 od
using (
 select
  id,
  n1,
  running_tot,
  sum(n1) over (order by id) running_tot_calc
 from
  t1
 ) nd
on
 (nd.id = od.id)
when matched then
 update set od.running_tot = nd.running_tot_calc
;

I won’t show you the updatable join view version – because it didn’t work – I got the “non-key-preserved” error ORA-01779. I think the optimizer is having some trouble (or being a little too clever) with the analytic sum(). Again, with the small data set, this update was pretty quick. For larger data sets, the choice of general strategy is left to the reader.
I’d still be interested to know, though, whether the SQL Server code was internally array based or operates row by row.

DIGITAL JUICE

VizualBusiness

Saturday, October 27, 2012

Running Total

No comments:

Post a Comment