In this next installment on SQL's Online Analytical Processing (OLAP) functions, you'll see how OLAP continues to provide SQL users the business functions we've enjoyed in RPG for so long.
In my previous OLAP article, I used the OLAP function ROW_NUMBER to provide the concept of "next record" that we take for granted in RPG's native access (contrast RPG's native ISAM access with SQL's relational access). In this article, I'll show you the magic of GROUPING SETS and how they can be used to provide the same functionality as control-level breaks.
Did You Say Control-Level Breaks?
Why yes, yes I did. You recall those lovely little L1 and L2 indicators, don't you? Control-level breaks (also known as level breaks or control breaks) have been around for a very, very long time. In fact, we can reach back nearly two decades into the MC Press archives to read an article on control breaks written by my favorite technical writer of all time, Ted Holt. And of course the concept reaches back decades before then. Control breaks are just an indicator to the programmer that a key field has changed. The simplest use of this concept is when accumulating totals; a change in a key field indicates that a line should be printed and an accumulator should be cleared.
This is an easy concept in RPG. In fact, judicious use of level breaks and blank-after field controls allowed RPG programmers to write reports that tabulated incoming data with very few lines of actual code. The mainline code simply accumulated each record into total fields, and then the level break indicators were used to write the data to a report. The very act of writing the report line cleared the accumulators and the program continued on. I refuse to replicate the code; as wonderful as the cycle was for its time, I don't wish to revisit the days of I-specs, O-specs, and fixed-format C-specs. But trust me when I say it didn't take a lot of programming to do this:
SALES REP CUSTOMER Quantity Amount
--------- -------- -------- --------
123 1001 200.00 2,464.00
123 1002 343.00 5,155.00
123 2001 1,500.00 780.00
123 2,043.00 8,399.00
555 1001 50.00 1,000.00
555 2001 150.00 1,805.00
555 200.00 2,805.00
2,243.00 11,204.00
The file processed is the order file, which has among its fields Sales Rep (salesrep), Customer (custnum), Quantity, and Amount. The program simply reads through the file in salesrep/custnum order and accumulates quantity and amount at the three levels of the report: salesrep/custnum, salesrep, and grand totals. It prints out a line at the total break for each level. A good RPG programmer knocks this out in maybe 30 lines of code, including F-, I-, C-, and O-specs.
GROUP BY Doesn't Quite Cut It
In SQL, it's not quite so simple. Yes, you can use GROUP BY to accumulate things in SQL, but it doesn't lend itself very well to interspersing totals and details, or to expressing totals at different levels. In practice, to do the simple list above, you have to do three separate selects with GROUP BY clauses: one for the sales rep and customer totals, one for the sales rep totals, and one for the grand totals. You need to insert dummy values into the customer and sales rep fields in the higher-level accumulations and then finally perform a union between those three. It's not always an easy task. In this case, it's doable, but the results are far from pretty and the code is not particularly intuitive. Here's the SQL:
with t1 as
(select salesrep, custnum, sum(quantity), sum(amount)
from salesord group by salesrep, custnum),
t2 as
(select salesrep, 999999 custnum, sum(quantity), sum(amount)
from salesord group by salesrep),
t3 as
(select 999 salesrep, 999999 custnum, sum(quantity),
sum(amount) from salesord)
select * from t1 union
select * from t2 union
select * from t3 order by salesrep, custnum
You'll see three common table expressions (CTEs): those are the sub-selects with identifiers T1, T2, and T3. T1 accumulates quantity and sales amount by salesrep and custnum, T2 accumulates by salesrep using a dummy custnum of 999999, and finally T3 accumulates the grand totals using dummy values of 999 for salesrep and 999999 for custnum. The last part of the statement then ties those three subselects together with a UNION and orders them. The result is functional, if not particularly pretty (for brevity's sake, I'm not going to bother with column headings for the rest of the article):
123 1001 200.00 2,464.00
123 1002 343.00 5,155.00
123 2001 1,500.00 780.00
123 999999 2,043.00 8,399.00
555 1001 50.00 1,000.00
555 2001 150.00 1,805.00
555 999999 200.00 2,805.00
999 999999 2,243.00 11,204.00
You end up with the right numbers, certainly, as long as you recognize the 999 and 999999 values as special values that indicate a total line. But I hope you'd agree that the SQL statement isn't exactly the most intuitive thing you've ever seen. Not to mention the problem you might have if you actually have a salesrep number 999!
Enter GROUPING SETS
And that brings us to the subject of this article, the concept of grouping sets (be careful with this link; the IBM page goes into detail not only on grouping sets but also on the related but somewhat more complex and in my opinion less flexible CUBE and ROLLUP functions). Grouping sets perform for SQL the same function that level breaks and accumulators perform for our older RPG programs, and they do it in a very intuitive fashion. The SQL statement is quite simple:
select salesrep, custnum, sum(quantity), sum(amount) from salesord
group by grouping sets ( (salesrep, custnum), (salesrep), () )
order by salesrep, custnum
Notice that the statement requires no CTEs and no dummy values. Summing is handled by the GROUPING SETS clause, which specifies that you want totals at three different levels: salesrep and custnum, salesrep only, and then grand totals, indicated by the empty set "()". Here's what you get:
123 1001 200.00 2,464.00
123 1002 343.00 5,155.00
123 2001 1,500.00 780.00
123 - 2,043.00 8,399.00
555 1001 50.00 1,000.00
555 2001 150.00 1,805.00
555 - 200.00 2,805.00
- - 2,243.00 11,204.00
Nice! The dashes indicate null values; you can check for the nulls in your program or you can cast them to a special value using IFNULL in your select statement. Either way, the SQL code is very simple. More important to me is the fact that it's really easy to change the grouping. Here's how I run the same query, grouping by salesrep and item instead of salesrep and custnum:
select custnum, item, sum(quantity), sum(amount) from salesord
group by grouping sets ( (custnum, item), (custnum), () )
order by custnum, item
1001 ABC123 200.00 2,464.00
1001 DEF333 50.00 1,000.00
1001 - 250.00 3,464.00
1002 ABC123 220.00 2,695.00
1002 DEF333 123.00 2,460.00
1002 - 343.00 5,155.00
2001 ABC123 150.00 1,805.00
2001 GHI987 1,500.00 780.00
2001 - 1,650.00 2,585.00
- - 2,243.00 11,204.00
Boom! The report is reordered. And you know the totals are correct because the grand totals match the grand totals from the other report. I don't even want to go through the changes required for the other report. They're not horrible by any means, but in my opinion they're not nearly as intuitive as the simple changes required here. So, the next time you need to do some accumulation in your SQL, please take a look at the GRUOPING SETS clause and see how it can help you. I've got more OLAP functions in the works, so stay tuned!
LATEST COMMENTS
MC Press Online