Those who forget history are doomed to repeat it, and in software, doomed to reinvent it.
READE and READPE are powerful RPG opcodes whose use seems to be dwindling. Part of that may just be that people don't remember them, so I thought I'd revisit them in this article.
Processing a Loop
The situation is pretty simple. You want to process a set of records based on a key. This could be listing all the open order requirements for a given line or all the shipments from a specific warehouse on a given date. At first glance, this is often the domain of SQL and its unmatched ability to do ad hoc queries. To do this with native I/O, you must have built a logical file in the correct sequence; otherwise, you have to read every record in the file. But really, the same problem exists with SQL. An ad hoc query, especially over a large database, can be very slow if there is no corresponding INDEX (the SQL equivalent of a logical file).
So while a truly ad hoc query (one without a pre-built access path) is very nice in one-off situations where flexibility matters a lot more than performance, in the case of a task that is performed regularly, you will need an access path whether you're writing SQL or native I/O.
So assuming an access path is in place, then what really is the benefit of using READE or READPE over an SQL statement...or over a simple READ for that matter?
READ(P)E vs. READ
The problem when using a read is that you have to do the key checking yourself. In the case of a simple key, that's not so bad. Let's take a file called ITEMS with a single key field named ICLASS. The key you want to process is in MYCLASS. Your code looks like this:
setll (ICLASS) ITEMS;
read ITEMS;
dow (not %eof(ITEMS) and ICLASS = MYCLASS);
exsr process;
read ITEMS;
enddo;
First, you use the SETLL command to position your file to the first record that matches your condition. Then you read through the file, checking each record to be sure it still fulfills that condition. This is very easy with one key. But it starts to get ugly with more than one key, and it ends up looking like this:
setll (MYFROMWHS: MYTOWHS: MYITEM: MYDATE) SHIPMENTS;
read SHIPMENT;
dow (not %eof(SHIPMENTS)
and SWHSFROM = MYFROMWHS
and SWHSTO = MYTOWHS
and SITEM = MYITEM
and SSHIPDATE = MYDATE);
exsr process;
read SHIPMENT;
enddo;
As you can see, the code can start to add up. However, with READE it's easier to specify the keys:
setll (MYFROMWHS: MYTOWHS: MYITEM: MYDATE) SHIPMENTS;
reade (MYFROMWHS: MYTOWHS: MYITEM: MYDATE) SHIPMENTS;
dow (not %eof(SHIPMENTS));
exsr process;
reade (MYFROMWHS: MYTOWHS: MYITEM: MYDATE) SHIPMENTS;
enddo;
Note that now you only have to check for end-of-file using the %eof BIF. That's because READE will return end-of-file when it hits a record whose keys don't match the key fields specified on the instruction.
And if you find that to be too much typing, you can hearken back to an even older technique, the key list. Although its use is frowned upon as being less "self-documenting" than the code above, it still has its place. A key list is a predefined set of fields used as a key:
C SHIPKEY KLIST
C KFLD MYFROMWHS
C KFLD MYTOWHS
C KFLD MYITEM
C KFLD MYDATE
Then you simply use the key list:
setll SHIPKEY SHIPMENTS;
reade SHIPKEY SHIPMENTS;
dow (not %eof(SHIPMENTS));
exsr process;
reade SHIPKEY SHIPMENTS;
enddo;
On the positive side, it's a lot less code to type, and that can be an argument itself when you have to use the same key fields in many places in a program. Also, by specifying the keys in one place, you don't have the potential issue of accidentally specifying different keys at two points in the program.
On the negative side, if you want to understand what the I/O opcodes are doing, you need to refer to the key list; hopping back and forth between the actual program logic and the key list definition can be counter-productive. It's this lack of localization that has led to the KLIST opcode being generally deprecated by the programming community.
All of these same issues apply to the READPE opcode; the only difference is that you use the SETGT opcode rather than SETLL to position the file and then execute READPE to read through the file in reverse sequence. (And remember, reverse sequence is not necessarily descending order; if the access path specifies descending order on a key field, then READPE will read that field in ascending sequence!)
READ(P)E vs. SQL
This is a tougher issue to present. In most cases where you would use a partial key, an SQL cursor can function just as well. For example, if you're simply using a set of fields to limit the records processed in a given loop, this works just as well:
exec sql declare cursor c1 as
select * from SHIPMENTS where
SWHSFROM = :MYFROMWHS and
SWHSTO = :MYTOWHS and
SITEM = :MYITEM and
SSHIPDATE = :MYDATE
order by SWHSFROM, SWHSTO, SITEM, SSHIPDATE;
exec sql open c1;
fetch next from c1 into :SHIPMENTDS;
dow SQLCOD = 0;
exsr process;
fetch next from c1 into :SHIPMENTDS;
enddo;
The proponents of self-documenting code point out that this code is pretty self-contained; all the information is there, including the order of the records and the selection criteria, which is in contrast to the native I/O environment, where the key information is specified in the logical view and the selection criteria is embedded in the program logic itself. You have to decide for yourself whether that's a better way to program. I'm an old RPG dinosaur, and I grew up with ISAM access, so the fact that the file defines my access is fine by me.
Also note that in both situations I've left out error-checking. I'm old-school, and I figure that if I'm getting database I/O errors, it's either because of a bad application design or an honest-to-goodness hardware problem, and in either case, a hard halt is fine with me. If you disagree with that, let me know, and I'll address it in another article. But for now, let's try to keep things simple.
Now, a savvy SQL proponent might inquire as to what the "process" routine is actually doing, and if the processing is simple enough, you might be able to circumvent a lot of this by using a set-based SQL statement rather than a cursor. For example, if all you were doing was totaling quantity, you'd do something like this:
exec sql select sum(SQUANTITY) into :MYTOTQTY
from SHIPMENTS where
SWHSFROM = :MYFROMWHS and
SWHSTO = :MYTOWHS and
SITEM = :MYITEM and
SSHIPDATE = :MYDATE;
You'd be done. Not only that, but chances are this would execute much faster than doing it via native I/O. The performance often gets better as you add other files joined in by foreign keys--for example, to calculate unit-of-measure conversions or whatnot--because the I/O tends to stay in the SQL engine and is executed at a lower level than the compiled program.
So why wouldn't you use SQL all the time? Well, if the processing is too complex or requires business logic that isn't easily expressed in SQL, you may choose to not use SQL. Or, if the processing involves doing things other than computations, such as writing to other files or calling other programs, then an aggregate SQL statement is usually not the best solution.
But that still leaves the SQL cursor approach. Where does this fail? The most typical case where the SQL cursor fails is when you need to break out of the subset. By that, I mean situations where the user wants a set of records based on some criteria but then wants records just outside that boundary. A typical example is the above case, where the last key field is the ship date. The user may want to see a list of records on May 23, but after looking at those records, may want to "page back" and see the data just prior to May 23. With an SQL cursor, the selection criteria identify a hard selection; the cursor does not contain records outside that boundary. In this case, the cursor cannot be positioned to records prior to the selection date. The only recourse is to actually create another cursor, and that can be expensive.
How expensive? Expensive enough to make it worthwhile to use traditional indexed access rather than SQL? That depends on the cursor, and in many cases the two techniques may yield very similar results, at which point you have a decision to make. And when comparing two different programming techniques that yield the same results, the intangibles sometimes play a larger part than the actual mechanics. For example, when the two approaches are otherwise equal, I prefer the simplicity of indexed access over the wordiness of SQL. I also prefer RPG over COBOL; that's just the way I am.
But before you get to the intangibles, you need to understand the mechanics, and this article has shown you how the approaches compare.
LATEST COMMENTS
MC Press Online