Often, I need to convert from legacy dates to true dates. I convert mostly so I can easily do date arithmetic, but sometimes just for readability in ad hoc queries over old files. I'm a big fan of true dates and preach their usage when I get the chance, but there is some CPU overhead. This article is about improving efficiency in converting to true dates and reducing some of that overhead.
I know that discussing performance can start arguments that become more heated and contentious than a political or religious debate. I don't want that to happen. So I'm just going to show you alternate code approaches and discuss how long each takes to run. Then you decide which approach is better. Me, I'll go for the slightly shorter code, which also just happens to run significantly faster, but I'm getting ahead of myself.
One thing I've learned about legacy date fields is that occasionally they don't contain dates. Surprise! Trying to convert an invalid date causes a run-time error in RPG, usually around 2:00 a.m. Typically, good defensive programming would go something like this:
Use the TEST opcode to check if the legacy field is valid
If valid
   convert legacy field to true date
else
   logic to handle invalid data
endif
This is the kind of logic I was coding, using the TEST opcode to check that the date was valid before converting it. It turns out there is a slightly shorter code structure that also happens to be faster.
I wrote short, CPU-intensive programs to run one million iterations of each code approach and compared the total CPU usage to get a relative difference in performance.
Alternative Approaches
Here's the code for DATCHK1, the one that uses the TEST opcode.
D DATCHK1         pr
D DATCHK1         pi
D NumDate         s              6p 0 inz(080229)
D wkDate          s               d
D i               s             10i 0
 /FREE
    for i = 1 to 1000000;
       test(de) *YMD NumDate;
       if not %error;
          wkDate = %DATE(NumDate:*YMD);
       else;
          // Handle bad date
       endif;
    endfor;
    *inlr = *on;
    return;
 /END-FREE
DATCHK1 takes the negative approach, checking for valid data before it performs the conversion. I ran it three times after hours on a lightly used development box at V5R3. In all three runs, it used 21 CPU seconds. Your mileage may vary, depending on your hardware.
The second program, DATCHK2, uses the MONITOR opcode.
D DATCHK2         pr
D DATCHK2         pi
D NumDate         s              6p 0 inz(080229)
D wkDate          s               d
D i               s             10i 0
 /FREE
    for i = 1 to 1000000;
       monitor;
          wkDate = %DATE(NumDate:*YMD);
       on-error;
          // Handle bad date
       endmon;
    endfor;
    *inlr = *on;
    return;
 /END-FREE
DATCHK2 takes the positive approach, assuming that the data is valid, immediately attempting the conversion, and trapping any failure. I ran DATCHK2 three times on the same development box. Two runs each used 12 CPU seconds and one used 11 CPU seconds.
Which Is Faster? And Why?
If we go with 12 seconds, DATCHK2 uses about 43 percent less CPU than DATCHK1 ((21-12)/21*100). Looked at the other way, using TEST takes 75 percent longer than using MONITOR ((21-12)/12*100). You may be able to work these numbers another way, but they clearly show that the second approach is significantly faster.
Why is the MONITOR approach faster? Or why is the TEST approach slower? I don't think it's poor coding on the part of the compiler programmers. I suspect, rather, that when you convert a legacy date to a true date, the conversion routines insist on valid input and probably go through the same validity checking that the TEST opcode performs. So when you use the TEST opcode approach, the validity checking is being done twice, and this validity checking is costly in terms of CPU time (but cheap in terms of happy, satisfied customers, not to mention frazzled developers who don't get called in at 2:00 a.m. with an exception due to an invalid date).
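As a side note, and this is my own sketch rather than code from the benchmark programs above, ON-ERROR can also name specific status codes if you want the handler to react only to bad date values and let anything else percolate up. Status 112 is the RPG status for a date, time, or timestamp value that is not valid; the substitute value shown here is just an illustration.
 /FREE
    monitor;
       wkDate = %DATE(NumDate:*YMD);
    on-error 112;
       // Status 00112: date, time, or timestamp value not valid
       wkDate = d'0001-01-01';   // substitute a default date, or log the record
    endmon;
 /END-FREE
Any status other than 112 is not handled by this monitor group, so it is treated as if the MONITOR were not there at all.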
Is It Worth It?
I have a colleague whose rule of thumb is "first make it right; then make it fast." Sometimes we developers overcomplicate programs, trying to tease out a couple more microseconds per record, when the number of records being processed is so low that we maybe save a CPU second a day. I'm not advocating sloppy, inefficient code, but we can sometimes get hung up on efficiency.
So why am I suggesting a different code style that might save 10 CPU seconds per million iterations? It's a combination of several things:
- The MONITOR approach is one line shorter.
- The performance improvement is a nice bonus at no extra cost to complexity.
- I like the positive approach of assuming that the date is correct. Much as we may disparage legacy date fields, most of the time they do contain valid date values.
The MONITOR positive approach has other applications when editing and converting data, and it's well worth becoming familiar with. For more information, check it out in the ILE RPG Language Reference manual.
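For instance, and again this is just a sketch of my own rather than code from the programs above, the same pattern guards a character-to-numeric conversion with %DEC (which, as I recall, accepts character arguments as of V5R3). The field names AmountChar and AmountNum are made up for the example.
D AmountChar      s             15a   inz('1234.56')
D AmountNum       s              9p 2
 /FREE
    // Positive approach: assume the legacy field holds a number and trap failure.
    monitor;
       AmountNum = %dec(AmountChar : 9 : 2);
    on-error;
       // Non-numeric data: default it, log it, or reject the record
       AmountNum = 0;
    endmon;
 /END-FREE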