Back in a previous life, I was the Manager of Architecture for a large packaged software company. That period saw the emergence of embedded SQL, and software development on the IBM midrange changed drastically for reasons I'll touch on a little later.
As a software vendor, we also had to teach our clients how to change our code. We established a number of techniques designed to allow customers to modify programs to meet their specific needs while minimizing the work involved with upgrading to future releases of the software.
Types of Change
You can break changes down into two major areas: program changes and database changes. Each of those can then be further subdivided (though a bit less concretely) into modifications and enhancements.
Changing the Program
It's almost impossible to come up with a standard technique for modifying programs, since so much depends on the programs themselves. Techniques for fixing modern ILE applications are much different from those required to fix older RPG III code. The techniques are so different, in fact, that many companies have a policy in which any program that needs to be changed is first modernized to RPG IV using a tool such as CVTRPGSRC or one of the commercial equivalents.
Packaged applications introduce an additional twist: ongoing support. If you were to convert the program, chances are you'd pretty much end any chance of support from the vendor. Of course, if you're working on a version of packaged software that's still written in RPG III, there are larger issues, but I'm definitely not going to step into that particular mine field today.
About the only thing I can say regarding modifying packaged code is that you hope the vendor has supplied exit points for commonly modified pieces of logic. This technique is prevalent in operating systems, including i5/OS, and has become increasingly available in utilities such as change management systems. The concept of an exit point is that you can register a program to be called either before or after a certain predefined event occurs. Typically, the program is called with enough information that the user can perform some additional functions or even change the base functionality of the utility itself. For example, a change management vendor may give you an exit point when an object is moved from one environment to another. That exit point might allow you to compile the object yourself or change the parameters that the tool uses to perform the compile.
It's not unreasonable to have the same sort of feature in an application suite. For example, I would expect an order-entry application to provide exit points for things like pricing and maybe even inventory management. But if the vendor does not provide that sort of exit point strategy, then you're looking at modifying existing code, and that's always a tricky path.
Changing the Database
Let's take a look at some examples on the database side. One of the most basic changes to a package is extending the item number, making it bigger. Note, by the way, that this is one of the areas where SQL databases can have a perceived advantage. Not to sidetrack too much, but if you've defined all your fields as varying character fields, changing their size becomes purely a programming issue, not a database issue. Still, you will run into situations where you need to change the attributes of an existing field. A better example is changing from a five-digit (or even nine-digit) numeric ZIP code to an international postal code, which requires alpha characters. This falls squarely into the modification category, and it's one of the most difficult of all changes.
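To put that varying-length aside into concrete terms, here's a minimal sketch using an invented CUSTMAST file (none of these names come from any real package). Widening a varying-length column really is that simple; it's the numeric-to-alpha ZIP change that makes the modification above so hard:

```sql
-- Hypothetical customer file with a varying-length postal code.
CREATE TABLE CUSTMAST (
  CUSTNO   DECIMAL(7, 0) NOT NULL,
  CUSTNAME VARCHAR(40)   NOT NULL,
  POSTCODE VARCHAR(5)    NOT NULL DEFAULT '',
  PRIMARY KEY (CUSTNO)
);

-- Widening a varying-length column changes only its declared maximum;
-- the stored data is untouched, so the change is a programming issue
-- rather than a database conversion.
ALTER TABLE CUSTMAST
  ALTER COLUMN POSTCODE SET DATA TYPE VARCHAR(10);
```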
On the other end of the spectrum are files that are complete additions, used for external routines. Perhaps you need your own pricing routine or maybe a warehouse control system that your package doesn't provide; you would add your own master files for those things, the only relationship to the original database being the key fields used to access the data. This sort of operation is definitely an enhancement to the database and is relatively easy compared to other changes. I also recommend that this sort of change be done using DDL rather than DDS if possible, because the benefits of DDL-described files outweigh nearly all shortcomings.
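As a hedged sketch of such an add-on file, here's what a homegrown pricing master might look like, with all names invented and the key structure assumed; its only tie to the vendor's database is the item-number key:

```sql
-- Hypothetical add-on pricing master, related to the package's
-- item master only through the item-number key fields.
CREATE TABLE MYPRICE (
  ITEMNO   CHAR(15)       NOT NULL,
  PRICELST CHAR(4)        NOT NULL,
  EFFDATE  DATE           NOT NULL,
  UNITPRC  DECIMAL(11, 2) NOT NULL DEFAULT 0,
  PRIMARY KEY (ITEMNO, PRICELST, EFFDATE)
);
```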
In the middle, you may have to add a field to an existing file. One of the great changes in my career was when the payroll withholding tax rates were split. If I remember correctly, FICA and FUTA were separated, and suddenly we needed new fields for the FUTA rate and the SUTA credit cutoff and all kinds of things. These additions are a common occurrence, and there are two distinct schools of thought on how to handle them, which I'll cover momentarily.
Techniques for Managing Change
As I said, adding a new file is a comparatively painless task, so I won't address it. Changing a field is very complicated, so I'm going to put it aside for a moment and deal instead with adding a field to an existing file. I mentioned in the previous section that two techniques are prevalent today. Those techniques are modification and extension, and I'll explain each one.
Adding New Fields to an Existing File
The first and most straightforward option is to modify the existing file by simply adding the field to it. Even this, though, isn't quite as simple as it seems. Two distinct sub-techniques exist, and they are philosophically very different. One is to add the field anywhere in the record (typically trying to place it near related fields) and then recompile all affected programs. The recompile is required because if a program is compiled over one version of a file and then tries to open a different version (for example, one with a new field added), the operating system will object with a fatal error message (CPF4131, one all old dinosaurs like me know well).
The idea of a mass recompile is sometimes a daunting one, and that's why System i software providers historically have tried to avoid database changes when possible. But the file version concept has advantages as well. Despite the pain of a database change, there is some real benefit in the concept of a "level check," in which the operating system halts if a program opens a file with a different version than it expected, since this usually signals a real issue in your environment. The error is especially important in situations where you might have multiple versions of an application running at the same time with slightly different databases. Making sure that your program uses the version of the database that it expects is a very important part of application management.
In these days of high availability, the idea of recompiling programs has fallen out of favor. While I'm not sure I'm entirely comfortable with that direction, the idea of avoiding level checks is not new; even before this change in preference occurred, we had a technique to get around the level-check issue. It was widely discouraged but still occasionally used: LVLCHK(*NO). You can specify the LVLCHK(*NO) keyword for a file—either when the file is created or on an override—to cause the operating system to skip the level check.
This gave rise to the second technique for adding fields. In order to avoid the problems associated with having to basically shut down production while installing a new file and all of the associated programs, some clever programmers (and you know what I think of "clever" programming) realized that you could simply stick the fields on the very end of the record and any program that didn't need the new fields could specify LVLCHK(*NO) and would not have to change.
Clearly, file maintenance programs would have to address the new fields (if indeed they were user-maintainable), and you might also need to run a conversion program to update the new fields, but in general, the impact on your existing system was drastically reduced.
Using Extension Files
The other option, which was the one we actively recommended to our clients, is to add new extension files. At SSA, we called these "X files" (long before the Scully/Mulder phenomenon) because typically we told clients to take our existing master file names and append an "X" to them. For example, if you needed a new field for your item master records, we told you to take the name of our item master file (IIM) and add an X, thereby giving you IIMX. The IIMX file would have the same key fields as the IIM file and would contain just those keys and the extra fields you needed. (Technically, the file would typically contain some additional fields, such as the record ID that we used to soft-delete records, but those database management fields are somewhat outside the scope of this particular discussion.)
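Here's a rough sketch of the idea in DDL (in our day these files were DDS-described, and I'm assuming a single item-number key and inventing the new field names):

```sql
-- Hypothetical extension file: the same key as IIM plus only the new
-- fields; every other column stays in the vendor's file.
CREATE TABLE IIMX (
  ITEMNO  CHAR(15)      NOT NULL,              -- same key field(s) as IIM
  ABCCODE CHAR(1)       NOT NULL DEFAULT ' ',  -- your added fields
  REORDPT DECIMAL(9, 0) NOT NULL DEFAULT 0,
  PRIMARY KEY (ITEMNO)
);
```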
Programming changes were straightforward. In the simplest case, whenever you had a program that needed the new fields, you added a CHAIN to the IIMX right after you retrieved the IIM record. Some people got more creative and added things like JOIN logicals to automatically retrieve both records when needed. And of course your maintenance program would need to be updated to handle the new file, and if these were transaction-oriented fields, your transaction processing programs would be modified to update them.
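The join-logical trick maps directly to a SQL view. A sketch, building on the hypothetical IIMX above:

```sql
-- SQL analogue of a join logical: one view that retrieves both records.
-- An inner join is fine once a conversion program has written an IIMX
-- row for every IIM row.
CREATE VIEW IIMFULL AS
  SELECT M.*, X.ABCCODE, X.REORDPT
    FROM IIM M
    JOIN IIMX X ON X.ITEMNO = M.ITEMNO;
```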
But programs that didn't need these fields wouldn't need to be changed at all; they'd continue on using the original master file without a hiccup.
Back to Field Modifications
When I started this section, I skipped over field modifications. I wanted to touch on the two techniques above first, because understanding those makes it a little easier to deal with the field modifications issue. There are really two ways to modify a field: you can actually modify the original field in the file, or you can leave the original field and add a new one. If you choose the former approach, you will have to recompile every program that uses the file. If you choose the latter approach, though, you can minimize the impact by using one of the two field-addition techniques: LVLCHK(*NO) or extension files.
If you add a second field, programs that don't actually use the modified field can be left in place without changes. Unfortunately, fields that require change are often common fields that the user sees (descriptions, item numbers, and so on), so they tend to have more impact and be used in more places, but you can still minimize the change a little bit. Big warning, though: if you have to change a key field, all bets are off, and it's usually easiest to simply bite the bullet and make the systemic changes required. We may someday want to talk about the idea of "keyless" databases, in which files are keyed by identity fields and the business key fields aren't really keys at all, but that's a different issue for a different day and definitely not something you're going to do to an existing package.
How SQL Changes the Picture
SQL has a lot of plusses and a few minuses when it comes to database changes. In general, if all your database access is through completely field-driven SQL statements (that is, you never use "SELECT *" in your programs), then modifications to the database are much easier.
You can, for example, perform an ALTER TABLE command to add a field, and not a single existing program needs to be recompiled. You should be 100 percent clear that using the "SELECT *" syntax nullifies that capability; that's why many SQL experts insist that the syntax never be used. A file with dozens or even hundreds of fields requires some very long SQL statements, but that's a different issue.
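A minimal sketch of the point, reusing the invented CUSTMAST file from earlier (the LOYALTY column and the :custNumber host variable are illustrative only):

```sql
-- Adding a field requires no recompiles of field-driven programs...
ALTER TABLE CUSTMAST
  ADD COLUMN LOYALTY CHAR(1) NOT NULL DEFAULT 'N';

-- ...because their statements name every column. A SELECT * here
-- would suddenly return an extra column and could break the program's
-- assumptions about the result row.
SELECT CUSTNO, CUSTNAME, POSTCODE
  FROM CUSTMAST
 WHERE CUSTNO = :custNumber;
```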
All is not sweetness and light, however. Modifying fields is still problematic. Although it's relatively easy to change the length of a field, it's pretty much impossible to change the type. And the lack of a level check might let a program that really needed a change slip under the radar, but that's really more of a procedural issue than an architectural one. In general, when your database is accessed completely through SQL, the impact of your changes is lessened.
If you read anything I write, you know my feelings about pure SQL access. I maintain that native I/O outperforms SQL for enough tasks that you can't replace everything with SQL. (And that's despite the fact that I've become convinced there's almost no reason not to use DDL to describe your files.) However, we're talking about modifying an existing package, and in that case, you really don't have a lot of choice in the matter.
My Preferences
If you have a well-written pure SQL environment, I would almost always go with changing the file in place. Adds would be a simple ALTER TABLE command. I would probably do a field modification in four steps: add the new field with a new name, convert the old field to the new field, drop the old field, and then rename the new field. In any case, make sure you have the SQL statements stored somewhere so that you can re-execute them when you need to upgrade to a new version of the package.
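Sketched out against the invented CUSTMAST file, with a hypothetical five-digit numeric ZIPCODE being converted to a ten-character alphanumeric field (and note that the rename syntax in step 4 varies by platform and release, so treat this as illustrative):

```sql
-- 1. Add the new field under a temporary name.
ALTER TABLE CUSTMAST
  ADD COLUMN ZIPNEW VARCHAR(10) NOT NULL DEFAULT '';

-- 2. Convert the old field to the new one (DIGITS keeps leading zeros).
UPDATE CUSTMAST
   SET ZIPNEW = DIGITS(ZIPCODE);

-- 3. Drop the old field.
ALTER TABLE CUSTMAST
  DROP COLUMN ZIPCODE;

-- 4. Rename the new field to the old name (syntax varies by platform
--    and release; on some systems this step is another add-and-copy).
ALTER TABLE CUSTMAST
  RENAME COLUMN ZIPNEW TO ZIPCODE;
```

Keep that script in source control alongside the rest of your modifications; it's exactly the kind of thing you'll need to re-execute at upgrade time.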
In any situation where you have files that are accessed through native I/O, even those that are also accessed via SQL, I prefer the extension file approach for field additions. Add an extension file and, for SQL statements, simply add a JOIN clause to those programs that need the new fields. Simple and effective.
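For instance, an embedded statement along these lines, again using the hypothetical IIM/IIMX names (the :itemNumber host variable is illustrative):

```sql
-- Only the programs that need the new fields add the JOIN; a left
-- outer join with defaults tolerates items that have no IIMX row yet.
SELECT M.ITEMNO,
       COALESCE(X.ABCCODE, ' ') AS ABCCODE,
       COALESCE(X.REORDPT, 0)   AS REORDPT
  FROM IIM M
  LEFT JOIN IIMX X ON X.ITEMNO = M.ITEMNO
 WHERE M.ITEMNO = :itemNumber;
```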
For field modifications on existing natively accessed files, I don't have a pat answer between modification in place and adding a second field. It will really depend on your business circumstances. But if I had to choose one over the other, I think that for a field change, I would go with the change in place. That's simply because it would force me to recompile all my native access programs, and that in itself might help me uncover errors that otherwise wouldn't show up until later. But it's my experience that any field changes are going to be painful whether you're talking SQL or native access, and if you can avoid them, do so.
I suppose having larger fields from the get-go would avoid some of these issues. At the same time, I still don't advocate unlimited field sizes on everything. There's something inside me that says a 32K item number is just wrong, that some limits on field size are a good thing. Maybe that's just the dinosaur brain talking, I don't know.
What do you think?
Joe Pluta is the founder and chief architect of Pluta Brothers Design, Inc. and has been extending the IBM midrange since the days of the IBM System/3. Joe uses WebSphere extensively, especially as the base for PSC/400, the only product that can move your legacy systems to the Web using simple green-screen commands. He has written several books, including E-Deployment: The Fastest Path to the Web, Eclipse: Step by Step, and WDSC: Step by Step. Joe performs onsite mentoring and speaks at user groups around the country. You can reach him at