If you're connected to the Internet, you're probably thinking about a Web-based application. You can develop a variety of applications, ranging from simple static "brochure-ware" to complex applications integrated with your existing legacy systems. I've written in the past about the various application architectures you can employ to design the more sophisticated applications, but this article will focus on something a little more low-level: the middleware used to communicate between the client application and the host.
This article is a guide to designing Web applications using a message-based architecture. I'll introduce the concepts of clients and servers, as well as the various communications methods, and then go into detail on the merits of message-based processing.
Clients and Servers
If you need distributed applications, then regardless of the type of application, there are two fundamental pieces to the puzzle: the client and the server. Where these two entities reside is really not important; what's far more important is how they communicate with one another.
A client is a program that does one of two things: It requests data, or it requests that an action be performed, usually on the database. While there are many variations on this theme, they all reduce to one of these two. In practice, I sort requests into the following broad categories:
1. QUERY: Return a set of data based on input parameters
2. CRUD: Create, Read, Update, or Delete records in the database
3. REPORT: Initiate a batch process whose results will be returned as a document
You can write a wide variety of applications using just these basic requests. If the architecture is properly designed, the clients can be green-screen applications called up from a 5250 display, thick clients running on locally attached workstations, or browser-based applications using servlets and/or JavaServer Pages (JSPs) running anywhere on the Internet.
Also, the more independent your clients are, the more independent your servers are. Servers can be as specific as an ODBC interface or as flexible as an XML document processor. The ODBC interface is easy to implement and usually performs well, but you'll pay the price of being able to support only ODBC requests. An XML processor, on the other hand, can be almost limitless in the type of requests it supports, but the price is quite a bit of overhead and up-front design. This article describes an approach that strikes a balance between the power of XML and the ease of use of SQL.
Communications
Many communication techniques exist today--from screen-scraping the 5250 display, to ODBC, to HTML, to XML. You can call programs or invoke stored procedures. Each technique has advantages and disadvantages.
- 5250 screen scrapers use a 5250 emulator to read the 5250 display stream and to enter data back as if the user were keying it at the display. This technique is most useful with legacy systems, because it usually requires no change to the existing programs. The drawback is that it requires a 5250 display session and, in most cases, is penalized by the interactive tax of the iSeries.
- ODBC is the best known of the SQL-based interfaces, a family that also includes JDBC and embedded SQL. In essence, an SQL statement is compiled into a program (or generated on the fly) and then sent to the database interface, which executes it. The primary benefit of this technique is the standardized nature of the SQL syntax and the fact that most database providers (including IBM) are working hard to increase the performance of their ODBC interfaces. The biggest weakness of the pure ODBC approach is that the syntax is tied directly to the physical database layout, and thus not only are your clients bound to your server's database, but more importantly, your database layout is captive to your clients.
- HTML was designed solely to communicate with human users. It is focused on rendering, not on content, and so is unsuitable for use as a peer-to-peer communications medium.
- XML, on the other hand, was designed from the ground up to be used as a content-aware data communication format. There is a hefty learning curve, a large start-up cost in terms of defining your messages, and significant overhead in processing time. Despite that, in a resource-rich environment, XML is probably the most robust and flexible of communications techniques.
- Program calls and stored procedures are similar in that they are designed to accept a set of parameters and return a result. In practice, they separate the client and the database almost as well as a complete message-based architecture, with less of the up-front overhead. Their only real disadvantage is that they are less flexible than a message-based system when the actual interface changes.
Message-Based Processing
This article focuses on a specific communication method: message-based processing. Message-based processing has been around for a long time, and it has some specific shortcomings that may make you wonder why I recommend it. Part of the reason is that, since it has been around so long, all of the issues have been worked out in one way or another. But more importantly, a specific feature that is exclusive to message-based processing makes it uniquely suited for the fast-paced world of Web-enabled software: It can support older and newer clients simultaneously.
The Impact of Change
When I was designing architectures for System Software Associates, a term that was constantly used whenever enhancements or fixes were required was "impact analysis." An impact analysis determined what programs had to be changed for a given modification. Modifications to files were always high-impact, and central master files were so widespread in their use that enhancements were sometimes designed specifically to avoid changing a master file. Anyone familiar with how we implemented multiple facility processing will know what I mean--rather than change a key field in several files from warehouse to facility, we instead added a cross-reference file that we used for processing everything. It caused a lot of unnecessary code and added processing overhead, but it avoided a change to several master files. These are the kinds of decisions you must make when your systems are insufficiently insulated from change. And remember, this was in an environment where we had total control over all the programs.
In the brave new world of distributed processing, things are even more difficult. The client programs usually run on workstations, making it difficult to keep them up-to-date when the interface changes. If you have a large PC user base, you probably already know how difficult it is to keep the PCs up-to-date, even for something as critical as virus protection. This is doubly the case for applications, because users often figure that if it works, why fix it? They may not know that subtle changes to the database have caused their version to be as dangerous as any virus. If your programs are run on an intranet, you may have some degree of control over them--in fact, by using a mapped drive you can actually centrally locate your applications--but unless you have those procedures in place, you're running the risk of having programs disrupt your database integrity whenever you change your business logic. If you're lucky, the programs will fail when the interface changes (if you're not, they'll run, but they'll give false results or even corrupt your database). If your application is run remotely over the Internet, the problem grows exponentially.
Insulation from Change
Insulation from change is the primary characteristic of a message-based architecture. Any other form of direct access, either through program calls or direct database access, exposes clients to changes in the host software. This is especially true with data-centric techniques such as ODBC. For example, if you change the format of your date fields, every program that accesses those fields will have to change.
This is not the case in a message-based architecture. Instead, each interaction between a client and a server is cataloged as a request and a corresponding response. The layout of the request and response may or may not have anything to do with the actual physical layout of the data. (It should be noted that for ease of setup, the first generation of messages may at least be very similar to the database files, but that's not necessarily a bad thing, as I'll show later.)
I'll start introducing my little application now. The objective is to have the ability to retrieve an employee's name, age, and number of years worked. The database is straightforward. I've depicted it in the table below.
EMPMST
Field | Format
Employee ID | 10A
Employee Name | 50A
Date of Birth | 8S0 (CYMD)
Hire Date | 8S0 (CYMD)
Time to see how it works in a client/server environment.
How It Works
In this section, I'll take you through the same exercise using SQL and using a message-based approach. First, I'll solve the original problem. Then, I'll respond to some different business requirements. As I continue on, you can compare the ease with which each approach allows you to keep up with changing business demands.
Original SQL
Well, getting the name is simple enough. But as soon as I began work on this example, I found that the syntax for extracting the age from a CYMD field was a little bit complex, as I've shown in Figure 1.
Figure 1: Use this SQL to extract the employee's age.
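The figure's SQL is not reproduced here, but the arithmetic behind an age calculation on a CYMD field can be sketched in Python. This is only an illustration, assuming that an 8S0 CYMD field holds the eight digits CCYYMMDD; the function names and the sample dates are hypothetical and do not come from the original program.

```python
from datetime import date


def cymd_to_int(d: date) -> int:
    """Encode a date as an 8-digit CCYYMMDD number (assumed meaning of 8S0 CYMD)."""
    return d.year * 10000 + d.month * 100 + d.day


def whole_years(start_cymd: int, as_of_cymd: int) -> int:
    """Whole years elapsed between two CCYYMMDD numbers.

    Subtracting the numbers and discarding the last four digits works
    because the month/day part borrows from the year part exactly when
    the anniversary has not yet been reached in the as-of year.
    """
    if as_of_cymd < start_cymd:
        raise ValueError("as-of date precedes start date")
    return (as_of_cymd - start_cymd) // 10000


# Hypothetical employee born 15 August 1965, evaluated on 1 June 2003.
age = whole_years(19650815, cymd_to_int(date(2003, 6, 1)))
print(age)
```

The same subtraction gives years on the job when the hire date is passed in place of the date of birth.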
The SQL-literate among you will argue that I should have used DATE fields. That's not an option with legacy systems, but for the sake of argument, I'm going to change my database. EMPMS2 will contain the same data, but the two date fields will be stored as DATE fields. Please note, though, that now my database layout is being dictated by my clients. That is, I have to make database design decisions based on how they affect my client programs, rather than on efficiency, cost, or other business reasons. This is what we're trying to avoid, and it's one of the problems with a rigid interface such as ODBC.
Anyway, I've changed my database, so now my client program can proceed with a simple extract. Using RPG and embedded SQL, the syntax would be something like what is shown in Figure 2.
Figure 2: This SQL extracts the required information from our new database.
A very important point is that the names used in the request are specifically those in the database. SQL requires that the file and field names match the ones in the database. Since this code is in the client (and is in fact in every client), anytime the database changes, all clients must change. While this shouldn't happen often, when it does, it can cause real headaches.
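To make the coupling concrete, here is a small sketch in Python, with an in-memory SQLite database standing in for the host database. It is not the RPG client from the figure. The column name EMPID and the sample row are hypothetical; EMPNAM and its replacements EMPFNM, EMPLNM, and EMPINI are the field names used in this article. The client's query names the file and field directly, so it stops working as soon as the layout changes.

```python
import sqlite3

# The client's copy of the query: file and field names are hard-coded.
CLIENT_QUERY = "SELECT EMPNAM FROM EMPMS2 WHERE EMPID = ?"


def client_lookup(conn: sqlite3.Connection, emp_id: str):
    """Run the embedded query; return the name, or None if no row matches."""
    row = conn.execute(CLIENT_QUERY, (emp_id,)).fetchone()
    return row[0] if row else None


def build_original(conn: sqlite3.Connection) -> None:
    """Create the file with a single name field."""
    conn.execute("CREATE TABLE EMPMS2 (EMPID TEXT PRIMARY KEY, EMPNAM TEXT)")
    conn.execute("INSERT INTO EMPMS2 VALUES ('E0001', 'Pat Example')")


def build_changed(conn: sqlite3.Connection) -> None:
    """Recreate the file with the name split into three fields."""
    conn.execute("DROP TABLE EMPMS2")
    conn.execute(
        "CREATE TABLE EMPMS2 "
        "(EMPID TEXT PRIMARY KEY, EMPFNM TEXT, EMPLNM TEXT, EMPINI TEXT)"
    )
    conn.execute("INSERT INTO EMPMS2 VALUES ('E0001', 'Pat', 'Example', 'Q')")


conn = sqlite3.connect(":memory:")
build_original(conn)
before = client_lookup(conn, "E0001")  # works against the original layout

build_changed(conn)
try:
    client_lookup(conn, "E0001")
    broke = False
except sqlite3.OperationalError:
    # SQLite reports that the EMPNAM column no longer exists.
    broke = True

print(before, broke)
```

Every client that carries its own copy of the query would fail the same way, which is the headache described above.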
Original Message-Based Process
How does this compare with the work required for a message-based request? Well, to start with, I have to define a request, a response, and a server. My request and response will be quite simple, as shown in the table below.
Request Field | Format | Response Field | Format
Employee ID | 10A | Employee Name | 50A
 | | Employee Age | 3S0
 | | Years on Job | 2S0
I create two data structures, one for the request and one for the response. I populate the data structure for the request and pass it to the server. I receive back the data structure containing the response. This would work just fine, but if this were the limit of the design, I would have to have one server program for every request, and I would have to know the name of every server.
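The two data structures can be sketched as fixed-width records. The following Python is a rough equivalent, not the RPG definition: it assumes the field widths from the table above and represents the zoned numeric fields as zero-padded digit characters. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EmployeeRequest:
    employee_id: str  # 10A

    def pack(self) -> str:
        """Lay the request out as a 10-character buffer."""
        return self.employee_id.ljust(10)[:10]


@dataclass
class EmployeeResponse:
    name: str          # 50A
    age: int           # 3S0, shown here as three digit characters
    years_on_job: int  # 2S0, shown here as two digit characters

    def pack(self) -> str:
        """Lay the response out as a 55-character buffer."""
        return f"{self.name.ljust(50)[:50]}{self.age:03d}{self.years_on_job:02d}"

    @classmethod
    def unpack(cls, buffer: str) -> "EmployeeResponse":
        """Rebuild the response from a 55-character buffer."""
        return cls(
            name=buffer[0:50].rstrip(),
            age=int(buffer[50:53]),
            years_on_job=int(buffer[53:55]),
        )


request_buffer = EmployeeRequest("E0001").pack()
response_buffer = EmployeeResponse("Pat Example", 37, 2).pack()
print(len(request_buffer), len(response_buffer))
```

Neither structure mentions a file or field name from the database, which is the point: the client sees only the layout of the request and the response.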
Instead, I'm going to introduce the concept of a request dispatcher. This is the central idea of a message-based architecture: The data structures that hold the request and response are actually part of a larger data structure, one that can be used to handle any request. The basics are shown in the following table.
Message
Field | Format
Client ID | 10A
Server ID | 10A
Request Code | 2A
Return Code | 2A
Message Data | (*)
The client ID is assigned to the client when it starts up. The server ID tells the dispatcher which server to call, while the request code identifies the contents of the message data. For example, a request code of '01' might retrieve the employee data I detailed above, while a request code of '02' might update the data. The return code identifies at a high level whether the request was successful or not. The message data is a bidirectional parameter: It contains the request when sent to the server, and it contains the response when returned to the client.
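As a rough sketch of the message layout, the Python below packs the four header fields into a 24-character prefix followed by the variable-length message data. The widths come from the table above; the class name, the sample IDs, and the choice to send a blank return code with the request are assumptions made for illustration.

```python
from dataclasses import dataclass

HEADER_LENGTH = 10 + 10 + 2 + 2  # client ID, server ID, request code, return code


@dataclass
class Message:
    client_id: str     # 10A
    server_id: str     # 10A
    request_code: str  # 2A
    return_code: str   # 2A, blank until the server replies (assumption)
    data: str          # (*) request on the way in, response on the way out

    def pack(self) -> str:
        """Concatenate the fixed-width header and the message data."""
        return (
            self.client_id.ljust(10)[:10]
            + self.server_id.ljust(10)[:10]
            + self.request_code.ljust(2)[:2]
            + self.return_code.ljust(2)[:2]
            + self.data
        )

    @classmethod
    def unpack(cls, buffer: str) -> "Message":
        """Split a buffer back into header fields and message data."""
        if len(buffer) < HEADER_LENGTH:
            raise ValueError("buffer is shorter than the message header")
        return cls(
            client_id=buffer[0:10].rstrip(),
            server_id=buffer[10:20].rstrip(),
            request_code=buffer[20:22].rstrip(),
            return_code=buffer[22:24].rstrip(),
            data=buffer[HEADER_LENGTH:],
        )


# Request '01' for a hypothetical employee; the data is the 10A employee ID.
outbound = Message("CLIENT01", "EMPSRV", "01", "", "E0001".ljust(10))
wire = outbound.pack()
print(len(wire))
```

Because the header is the same for every request, code that routes messages needs to understand only these 24 characters and can treat the rest as opaque.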
How the request gets to the server and the response gets back is irrelevant at this point. To keep the focus on the design, I'm going to use a simple dispatcher program: The client program calls the dispatcher, which calls the appropriate server program. You may notice several problems with this approach, primarily the fact that it limits the size of the data. I'm trying to keep the scope of the topic within a single article--a more complete design supports an arbitrary number of message segments in either direction, each with its own type. This is relatively easy to accomplish using a mechanism such as data queues, but that's too much detail for this article. I'll leave that portion as an exercise for the reader. Something even more interesting is that a request can be routed to another machine--on an entirely different platform, if necessary. But again, that's a different story for a different day.
Instead, it's time to write the server program. I know, I know, the SQL version is already up and working and installed on 20 PCs by now. But bear with me. The program is very simple, as shown in Figure 3.
Figure 3: This is the server program for the employee information request.
I also have to write the dispatcher, but that's a one-time cost, and it's even simpler than the server program. Even as dispatching gets more complex, it's important to remember that the dispatcher is a one-time cost: Write it once, and it works for every request. The servers are where the real work is done, and this is where the benefits of message-based programming begin to become apparent.
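A dispatcher of the simple call-through kind can be sketched in a few lines. The Python below is an illustration only, not the RPG programs from this article: the server ID EMPSRV, the return codes, the in-memory stand-in for EMPMST, and the fixed as-of date are all assumptions chosen to keep the sketch repeatable.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict

# In-memory stand-in for EMPMST: ID -> (name, birth CCYYMMDD, hire CCYYMMDD).
EMPMST = {"E0001": ("Pat Example", 19650815, 20001031)}
AS_OF = 20030601  # fixed as-of date; a real server would use the current date


@dataclass(frozen=True)
class Message:
    client_id: str
    server_id: str
    request_code: str
    return_code: str
    data: str


def employee_server(msg: Message) -> Message:
    """Handle request '01': return name (50A), age (3S0), years on job (2S0)."""
    if msg.request_code != "01":
        return replace(msg, return_code="90", data="")  # unsupported request
    record = EMPMST.get(msg.data.strip())
    if record is None:
        return replace(msg, return_code="10", data="")  # employee not found
    name, born, hired = record
    age = (AS_OF - born) // 10000
    years = (AS_OF - hired) // 10000
    return replace(msg, return_code="00",
                   data=f"{name:<50.50}{age:03d}{years:02d}")


# The dispatcher's routing table: server ID -> server program.
SERVERS: Dict[str, Callable[[Message], Message]] = {"EMPSRV": employee_server}


def dispatch(msg: Message) -> Message:
    """Call the server named in the message; written once, used for every request."""
    server = SERVERS.get(msg.server_id.strip())
    if server is None:
        return replace(msg, return_code="99", data="")  # unknown server
    return server(msg)


reply = dispatch(Message("CLIENT01", "EMPSRV", "01", "", "E0001"))
print(reply.return_code, reply.data[:50].rstrip(), reply.data[50:53], reply.data[53:55])
```

Adding a new kind of request means adding a server and one entry in the routing table; the dispatcher itself does not change.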
Business Scenario 1: A Calculation Changes
This is a simple change. The company uses the years on the job to determine certain benefits, and it's been decided that an employee should get credit for a full year after being on board six months. For example, my hire date is 10/31/2000, so my calculated years on the job should be three, rather than the two that the normal calculation returns. For the SQL, the change is relatively simple (although it took me a little while to find it), and is shown in Figure 4.
Figure 4: This SQL is required to implement the new calculation.
Now, I have to change the calculation in every client program that calculates the number of years on the job. See Figure 5.
Figure 5: Here's the modification required to implement the same change in a message-based architecture.
What's the corresponding change in the message-based approach? Well, we have to change the server program. I added one line, as shown in Figure 5. Now to the clients. I have to change...nothing! Not a single client changes, because the code is localized in the server! This is probably the most important benefit of a message-based approach, although by no means the only one.
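The rule itself can be sketched as a single server-side function. The Python below shows one possible reading of "credit for a full year after six months," counting completed months and rounding up from the half-year mark. The function names and the as-of date are assumptions, and this is not the one-line RPG change from the figure.

```python
from datetime import date


def whole_months(start: date, as_of: date) -> int:
    """Completed calendar months between two dates."""
    months = (as_of.year - start.year) * 12 + (as_of.month - start.month)
    if as_of.day < start.day:
        months -= 1  # the current month is not yet complete
    return months


def years_on_job(hired: date, as_of: date) -> int:
    """Original rule: completed years only."""
    return whole_months(hired, as_of) // 12


def years_on_job_with_credit(hired: date, as_of: date) -> int:
    """New rule: six or more months into a year counts as a full year."""
    return (whole_months(hired, as_of) + 6) // 12


hired = date(2000, 10, 31)  # the hire date used in the example above
as_of = date(2003, 6, 1)    # assumed evaluation date
print(years_on_job(hired, as_of), years_on_job_with_credit(hired, as_of))
```

Only the server swaps one function for the other. The response still carries a two-digit years-on-job field, so the clients see a different number without any change to their code.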
Business Scenario 2: A File Format Changes
The file must now be sorted by last name. Originally, the name field was a single field. Now, however, I need to separate the data out into individual fields for first name, last name, and middle initial. EMPNAM now becomes EMPFNM, EMPLNM, and EMPINI. See Figure 6.
MC Press Online