In the first article in this series, we looked at the basics of the XML support added to RPG IV in V5R4. In this second article, we will continue those explorations and look at some of the additional features of the XML-INTO opcode.
As noted previously, RPG's current size limits can cause problems. For example, we use arrays to map repeating elements in the XML document, but RPG arrays are limited to a maximum of 32,767 elements. What if the document we wish to process contains 35,000 repeated elements? Such a structure simply cannot be defined in RPG. Luckily, IBM's RPG compiler team anticipated this need and provided a means of processing the document in pieces. The key to unlocking this particular door is the BIF %HANDLER, which associates a user-defined handler procedure with the parser, thereby allowing us to parse the XML document in pieces. The underlying mechanism is known as "call-back processing." If this technique is unfamiliar territory for you, I suggest that you read this article's companion article in this issue, "Call-Back Processing: A Brief Introduction," before continuing.
For the purposes of my example, I am making the assumption that the XML document we are processing can contain a large number of Category elements—too many to be accommodated within RPG's limits—and therefore we must process the document one Category at a time.
Let's start our investigation of this example by looking at the differences in the way the XML-INTO opcode (I) is specified. The first thing you will notice is that instead of specifying the name of the variable to be filled, we specify the BIF %HANDLER. It is the parameters to this BIF that will ultimately identify the variable to be filled.
(I)  XML-INTO %Handler(ProcessCategory: categoryCount)
              %XML(XML_Source: 'case=any doc=file allowmissing=yes +
                                path=Products/Category');
The BIF takes two parameters. The first is the name of the prototype of the handler procedure: ProcessCategory. As each "piece" of the XML document is parsed, this procedure will be "called back" to process the data extracted. We will look at the details of the prototype in a few minutes. The second parameter is known as the communications area. It can be any type of data you like: a simple variable, an array, a data structure…anything. Its purpose is to allow parameters to be passed indirectly from the mainline code, via the XML parser, to the handler procedure. In my example, the handler procedure is in the same program as the mainline code and therefore could have accessed the mainline's global variables. But the handler could equally have been in a service program, in which case it would have no access to such data. I am using the communications area as a means of obtaining a count of the total number of product categories in the document. This is useful because the XML elements variable in the PSDS is not populated when using %HANDLER. Instead, as you will see in a minute, a count of the number of elements processed is passed to the handler each time it is called.
Before we move on to the prototype for the handler, there is one more change to XML-INTO that we need to address. Notice that the %XML BIF now includes the path= option. It is a requirement of using %HANDLER that a path be supplied to direct the parser to the correct starting point in the XML document.
Let's look at the prototype (J). The parameters passed to the handler follow a standard pattern:
(J)  D ProcessCategory...
     D                 Pr            10i 0
     D  categoryCount                 5i 0
(K)  D  category                           LikeDS(categoryDS) Dim(1)
     D                                     Const
     D  elements                     10i 0 Value
The first parameter (categoryCount) is the communications area, and its definition is therefore up to you.
The second (K) identifies the variable (category) that the parser will fill before calling the handler. In other words, it is the -INTO variable. This parameter has two additional requirements:
· It must be specified as an array, even if, as in this case, it contains only a single element.
· And it must be specified as a read-only parameter (i.e., by using the CONST keyword).
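The article does not show the definition of categoryDS that the LikeDS at (K) refers to. A minimal sketch, inferred only from the subfield names that the handler code uses (description, product, and code), might look like the following. The name productDS, the field lengths, and the Dim(50) limit are assumptions chosen purely for illustration; they are not taken from the original source.

```rpgle
      * Hypothetical layout: productDS, the lengths and Dim(50) are guesses
     D productDS       DS                  Qualified
     D  code                         10a
     D  description                  40a

     D categoryDS      DS                  Qualified
     D  description                  40a
     D  product                            LikeDS(productDS) Dim(50)
```

Because few categories are likely to contain exactly as many products as the array allows, a layout like this depends on the allowmissing=yes option shown at (I); the unused product entries are simply left unfilled.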
The third and final parameter (elements) is a count of the number of array elements filled on this call. It serves a similar purpose to the PSDS variable XML elements. In this program we can ignore it: because our -INTO variable is defined as DIM(1), it will never contain a value other than 1. It must be defined as a four-byte integer (10i) passed by VALUE.
The handler's return value must also be defined as a four-byte integer (10i). It is used to communicate between your procedure and the parser. We'll see how this is used in a moment.
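To make the role of the third parameter more concrete, here is a variation that is not part of the original program. It assumes the -INTO array is defined as Dim(10), so the parser may hand over up to ten categories per call, and the handler uses elements to find out how many were actually filled. The names ProcessBatch, categories, and i are hypothetical, and the sketch assumes categoryDS has a character subfield called description.

```rpgle
      * Variation for illustration only: ProcessBatch, Dim(10) and the
      * counter i are assumptions, not part of the original program
     D ProcessBatch    PR            10i 0
     D  categoryCount                 5i 0
     D  categories                         LikeDS(categoryDS) Dim(10)
     D                                     Const
     D  elements                     10i 0 Value

     P ProcessBatch    B
     D                 PI            10i 0
     D  categoryCount                 5i 0
     D  categories                         LikeDS(categoryDS) Dim(10)
     D                                     Const
     D  elements                     10i 0 Value

     D i               S             10i 0
      /free
        // Only the first "elements" entries were filled on this call
        For i = 1 to elements;
          categoryCount += 1;
          Dsply ('Category ' + categories(i).description);
        EndFor;
        Return 0;   // zero asks the parser to continue
      /end-free
     P ProcessBatch    E
```

With a layout like this, the last call will usually receive fewer than ten entries, which is exactly the situation the elements count is there to report.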
Now that we have studied the prototype, it is time to look at the actual handler subprocedure itself and see how the parameters are used.
The first thing we do (L) is to increment the categoryCount variable passed as the communications area. We then proceed to loop through the product entries in the category DS that we received as the second parameter. Since there is a variable number of products in each category, we need to test the product code for blanks in order to determine the end of the list (M).
     Dsply ('Category ' + category(1).description);
(L)  categoryCount += 1;
     For p = 1 to %Elem(category.product);
(M)    If category(1).product(p).code = *Blanks;
         Leave;   // Exit once blank product code entry located
       Else;
         // Process the current product entry
         Dsply ('Product: ' + category(1).product(p).description);
       EndIf;
     EndFor;
     // p will always be 1 greater than real count so reduce for display
     Dsply (%Char(p - 1) + ' products found');
(N)  Return 0;
Once all of the products in the Category have been processed, we simply return control to the parser. You can see this at (N). Note that a return value of 0 (zero) tells the parser to continue processing. Any non-zero value would cause the parser to abort, and control would then be returned to the first operation following the XML-INTO.
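The listing shows only the calculations inside the handler. For readers who would like to see how they might sit inside a complete subprocedure, here is a sketch in which the procedure interface simply mirrors the prototype at (J). Two things are assumptions of mine rather than part of the original: the definition of the work field p, which the article does not show, and the maxCategories guard, which is there only to illustrate a non-zero return.

```rpgle
      * Sketch only: maxCategories and the definition of p are assumptions
     D maxCategories   C                   100

     P ProcessCategory...
     P                 B
     D                 PI            10i 0
     D  categoryCount                 5i 0
     D  category                           LikeDS(categoryDS) Dim(1)
     D                                     Const
     D  elements                     10i 0 Value

     D p               S              5i 0
      /free
        // Hypothetical guard: give up once an arbitrary limit is reached
        If categoryCount >= maxCategories;
          Return -1;   // any non-zero value asks the parser to stop
        EndIf;

        categoryCount += 1;
        For p = 1 to %Elem(category(1).product);
          If category(1).product(p).code = *Blanks;
            Leave;
          EndIf;
          Dsply ('Product: ' + category(1).product(p).description);
        EndFor;

        Return 0;      // zero asks the parser to carry on
      /end-free
     P ProcessCategory...
     P                 E
```

The value -1 has no special meaning here; as far as the parser is concerned, the only distinction described above is between zero and non-zero.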
The handler procedure will be called repeatedly until the parser determines that there is no more data to process, and at that point, control is returned to the operation following the XML-INTO that started the whole process. At this point (O), we simply display the count of the number of Categories accumulated in the categoryCount variable.
(O) Dsply ('Total of ' + %Char(categoryCount) + ' Categories processed');
That's really all there is to it. Particularly for those of you who are unfamiliar with call-back processing, the best way to get a handle on how this all works is to step through the program in debug mode. Notice that once you hit the Return op-code in the handler (N), you will simply leap back up to the top of the handler procedure—or at least that is the way it will appear to you. In reality, control has returned to the parser, and it has in turn called your handling procedure once again. But the parser is not debuggable code, so the debugger simply zips past it.
While the coding of XML-INTO with %HANDLER may seem complex at first, rest assured that this is mostly a question of familiarity. Once you have coded one or two of them, you quickly realize that they are all pretty much the same.
Next time, we'll look at XML-INTO's little brother, XML-SAX.
(Author's note: You can find the complete source code for this article here.)
MC Press Online