The Extensible Markup Language (XML) is a simple text format that enables the sharing of information in this new age of internetworked machines. The XML standard was adopted just two years ago, and it's already reshaping the data processing world. XML's great influence in the computer environment is due to its independence from any particular programming language or operating system. The development of the XML standard was the result of great cooperation among longtime market rivals. XML has the full support of IBM, Microsoft, Sun, Lotus, Oracle, and virtually any other household computer name you can think of.
XML by itself is simple, which was an important reason for its quick adoption. This simplicity makes XML very versatile. XML can be used to exchange information among different subsystems of an application and to separate content from presentation, and it's becoming the preferred format for sharing information between business partners. While XML's simplicity makes it versatile, it also makes it insufficient on its own for some of the most sophisticated tasks put to it.
Of the many technologies emerging around XML, two will have the greatest impact on the way business applications are developed: XML schemas and the Simple Object Access Protocol (SOAP).
What Is XML?
Through XML, a document's sections and individual values are tagged with meaningful names. Figure 1 shows a simplistic invoice expressed in XML. In very crude terms, the document looks like a multiformat file. It contains five record formats: Header, Contact, Item, Footer, and Address. Each individual field is tagged with a meaningful name. While the document is easy to read, it doesn't describe how invoices are formatted; it's only an example.
XML documents don't have records and fields. Instead, they are composed of elements and attributes. Each tag marks an element. In the invoice example, Contact is one such element. The values within a tag are its attributes. In the example, the element Contact has the attributes Name, Company, and Email. The extensibility of XML (the X in XML) is what makes the language so attractive. Developers define the attributes and elements of a particular kind of document; that is, they define the vocabulary and syntax of the document.
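The distinction between elements and attributes can be sketched with Python's standard xml.etree.ElementTree module. This is only an illustration: the invoice fragment below is a made-up stand-in for Figure 1. The element and attribute names (Invoice, Header, Contact, InvNum, InvDate, Name, Company, Email) come from the article, but every value is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice fragment; the values are invented for illustration.
INVOICE_XML = """
<Invoice>
  <Header InvNum="1001" InvDate="2000-06-01">
    <Contact Name="Pat Doe" Company="Example Corp" Email="pat@example.com"/>
  </Header>
</Invoice>
""".strip()

def describe(element, depth=0):
    """Return one line per element: its tag name followed by its attributes."""
    lines = ["  " * depth + f"{element.tag} {dict(element.attrib)}"]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines

root = ET.fromstring(INVOICE_XML)
contact = root.find("Header/Contact")  # an element, located by its tag path
for line in describe(root):
    print(line)
```

Each tag becomes an element object, and the name="value" pairs inside the tag become that element's attribute dictionary.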
The XML Standard
When the XML standard was first adopted, it included a language for specifying some of the rules governing certain classes of documents. This language, called the document type definition (DTD), is often used to define document types. However, DTD has proven to be too limited to describe the most complex relationships among elements of a document. As with many of the new standards used in this Internet Age, XML had to be released as quickly as possible to give some order to the double-step march of technology development that the Internet brings.
The organization responsible for Internet standards, the World Wide Web Consortium (W3C), acknowledges the DTD's limitations on its W3C XML Activity page (www.w3.org/XML/Activity.html). It states the following:
While XML 1.0 supplies a mechanism (DTD) for declaring constraints on the use of markup, automated processing of XML documents requires more rigorous and comprehensive facilities in this area. Requirements are for constraints on how the component parts of an application fit together, the document structure, attributes, data typing, and so on. The [W3C] XML Schema Working Group is addressing means for defining the structure, content and semantics of XML documents.
XML Schemas
The new means for defining what documents should look like are XML schemas. I am eager to see the final version of the first standard XML schema, coming in the next few months. In the meantime, you can get a flavor of what XML schemas will look like by using the implementation provided by one of the components of Microsoft Internet Explorer 5.0. Even though the implementation is really a technology preview of XML schemas, it's very useful for developers wanting to build prototypes and gain experience with the new technology. This preview of XML schemas gives non-bleeding-edge business application developers a look at the brave new world that XML will usher into MIS departments.
The Internet Explorer component that deals with XML is the ActiveX Microsoft XML (MSXML). It can be used as part of the browser or as a standalone component of any language that is capable of using ActiveX components, such as ASNA Visual RPG (AVR) or Visual Basic (VB).
Whoever creates the definition for a type of document is actually creating a language to express that particular kind of document. In the invoice example (Figure 1, page 83), I used an invoice markup language to express the invoice.
Whenever two entities share information via XML documents, they have to agree on the format of such documents. Both the consumer and the producer of the documents must have access to the description of the format being used; they both need to know the XML schema being used. This situation is analogous to having business partners share the data description specifications (DDS) that describe the files used in the exchange of information between them.
Figure 2 (on page 83) shows a portion of the XML schema that describes the invoice in Figure 1. The main XML tags that describe a document are Schema, AttributeType, ElementType, element, and attribute.
Schema
The first tag, Schema, marks the document as a schema and names the schema that is being defined (InvoiceType). XML schemas use namespaces to qualify their own vocabulary. The first attribute, xmlns="urn:schemas-microsoft-com:xml-data", sets the default namespace to be the schema vocabulary. Any nonqualified tag within this document will be taken to be from the urn:schemas-microsoft-com:xml-data namespace. The second xmlns attribute, xmlns:dt="urn:schemas-microsoft-com:datatypes", states that the prefix dt will be used to qualify data types.
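The effect of the two namespace declarations can be seen with a short Python sketch using the standard xml.etree.ElementTree module (not MSXML, which the article uses). The schema fragment is hypothetical; only the two namespace URIs, the tag names, and the names InvoiceType and InvDate come from the article, and the dt:type value is an assumption.

```python
import xml.etree.ElementTree as ET

XDR_NS = "urn:schemas-microsoft-com:xml-data"   # default namespace
DT_NS = "urn:schemas-microsoft-com:datatypes"   # bound to the dt prefix

# Hypothetical schema fragment written for this illustration.
SCHEMA_XML = f"""
<Schema name="InvoiceType" xmlns="{XDR_NS}" xmlns:dt="{DT_NS}">
  <AttributeType name="InvDate" dt:type="date"/>
</Schema>
""".strip()

schema = ET.fromstring(SCHEMA_XML)
attr_type = schema.find(f"{{{XDR_NS}}}AttributeType")

# Unprefixed tags are resolved against the default namespace.
print(schema.tag)
# The dt: prefix places the type attribute in the datatypes namespace.
print(attr_type.get(f"{{{DT_NS}}}type"))
```

ElementTree reports qualified names in {namespace}name form, which makes it visible that Schema and AttributeType belong to the default namespace while dt:type belongs to the datatypes namespace.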
AttributeType
The AttributeType tag defines the characteristics of attributes that will be part of the elements. You can think of attributes as fields or scalars. An attribute definition includes a name and a type. It may also indicate whether the attribute is required or optional and, if applicable, list a set of valid enumerated values. The syntax for AttributeType is the following:
Figure 2 shows only a few of the AttributeType tags used in the schema but should be sufficient to give you a flavor of how they operate. Observe how you can specify the type of each individual attribute. MSXML recognizes many data types, from strings and integers to decimal numbers, dates, and time stamps. It can validate a document against a schema and can convert the attributes' values to variables that are usable in your programs.
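The kind of checking and conversion described above can be approximated in a few lines of Python. This is a hand-rolled sketch, not MSXML's behavior or API: the rules table is a hypothetical stand-in for a set of AttributeType declarations. InvNum and InvDate are named in the article; Currency, Total, and all of the chosen types are assumptions.

```python
from datetime import date
from decimal import Decimal

# Hypothetical rules mirroring what an AttributeType declares: a name,
# a type, whether it is required, and optional enumerated values.
ATTRIBUTE_RULES = {
    "InvNum": {"type": int, "required": True},
    "InvDate": {"type": date.fromisoformat, "required": True},
    "Currency": {"type": str, "required": False, "values": {"USD", "CAD"}},
    "Total": {"type": Decimal, "required": False},
}

def convert_attributes(raw):
    """Validate raw string attributes and return them as typed Python values."""
    typed = {}
    for name, rule in ATTRIBUTE_RULES.items():
        if name not in raw:
            if rule["required"]:
                raise ValueError(f"missing required attribute {name}")
            continue
        if "values" in rule and raw[name] not in rule["values"]:
            raise ValueError(f"{name}={raw[name]!r} is not an allowed value")
        try:
            typed[name] = rule["type"](raw[name])
        except Exception as exc:
            raise ValueError(f"{name}={raw[name]!r} has the wrong type") from exc
    return typed

typed = convert_attributes(
    {"InvNum": "1001", "InvDate": "2000-06-01", "Total": "123.45"}
)
print(typed)
```

A validating parser does this work from the schema itself, so the application receives numbers and dates rather than raw strings.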
ElementType
The ElementType tag is used to define an element. It can specify a scalar with a simple type or, as shown in Figure 2, define a data structure composed of other elements and attributes. A well-formed XML document has one, and only one, root element. The element type Invoice defines the root element; all other elements will be children of Invoice. The attribute content="eltOnly" declares Invoice as a container of other elements, which means that no unmarked free text is allowed in it. The order="seq" attribute requires the subelements to appear in the specified sequence. The syntax for ElementType is as follows:
<ElementType ... model="{open | closed}" name="idref" order="{one | seq | many}" >
Elements and Attributes
Within the opening and closing tags of ElementType, you can specify the element and attributes that are valid for this element type. Five elements compose the element type Invoice: Header, ItemList, Footer, InvoiceTo, and DeliverTo. Invoice has no attributes.
Look at the element type Header, which contains two attributes and one element. The attributes are InvNum and InvDate. These attributes are defined a few lines above in the schema. The element Contact, defined toward the end of the schema, contains only attributes. This is denoted by the content="empty" attribute of the ElementType tag.
Inspect the rest of the schema, and you should be able to figure out what it all means. Notice the ItemList element type. It consists of the single element Item, which has to occur at least once but can appear many times. The invoice in Figure 1 shows an ItemList with two Items.
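The structural rules just described (a fixed sequence of children under Invoice, and one or more Item elements under ItemList) can be expressed as a small hand-written check. The sketch below is in Python and does not read the schema; it hard-codes the rules as the article states them, and it assumes each of the five children appears exactly once.

```python
import xml.etree.ElementTree as ET

# Child order for Invoice as listed in the article; order="seq" is taken
# to mean exactly this sequence, each child appearing once (an assumption).
INVOICE_SEQUENCE = ["Header", "ItemList", "Footer", "InvoiceTo", "DeliverTo"]

def check_invoice(root):
    """Return a list of structural problems; an empty list means the shape is valid."""
    problems = []
    children = [child.tag for child in root]
    if children != INVOICE_SEQUENCE:
        problems.append(f"expected children {INVOICE_SEQUENCE}, found {children}")
    item_list = root.find("ItemList")
    if item_list is not None:
        tags = [child.tag for child in item_list]
        # ItemList holds only Item elements, and at least one of them.
        if not tags or any(tag != "Item" for tag in tags):
            problems.append("ItemList must contain one or more Item elements")
    return problems

good = ET.fromstring(
    "<Invoice><Header/><ItemList><Item/><Item/></ItemList>"
    "<Footer/><InvoiceTo/><DeliverTo/></Invoice>"
)
bad = ET.fromstring(
    "<Invoice><ItemList/><Header/><Footer/><InvoiceTo/><DeliverTo/></Invoice>"
)
print(check_invoice(good))
print(check_invoice(bad))
```

The second document fails twice: its children are out of sequence, and its ItemList is empty.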
The syntax for the element tag is the following:
The syntax for the attribute tag is as follows:
A schema may also have the following elements: datatype, description, and group. The datatype tag specifies the data type for an ElementType or AttributeType. The description tag documents an ElementType or an AttributeType. The group tag organizes content into a group to specify a sequence or a minimum or maximum occurrence without having to create a new ElementType.
Once armed with the proper schema and a good parser, you can create applications to consume documents expressed in XML. The parser will validate lexical and syntactical correctness. Your application will still have to validate the semantics of the document, such as whether the customer number exists in your database or whether the ItemId is valid, but you won't have to worry about the correctness of numbers, dates, or enumerated values.
For a complete overview of schemas, see the XML Schema Developer's Guide Web site (msdn.microsoft.com/xml/xmlguide/schema-overview.asp). The W3C site (www.w3.org/XML/Schema.html) contains the latest information about the development of the standard for schemas. A document of requirements for standardization was published in early 1999, and the specification entered a state called Last Call in April 2000, meaning that the standard is just around the corner.
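The division of labor between parser and application might look like the following Python sketch. Note the limits of the illustration: Python's standard ElementTree parser checks only well-formedness, not schema validity, so it stands in for just part of what a validating parser such as MSXML does. The CustomerId attribute and the customer set are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a lookup against a customer table.
KNOWN_CUSTOMERS = {"C100", "C200"}

def accept_invoice(xml_text):
    """Return (ok, reason): the parser's check runs first, then the application's."""
    try:
        root = ET.fromstring(xml_text)  # syntactic check, done by the parser
    except ET.ParseError as exc:
        return False, f"not well-formed: {exc}"
    customer = root.get("CustomerId")   # semantic check, done by the application
    if customer not in KNOWN_CUSTOMERS:
        return False, f"unknown customer {customer!r}"
    return True, "accepted"

print(accept_invoice('<Invoice CustomerId="C100"/>'))
print(accept_invoice('<Invoice CustomerId="C999"/>'))
print(accept_invoice('<Invoice CustomerId="C100">'))
```

Only the second kind of failure requires knowledge of your own data; the first is caught before your business logic ever runs.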
We usually think of XML documents and schemas in the context of data sharing, which is definitely their most common task. But a whole new field, sharing processes, is opening for XML application development. With the emergence of SOAP, XML and schemas on the Web extend to allow the creation of a new style of distributed applications.
Whenever I encounter a new technology, I always try to find some analogy that will help me understand and explain it. When I think of schemas, I use the crude analogy of DDS for multiformat files; when exposed to SOAP, I use the also crude analogy of the call and parameter op codes. In these terms, you can think of SOAP as the ability to call programs, passing parameters across programming languages, machines, and even operating systems. If you are thinking that SOAP sounds like another remote procedure call (RPC) protocol, you are right. Yes, SOAP is just that, another RPC. So what is the big deal about it? The reason SOAP has attracted support from IBM, Lotus, IONA Technologies, and many others lies in the neutrality of the protocol and the use of other Web standards. SOAP can be easily implemented on different operating systems, making its use on the Web very attractive.
SOAP uses two Web standards for its implementation. It uses XML as the format for marshaling the parameters of the call and identifying the process to execute, and it uses HTTP as the protocol (the same protocol used by Web browsers and servers) for handling the communication. By using HTTP, SOAP can use the current infrastructure of firewalls, routers, and servers already in place.
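Marshaling a call into XML can be sketched as follows in Python. The envelope namespace URI and the method name CalcTax appear in Figure 3; the Envelope/Body layout follows the SOAP 1.1 envelope vocabulary, while the parameter names Amount and RateCode are hypothetical, and the method element is left unqualified for brevity.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # URI shown in Figure 3
ET.register_namespace("soap", SOAP_NS)

def build_envelope(method, params):
    """Wrap a method name and its parameters in a SOAP Envelope and Body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, method)            # identifies the process to run
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)  # one element per parameter
    return ET.tostring(envelope, encoding="unicode")

# Amount and RateCode are illustrative parameter names, not from the article.
payload = build_envelope("CalcTax", {"Amount": "123.45", "RateCode": 8})
print(payload)
```

The resulting string is what travels as the body of the HTTP request; nothing in it depends on the language or platform at either end.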
The original SOAP specification was drafted in the spring of 1998 by Dave Winer, Microsoft, and DevelopMentor. It started life as an RPC so that Distributed Component Object Model (DCOM) could go over the Internet via HTTP and XML. SOAP is being standardized by W3C and marks the fourth stage of use of the Web infrastructure. The first stage enabled users to share static content. (The enabling technologies at the first stage are the HTTP servers and the HTML format for static pages.) In the second stage, Web browsers became terminals in the world's data processing systems. By using HTML forms, coupled with Common Gateway Interface (CGI), Active Server Pages (ASPs), or JavaServer Pages (JSPs), users can interact with the online transaction processing (OLTP) systems of their suppliers, customers, and partners. In the third stage, XML transformed the Web into a huge online database system. When a Web server generates XML, it opens the possibility for programmers to create applications that directly consume information without having a Web browser involved in the process. Imagine the kind of applications you can develop with this technology: You can create a program that can go out onto the Web, request information from multiple Web sites, consolidate that data, and either present it to a user or directly affect your own database.
Finally, in the fourth stage, with the advent of SOAP, the Web has become programmable. Programmers can request the services of processes that run on their partners' sites. Imagine creating an ActiveX application in AVR, running on Windows, that could call a method of a CORBA object developed in C++ that is running on a Linux-powered machine!
SOAP
SOAP defines a mechanism for passing commands and parameters between HTTP clients and servers. SOAP doesn't care what operating system, programming language, or object model is being used on either the server side or the client side. It's utterly agnostic, except that it needs HTTP as a transport.
Figure 3 (on page 84) shows a simple SOAP request. This message would be sent to a tax server, which would help the calling application calculate the tax to be levied on $123.45 at a rate code of 8. The first four lines are part of the HTTP header. The request is being made of the fictitious Web site www.taxserver.com. The page being targeted is the ASP TaxCulator.asp. The content length and type are required for any HTTP message containing a payload. The content type is set to XML. The actual payload is an XML document, serving as an envelope of the data sent to the calculator. The node
The application creating this message and making the request needs to understand only how to format its data according to its schema, stuff a SOAP envelope, and format an HTTP header. The HTTP server receiving the message knows this is a SOAP call because it recognizes the SOAPMethodName header and can dispatch the request to the appropriate handler, in this case the TaxCulator.asp page.
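Both halves of that exchange can be sketched in Python: formatting the HTTP header around a payload, and recognizing the SOAPMethodName header on the receiving side. The host, page, and header values are the ones shown in Figure 3. The two function names are hypothetical, the payload is a placeholder, and a real server would rely on its HTTP stack rather than split the request by hand.

```python
def format_request(host, path, method_uri, payload):
    """Assemble the HTTP header lines from Figure 3 around an XML payload."""
    body = payload.encode("utf-8")
    headers = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: text/xml",
        f"Content-Length: {len(body)}",      # byte length of the payload
        f"SOAPMethodName: {method_uri}",
    ]
    return "\r\n".join(headers).encode("ascii") + b"\r\n\r\n" + body

def soap_method(raw_request):
    """Return the method named in the SOAPMethodName header, or None if absent."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("ascii")
    for line in head.split("\r\n")[1:]:      # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "soapmethodname":
            # The method name follows the '#' in SchemaURI#Method.
            return value.strip().rpartition("#")[2]
    return None

request = format_request(
    "www.taxserver.com", "/TaxCulator.asp", "MySchemaURI#CalcTax",
    "<Envelope/>",  # placeholder; a real call carries the full SOAP envelope
)
print(soap_method(request))
```

A server can use the extracted method name to route the request to the right handler before it parses the XML payload at all.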
The Future of Developing on the Web
Creating and using Web services will become even easier. ASNA, Microsoft, and others are incorporating the notion of Web services right into such languages as AVR and VB. Once this happens, a developer will only have to express his intention of making his program a Web service, and the development environment will take care of exposing the schema defining the object's type library. To use a Web service, the programmer will point the integrated development environment (IDE) to the Web site that is hosting the service, and the IDE will take care of packaging the messages to and from the service.
Microsoft is investing heavily in the creation of this kind of Web service beyond the enhancement of languages. Many of its other efforts are under the umbrella of a new server that is available for Windows NT and Windows 2000, the BizTalk server. On the BizTalk Web site (www.biztalk.org), you can learn everything you ever wanted to know about XML schemas and Web services. The site also includes a public library of schemas and hopes to reach a set of standards for the sharing of data and services across the Web.
If the AS/400 is to become a strong player in the new business-to-business (B2B) era, IBM will have to not only support but also facilitate programming to such standards as XML and SOAP. Hopefully, IBM can deliver this new technology to the application programmer through easy-to-use tools and language extensions.
(Editor's Note: IBM has finished the Java reference version of SOAP v1.1 [SOAP4J], and it is available for download at www.alphaworks.ibm.com/tech/soap4j. Also, on June 1st, IBM contributed the specification to the Apache Software Foundation's open source XML project. Since the AS/400 runs Java, this means that SOAP is now available for use on the AS/400 platform.)
REFERENCES AND RELATED MATERIALS
BizTalk Web site: www.biztalk.org
W3C XML Activity page: www.w3.org/XML/Activity.html
W3C XML Schema Web site: www.w3.org/XML/Schema.html
XML Schema Developer's Guide: msdn.microsoft.com/xml/xmlguide/schemaoverview.asp
Figure 1: This invoice data is expressed using XML.
Figure 2: This is the partial XML schema for the invoice shown in Figure 1.
xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:dt="urn:schemas-microsoft-com:datatypes">
...
Figure 3: The simple SOAP request calculates tax.
POST /TaxCulator.asp HTTP/1.1
Host: www.taxserver.com
Content-Type: text/xml
Content-Length: nnnn
SOAPMethodName: MySchemaURI#CalcTax
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
MC Press Online