The description stack deals with a wide range of technologies that describe Web services in order to facilitate their common use for business process modeling and workflow choreography in B2B collaborations. The discovery stack deals with technologies that allow for directory, discovery, and inspection services. The wire stack consists of technologies that provide the steam for the runtime engines of Web services.
Figure 2 breaks these stacks into their subcomponents. Many of the available Web services technologies can be mapped to these stacks, although not all stacks have a corresponding specification or technology.
The list below shows the various functional areas of the Web services technology space. This list is not comprehensive, but it does cover most of the available technologies.
B2B collaboration
- Process modeling and orchestration
In the remainder of this article, I'll address each of the functional categories of Web services technology and discuss the standards that apply.
Basic Service
There are two main groups of technologies we must consider to create a basic Web service: service description and communication protocols. Transport protocols aren't specific to Web services, and hence aren't covered here.
Service Description
Service description standards specify what a service is about, what actions are supported by the service, what input and output parameters the service takes, and how the service deals with error conditions. In 2000, IBM and Microsoft came out with competing technologies: NASSL and SDL, respectively. Fortunately, they were soon merged into WSDL (Web Services Description Language). As of today, WSDL stands as one of the most widely adopted technologies for describing Web services. WSDL 1.1 has been submitted to W3C, which has recently started a working group to ratify it.
WSDL takes a two-step approach to describing Web services. The first step is to provide an abstract definition of services and the data format; the second is to bind this abstract definition to concrete protocols. This two-step process permits reuse; it's possible to have many similar Web services based on one abstract definition with each implemented using different protocols.
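To make the two steps concrete, here is a minimal Python sketch that models WSDL's abstract and concrete layers as plain data classes; it is an illustration of the idea, not a WSDL library. The port type, operation, and message names come from the stock quote example; the binding names and the example.com addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """Abstract operation: a name plus its input and output message names."""
    name: str
    input_message: str
    output_message: str


@dataclass(frozen=True)
class PortType:
    """Abstract interface: a protocol-independent set of operations."""
    name: str
    operations: tuple


@dataclass(frozen=True)
class Binding:
    """Concrete step: ties one port type to one protocol."""
    name: str
    port_type: PortType
    protocol: str


@dataclass(frozen=True)
class Port:
    """Endpoint: a binding plus a network address."""
    binding: Binding
    address: str


# Step 1: one abstract definition.
quote_type = PortType(
    "StockQuotePortType",
    (Operation("GetLastTradePrice",
               "GetLastTradePriceRequest",
               "GetLastTradePriceResponse"),),
)

# Step 2: the same abstract definition reused by two concrete bindings
# (hypothetical binding names and addresses).
soap_port = Port(Binding("StockQuoteSoapBinding", quote_type, "SOAP over HTTP"),
                 "http://example.com/stockquote/soap")
get_port = Port(Binding("StockQuoteHttpGetBinding", quote_type, "HTTP GET"),
                "http://example.com/stockquote/get")

for port in (soap_port, get_port):
    operations = [op.name for op in port.binding.port_type.operations]
    print(port.binding.protocol, port.address, operations)
```

Supporting a third protocol would mean adding only another Binding and Port; the PortType and its operations stay untouched, which is the reuse the two-step design is meant to buy.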
WSDL is independent of any network and communication protocols, although it does define bindings to HTTP, SOAP, and MIME. Similarly, WSDL isn't tied to any type system, although it uses XML Schema as its preferred one. WSDL is designed to be extensible to work with other type systems and other network and communication protocols. Listing 1 demonstrates the simple WSDL grammar used to describe services. Note: This example isn't complete and won't parse; more namespaces need to be defined.
In Listing 1, the message element, along with the part element, defines the data in abstract terms. The operation element defines an action supported by the service. WSDL defines four basic operation types: one-way, request-response, solicit-response, and notification. The portType element acts as a container for a set of abstract operations. In this example, we define a portType element, "StockQuotePortType", with a single operation, "GetLastTradePrice", which takes an input message, "GetLastTradePriceRequest", and gives an output message, "GetLastTradePriceResponse".
These abstract definitions are then bound to concrete protocols using the binding element. The port element captures the communication endpoint details, and the service element contains a list of related ports. The types element (not shown) acts as a data container holding various data type definitions. In the example, the operations in "StockQuotePortType" are bound to SOAP and HTTP.
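Tools that consume WSDL walk exactly these elements. The following Python sketch uses the standard library's ElementTree to read a trimmed WSDL 1.1 document in the style of the stock quote example, then reports the abstract operation and the concrete endpoint separately. The messages omit their part elements for brevity, and the example.com namespace, SOAPAction value, and address are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "wsdl": "http://schemas.xmlsoap.org/wsdl/",
    "soap": "http://schemas.xmlsoap.org/wsdl/soap/",
}

# Trimmed WSDL 1.1 document; target namespace and address are placeholders.
WSDL_DOC = """<definitions name="StockQuote"
    targetNamespace="http://example.com/stockquote.wsdl"
    xmlns:tns="http://example.com/stockquote.wsdl"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns="http://schemas.xmlsoap.org/wsdl/">
  <message name="GetLastTradePriceRequest"/>
  <message name="GetLastTradePriceResponse"/>
  <portType name="StockQuotePortType">
    <operation name="GetLastTradePrice">
      <input message="tns:GetLastTradePriceRequest"/>
      <output message="tns:GetLastTradePriceResponse"/>
    </operation>
  </portType>
  <binding name="StockQuoteSoapBinding" type="tns:StockQuotePortType">
    <soap:binding style="document"
                  transport="http://schemas.xmlsoap.org/soap/http"/>
    <operation name="GetLastTradePrice">
      <soap:operation soapAction="http://example.com/GetLastTradePrice"/>
    </operation>
  </binding>
  <service name="StockQuoteService">
    <port name="StockQuotePort" binding="tns:StockQuoteSoapBinding">
      <soap:address location="http://example.com/stockquote"/>
    </port>
  </service>
</definitions>"""

root = ET.fromstring(WSDL_DOC)

# Abstract part: port types and their operations.
for port_type in root.findall("wsdl:portType", NS):
    for op in port_type.findall("wsdl:operation", NS):
        inp = op.find("wsdl:input", NS).get("message")
        out = op.find("wsdl:output", NS).get("message")
        print(port_type.get("name"), op.get("name"), inp, "->", out)

# Concrete part: which binding each port uses and where it lives.
for port in root.findall("wsdl:service/wsdl:port", NS):
    address = port.find("soap:address", NS).get("location")
    print(port.get("name"), port.get("binding"), address)
```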
WSDL enjoys the support of many tools. Some help generate WSDL from existing Java and C++ classes, and others generate Java and C++ classes from WSDL documents.
Communication Protocols
Standards in the communication protocols area deal with message format and serialization details. In order for a receiver to correctly parse and digest a message, the format of the message must be known. In contrast to the service description area, many protocols have been published in the communication area. These protocols include XML-RPC, SOAP, the ebXML messaging specification, WDDX, and Jabber.
XML-RPC, which came from UserLand Software in 1998, is an XML-based RPC protocol carried over HTTP POST, with a simple data model. It is much simpler than SOAP; in addition to RPC, SOAP provides much richer processing semantics, an enhanced data model, and support for messaging. SOAP has garnered a great deal of attention and a huge user base.
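Python's standard library ships an XML-RPC implementation, which makes the protocol's simplicity easy to see. This sketch only serializes and parses messages; no HTTP request is made. The getStockQuote method name and the price are hypothetical.

```python
import xmlrpc.client

# Serialize a call to a hypothetical "getStockQuote" method with one string
# parameter; this XML is what a client would send as the HTTP POST body.
request_xml = xmlrpc.client.dumps(("RWAV",), methodname="getStockQuote")
print(request_xml)

# The receiving side turns the XML back into parameters and a method name.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)  # getStockQuote ('RWAV',)

# A response carries exactly one value and no method name.
response_xml = xmlrpc.client.dumps((98.5,), methodresponse=True)
print(xmlrpc.client.loads(response_xml))  # ((98.5,), None)
```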
The ebXML messaging specification, built on top of SOAP, is one part of a set of ebXML specifications. WDDX, an effort from Allaire, focuses on providing a simple, lightweight data exchange mechanism for Web programming languages such as ColdFusion, ASP, Perl, and PHP. Though RPC semantics can be layered on top of WDDX, it isn't as widely adopted as SOAP for RPC purposes. Jabber is an open, XML-based protocol that enables near-real-time exchange of structured information between two or more endpoints; it is used mainly for instant messaging.
Let's look at SOAP in detail, since it is the protocol of choice for most Web services.
SOAP, the Protocol of Choice
SOAP has come a long way since its 0.9 release by Microsoft in 1999. SOAP is now handled by the W3C, which was close to publishing a last-call working draft of SOAP 1.2 at the time of this writing.
SOAP is a lightweight XML-based communication protocol for the exchange of information in a decentralized, distributed environment. SOAP is neutral with regard to language, platform, and programming model, allowing both the sender and the receiver to operate in their environment of choice. SOAP documents can be exchanged over many transport protocols.
The SOAP specification can be broadly classified into four main parts:
- A framework for describing the content of a message and how to process it
- A simple data model and a set of encoding rules for serialization
- A convention for representing remote procedure calls and responses
- A binding to HTTP
The SOAP "grammar" is best demonstrated by a SOAP message, as shown below.
<env:Envelope
    xmlns:env="http://www.w3.org/2001/09/soap-envelope"
    xmlns:app="http://www.rwav.com">
  <env:Header>
    <app:transactionId>010001</app:transactionId>
  </env:Header>
  <env:Body>
    <app:getStockQuote>
      <app:ticker>RWAV</app:ticker>
    </app:getStockQuote>
  </env:Body>
</env:Envelope>
In this example, the SOAP message is identified by the namespace-qualified root element "Envelope". The Envelope namespace determines the version of the SOAP specification to which a SOAP message conforms. The Header element is optional; it is typically used to carry out-of-band information, such as transaction or security information. The header can contain any number of namespace-qualified XML elements, called entries or blocks. The above example contains one header entry named "app:transactionId". The Body element contains the essence of the message intended for the endpoint. Unlike the header, the body must be present in every SOAP message; it can contain any number of namespace-qualified XML elements, also called entries or blocks. The above example contains one application-defined body entry named "app:getStockQuote". SOAP defines one body block, called Fault, to represent errors.
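A receiver typically checks the root element's namespace first, since that tells it which SOAP version it is dealing with, then looks for the optional Header and the mandatory Body. The Python sketch below builds the example message with ElementTree and applies those checks. It assumes http://www.rwav.com as the application namespace (namespace names are expected to be absolute URIs); the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

ENV_NS = "http://www.w3.org/2001/09/soap-envelope"
APP_NS = "http://www.rwav.com"  # assumption: absolute-URI form of the app namespace


def build_envelope(transaction_id, ticker):
    """Build an Envelope with one header entry and one body entry."""
    env = ET.Element(f"{{{ENV_NS}}}Envelope")
    header = ET.SubElement(env, f"{{{ENV_NS}}}Header")
    ET.SubElement(header, f"{{{APP_NS}}}transactionId").text = transaction_id
    body = ET.SubElement(env, f"{{{ENV_NS}}}Body")
    request = ET.SubElement(body, f"{{{APP_NS}}}getStockQuote")
    ET.SubElement(request, f"{{{APP_NS}}}ticker").text = ticker
    return env


def inspect_envelope(env):
    """Return (header entry tags, body entry tags), enforcing the structure rules."""
    if env.tag != f"{{{ENV_NS}}}Envelope":
        raise ValueError("root is not an Envelope in the expected SOAP namespace")
    header = env.find(f"{{{ENV_NS}}}Header")  # optional
    body = env.find(f"{{{ENV_NS}}}Body")      # mandatory
    if body is None:
        raise ValueError("SOAP message has no Body")
    header_entries = [child.tag for child in header] if header is not None else []
    return header_entries, [child.tag for child in body]


def is_fault(env):
    """True if the Body carries the SOAP-defined Fault entry."""
    return env.find(f"{{{ENV_NS}}}Body/{{{ENV_NS}}}Fault") is not None


ET.register_namespace("env", ENV_NS)
ET.register_namespace("app", APP_NS)
envelope = build_envelope("010001", "RWAV")
print(ET.tostring(envelope, encoding="unicode"))
print(inspect_envelope(envelope))
print(is_fault(envelope))  # False
```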
As part of its encoding rules, SOAP defines a simple data model consisting of simple types, compound types similar to structs in programming languages, an array type, and an ID/HREF mechanism that represents references. The encoding rules then define how values in this data model are serialized as XML. Both the data model and the encoding rules are optional. SOAP defines an "encodingStyle" attribute in the "env" namespace, which can be used to specify the encoding rules in effect for a specific element or group of elements.
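The ID/HREF mechanism is the least obvious part of the data model, so here is a Python sketch of it: two accessors in a struct refer to one shared compound value, which is serialized once and carries an id. It follows the SOAP 1.1 style of unqualified id and href attributes. The encoding-style URI is an assumption chosen to pair with the draft envelope namespace used above, and the placeOrder, billTo, shipTo, and address names are hypothetical.

```python
import xml.etree.ElementTree as ET

ENV_NS = "http://www.w3.org/2001/09/soap-envelope"
# Assumption: encoding-style URI paired with the draft envelope namespace.
ENC_STYLE = "http://www.w3.org/2001/09/soap-encoding"
APP_NS = "http://www.rwav.com"  # hypothetical application namespace


def q(ns, name):
    """Build an ElementTree qualified name, e.g. {ns}name."""
    return f"{{{ns}}}{name}"


def encode_order(ticker, quantity, address):
    """Encode a struct whose two accessors share one multi-reference value."""
    body = ET.Element(q(ENV_NS, "Body"))
    order = ET.SubElement(body, q(APP_NS, "placeOrder"))
    order.set(q(ENV_NS, "encodingStyle"), ENC_STYLE)
    ET.SubElement(order, q(APP_NS, "ticker")).text = ticker           # simple value
    ET.SubElement(order, q(APP_NS, "quantity")).text = str(quantity)  # simple value
    # Both accessors point at the same compound value instead of copying it.
    ET.SubElement(order, q(APP_NS, "billTo")).set("href", "#addr-1")
    ET.SubElement(order, q(APP_NS, "shipTo")).set("href", "#addr-1")
    # The shared struct is serialized once, as a sibling carrying the id.
    shared = ET.SubElement(body, q(APP_NS, "address"), id="addr-1")
    shared.set(q(ENV_NS, "encodingStyle"), ENC_STYLE)
    for name, value in address.items():
        ET.SubElement(shared, q(APP_NS, name)).text = value
    return body


def resolve(body, accessor):
    """Follow href="#id" to the element carrying that id; else return as-is."""
    ref = accessor.get("href")
    if ref is None or not ref.startswith("#"):
        return accessor
    for element in body.iter():
        if element.get("id") == ref[1:]:
            return element
    raise KeyError(ref)


body = encode_order("RWAV", 100, {"street": "1 Main St", "city": "Springfield"})
order = body.find(q(APP_NS, "placeOrder"))
bill_to = resolve(body, order.find(q(APP_NS, "billTo")))
ship_to = resolve(body, order.find(q(APP_NS, "shipTo")))
print(bill_to is ship_to)  # True: one value, two references
```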
Like encoding rules, the RPC conventions defined by SOAP are optional. In SOAP, both the request and the response of an RPC call are modeled as structs; they can also be modeled as arrays, according to recent changes in the SOAP specification. The name of the struct represents the name of the method being invoked. The parameters of a request or the results of an invocation are modeled as named accessors inside the struct. Our example message is an RPC request defined according to SOAP-RPC conventions. Though SOAP has defined a set of conventions for RPC, SOAP is not RPC-centric. It can be used for any general-purpose messaging.
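On the receiving side, the RPC convention reduces dispatch to a lookup: the body entry's name selects the method and its child accessors become named arguments. The Python sketch below shows that idea; get_stock_quote, its canned price, and the "result" accessor name are hypothetical, and naming the response struct after the method plus "Response" is a common convention rather than a requirement.

```python
import xml.etree.ElementTree as ET

APP_NS = "http://www.rwav.com"  # hypothetical application namespace


def get_stock_quote(ticker):
    """Hypothetical service method; returns a canned price for illustration."""
    return {"RWAV": "98.5"}.get(ticker, "0")


# Map struct names (method names) to local functions.
HANDLERS = {"getStockQuote": get_stock_quote}


def local_name(tag):
    """Strip the '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def dispatch(body_entry):
    """SOAP-RPC convention: struct name = method, child accessors = parameters."""
    method = local_name(body_entry.tag)
    params = {local_name(child.tag): child.text for child in body_entry}
    result = HANDLERS[method](**params)
    # The response is also a struct; "<method>Response" is a naming convention.
    response = ET.Element(f"{{{APP_NS}}}{method}Response")
    ET.SubElement(response, f"{{{APP_NS}}}result").text = result
    return response


request = ET.fromstring(
    '<app:getStockQuote xmlns:app="http://www.rwav.com">'
    "<app:ticker>RWAV</app:ticker>"
    "</app:getStockQuote>"
)
reply = dispatch(request)
print(ET.tostring(reply, encoding="unicode"))
```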
SOAP messages can be exchanged over many transport protocols; the SOAP 1.2 specification itself defines a binding to HTTP and provides an e-mail binding.
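At its core, the HTTP binding places the serialized envelope in the body of an HTTP POST. This Python sketch builds such a request with urllib but does not send it. The endpoint is hypothetical, and the text/xml content type matches the SOAP default mentioned later in this article; the final SOAP 1.2 specification moved to application/soap+xml, so treat the header value as an assumption.

```python
import urllib.request

ENVELOPE = (
    b'<env:Envelope xmlns:env="http://www.w3.org/2001/09/soap-envelope">'
    b'<env:Body><app:getStockQuote xmlns:app="http://www.rwav.com">'
    b"<app:ticker>RWAV</app:ticker></app:getStockQuote></env:Body>"
    b"</env:Envelope>"
)


def soap_over_http(endpoint, envelope, content_type="text/xml; charset=utf-8"):
    """Wrap a serialized SOAP envelope in an HTTP POST request (not sent)."""
    return urllib.request.Request(
        endpoint,
        data=envelope,
        method="POST",
        headers={"Content-Type": content_type},
    )


request = soap_over_http("http://example.com/stockquote", ENVELOPE)
print(request.get_method(), request.full_url, request.get_header("Content-type"))
# Sending it would be: urllib.request.urlopen(request)
```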
The W3C working group on SOAP is expected to publish its recommendation around August 2002. To participate or follow its progress, go to www.w3.org/2000/xp/Group.
Complex Payloads
So far we've looked at technologies that help create a basic XML Web service. In all of them, both the data exchange format and the message format are XML. But not all of the world's data is in XML: we have legacy systems, EDI systems, images, and many more formats. How can we use these new technologies for non-XML data? Converting all this data into XML is inefficient and time consuming; also, XML may not be the best representation for all kinds of data. For example, JPEG makes more sense for images. Even sending arbitrary XML can be a problem: we cannot simply take one XML document, insert it into another, and expect to end up with a well-formed XML document. To carry arbitrary XML in XML-based protocols such as SOAP, we need help.
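A quick Python experiment shows both problems. A complete document that starts with an XML declaration parses fine on its own but breaks the parser once pasted inside another element, because the declaration is only allowed at the very start; and control bytes such as those found in image data are not legal XML characters at all. The claim document is hypothetical.

```python
import xml.etree.ElementTree as ET

# A complete, well-formed document (hypothetical claim data).
inner = '<?xml version="1.0"?><claim id="061400"/>'


def try_parse(text):
    """Return "ok" if text is well-formed XML, else the parser's error."""
    try:
        ET.fromstring(text)
        return "ok"
    except ET.ParseError as err:
        return f"error: {err}"


print(try_parse(inner))                               # ok
print(try_parse("<wrapper>" + inner + "</wrapper>"))  # error: declaration not at start
print(try_parse("<image>\x01\x02</image>"))           # error: illegal characters
```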
There are at least two technologies that address this space:
- SOAP with Attachments
- DIME
SOAP with Attachments
SOAP with Attachments (SwA) was an effort by a group of individuals to combine the existing SOAP and MIME technologies to facilitate carrying arbitrary data in SOAP. The W3C has published SwA as a W3C note.
SwA doesn't introduce any new technology. Rather, it uses the referencing facilities in SOAP (HREF attribute) and Multipart MIME (RFC 2045) to make it possible to carry arbitrary data. The whole message is constructed as a multipart MIME message with the SOAP message as the root part. The MIME message can have any number of MIME parts, and the SOAP message can refer to any of these parts using the HREF attribute. In addition, the specification places a few more constraints (such as on the content-type and start parameters), and makes some recommendations on how the reference URIs in the HREF attribute can be resolved using existing RFCs.
Listing 2 shows a SOAP 1.2 message with an attached facsimile image of a signed claim form (claim061400a.tiff).
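Python's email package can assemble a package in this style. The sketch below is loosely modeled on the claim-form example: it builds a multipart/related message whose start part is the SOAP envelope, attaches an image part, and then resolves the envelope's href="cid:..." reference back to that part. The Content-IDs, the claim namespace and element names, and the placeholder image bytes are hypothetical, and the image is base64-encoded here, although a transport such as HTTP could also carry it as raw binary.

```python
import email
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical Content-IDs; SwA refers to a part as href="cid:<content-id>".
SOAP_CID = "<claim061400.xml@claims.example.com>"
TIFF_CID = "<claim061400a.tiff@claims.example.com>"

SOAP_XML = f"""<env:Envelope xmlns:env="http://www.w3.org/2001/09/soap-envelope">
 <env:Body>
  <claim:insurance_claim xmlns:claim="http://claims.example.com/claim">
   <theSignedForm href="cid:{TIFF_CID[1:-1]}"/>
  </claim:insurance_claim>
 </env:Body>
</env:Envelope>"""

FAKE_TIFF = b"II*\x00" + bytes(28)  # placeholder bytes, not a real image


def build_swa_message(soap_xml, attachment):
    """Multipart/related package with the SOAP envelope as the start part."""
    package = MIMEMultipart("related", type="text/xml", start=SOAP_CID)
    root = MIMEText(soap_xml, "xml")  # ASCII text, sent as 7bit
    root["Content-ID"] = SOAP_CID
    package.attach(root)
    image = MIMEBase("image", "tiff")
    image.set_payload(attachment)
    encoders.encode_base64(image)
    image["Content-ID"] = TIFF_CID
    package.attach(image)
    return package.as_bytes()


def resolve_cid(message, href):
    """Map an href="cid:..." reference to the MIME part with that Content-ID."""
    wanted = "<" + href[len("cid:"):] + ">"
    for part in message.walk():
        if part.get("Content-ID") == wanted:
            return part
    raise KeyError(href)


wire = build_swa_message(SOAP_XML, FAKE_TIFF)
parsed = email.message_from_bytes(wire)
part = resolve_cid(parsed, f"cid:{TIFF_CID[1:-1]}")
print(parsed.get_content_type(), part.get_content_type(),
      part.get_payload(decode=True) == FAKE_TIFF)
```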
Until recently, SwA was the most popular way to carry arbitrary data in SOAP; now DIME seems to be shifting the balance. The W3C has not yet started any work on SwA.
DIME
DIME (Direct Internet Message Encapsulation) came from Microsoft and has been submitted to the Internet Engineering Task Force (IETF) as an Internet-Draft. DIME is a lightweight binary format for packaging multiple records, each with a fixed-format header and a variable-length payload, into a single message. DIME allows for chunking, in which a payload is split across several records so that it can be streamed out without first being held in memory to calculate its total length. Flags in the record headers mark where a message begins and ends, so the records can be assembled in order at the receiving end. Figure 3 provides details of the DIME record structure.
The MB (message begin) and ME (message end) flags mark the first and last records of a message, and the CF (chunk) flag indicates that a record's payload continues in the next record. The Type Name field is a 3-bit field indicating the structure of the value of the type field. DIME provides a numeric value mapping for different media and MIME types. The ID field is used to give an identifier for each DIME payload. Because the data length is carried in a 32-bit field, the data field of a record is limited to 4GB.
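To show why length fields and begin/end/chunk flags make a reader's job simple, here is a Python sketch of length-prefixed record framing in the spirit of DIME. The flag bit values, the 12-byte header, and the field order are this sketch's own choices and do not reproduce the DIME draft's wire format; the record ids and types are hypothetical.

```python
import struct

# Sketch-only layout, NOT the DIME draft's exact wire format. Header: flags
# byte, 3 reserved bytes, id length, type length, 32-bit data length.
MB, ME, CF = 0x04, 0x02, 0x01
HEADER = struct.Struct("!B3xHHI")


def pad4(data):
    """Pad a field to a 4-byte boundary."""
    return data + b"\x00" * (-len(data) % 4)


def encode_record(flags, record_id, type_name, data):
    rid, rtype = record_id.encode(), type_name.encode()
    header = HEADER.pack(flags, len(rid), len(rtype), len(data))
    return header + pad4(rid) + pad4(rtype) + pad4(data)


def encode_message(payloads, chunk_size=None):
    """Encode (id, type, data) payloads, optionally split into chunks.

    The first record gets MB and the last gets ME. Every chunk except the
    last one of a payload gets CF; only the first chunk carries id and type.
    """
    frames = []
    for rid, rtype, data in payloads:
        size = chunk_size or max(len(data), 1)
        pieces = [data[i:i + size] for i in range(0, len(data), size)] or [b""]
        for n, piece in enumerate(pieces):
            flags = CF if n < len(pieces) - 1 else 0
            frames.append([flags, rid if n == 0 else "", rtype if n == 0 else "", piece])
    if not frames:
        return b""
    frames[0][0] |= MB
    frames[-1][0] |= ME
    return b"".join(encode_record(*frame) for frame in frames)


def decode_message(buf):
    """Walk records using the length fields alone; no boundary scanning."""
    records, offset = [], 0
    while offset < len(buf):
        flags, id_len, type_len, data_len = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        fields = []
        for length in (id_len, type_len, data_len):
            fields.append(buf[offset:offset + length])
            offset += length + (-length % 4)  # skip the padding
        records.append((flags, fields[0].decode(), fields[1].decode(), fields[2]))
        if flags & ME:
            break
    return records


def reassemble(records):
    """Join chunked records back into whole (id, type, data) payloads."""
    payloads, current = [], None
    for flags, rid, rtype, data in records:
        if current is None:
            current = [rid, rtype, b""]
        current[2] += data
        if not flags & CF:
            payloads.append(tuple(current))
            current = None
    return payloads


message = encode_message(
    [("soap-root", "text/xml", b"<env:Envelope/>"),
     ("claim-form", "image/tiff", bytes(range(10)))],
    chunk_size=4,
)
records = decode_message(message)
print(len(records), reassemble(records)[0])
# 7 ('soap-root', 'text/xml', b'<env:Envelope/>')
```

As in SOAP with DIME, the SOAP envelope is the first record and the attachment follows. Because every header states its lengths, decode_message never looks inside a payload, and the 32-bit length field is what caps a single record's data at 4GB.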
Microsoft has also published a companion Internet-Draft that shows how SOAP messages can use DIME to send arbitrary data. Using SOAP with DIME is similar to using SwA: in both cases, the SOAP message is wrapped in a compound structure, with the SOAP message as the root (first) part and the referenced parts following it. DIME specifies rules for resolving the URIs referenced through HREF attributes in SOAP messages; they are similar to the SwA rules (RFC 2396 and RFC 2557). DIME also extends the SOAP HTTP binding by specifying the content type application/dime, rather than the default text/xml specified by SOAP.
It's important to note that in both SwA and DIME, the carrier or transport protocol sees a MIME or DIME message rather than a bare SOAP message; the SOAP envelope travels inside it as the root part.
SwA Versus SOAP with DIME
DIME is designed for simplicity, with SOAP and XML Web services in mind, while MIME offers great flexibility. DIME makes data handling a bit easier, as it requires that the data length be specified. DIME also makes parsing easier, since it's easier to identify the boundaries of different records using data lengths than by scanning for the boundary strings that Multipart MIME uses to separate parts. The compulsory inclusion of the data length may also help in heap management. DIME does not require encoding of binary data, and hence may be faster. MIME, on the other hand, is very flexible and well understood, with many implementations supporting it.
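The parsing argument can be seen in a few lines of Python. With delimiter framing, the receiver must scan every byte for the delimiter, and a payload that happens to contain it gets split; that is why a MIME sender must choose a boundary string that appears nowhere in the data. With length framing, the receiver reads a length and jumps. The "--SEP--" delimiter is a simplified stand-in, not real MIME boundary syntax.

```python
import struct


def join_with_delimiter(parts, delimiter):
    return delimiter.join(parts)


def split_with_delimiter(stream, delimiter):
    """Boundary-style parsing: scan the bytes for every delimiter occurrence."""
    return stream.split(delimiter)


def join_with_lengths(parts):
    """Length-prefixed framing: 4-byte big-endian length, then the bytes."""
    return b"".join(struct.pack("!I", len(p)) + p for p in parts)


def split_with_lengths(stream):
    """Read each length and jump over the payload without inspecting it."""
    parts, offset = [], 0
    while offset < len(stream):
        (length,) = struct.unpack_from("!I", stream, offset)
        offset += 4
        parts.append(stream[offset:offset + length])
        offset += length
    return parts


# The second payload happens to contain the delimiter bytes.
parts = [b"<env:Envelope/>", b"binary--SEP--data"]
print(split_with_delimiter(join_with_delimiter(parts, b"--SEP--"), b"--SEP--"))
# [b'<env:Envelope/>', b'binary', b'data']  (the payload was split)
print(split_with_lengths(join_with_lengths(parts)))
# [b'<env:Envelope/>', b'binary--SEP--data']
```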
Discovery
If programmatic discovery of Web services needs to be supported, there are a few technologies that can help:
- WS-Inspection (WS-I)
- UDDI
- ebXML-Registry and Repository Specification
UDDI has garnered as much attention as SOAP and WSDL; the three together are now considered the basic building blocks of any Web service. Though ebXML has a registry specification of its own that can be used as a standalone specification, it hasn't attracted much attention. WS-I is a companion technology to UDDI that addresses a specific purpose in the area of service discovery. I'll focus on UDDI and WS-I.
UDDI
UDDI (Universal Description, Discovery, and Integration) is an effort by a group of companies that hasn't yet been submitted to any consortium or standards body. UDDI consists of an XML Schema that allows a user to provide a description of a Web service along with its business information. The description of a service in UDDI schema takes a business-centric view, whereas WSDL takes a functional view. In addition, UDDI defines an API specification that allows a user to publish Web services and to query and obtain information on other published services. UDDI publish and query operations are based on SOAP messages.
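Because the inquiry API is just SOAP, a client needs little more than an XML library. This Python sketch builds, but does not send, a find_business inquiry. The element names follow the UDDI version 2 inquiry API as I understand it; the urn:uddi-org:api_v2 namespace and the generic="2.0" attribute are assumptions to check against the specification, and the business name is hypothetical. UDDI version 2 messages use SOAP 1.1, hence the SOAP 1.1 envelope namespace.

```python
import xml.etree.ElementTree as ET

SOAP11_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
UDDI_V2 = "urn:uddi-org:api_v2"  # assumed UDDI version 2 API namespace


def find_business_request(business_name):
    """Serialize a UDDI inquiry asking for businesses matching a name."""
    envelope = ET.Element(f"{{{SOAP11_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP11_ENV}}}Body")
    query = ET.SubElement(body, f"{{{UDDI_V2}}}find_business", generic="2.0")
    ET.SubElement(query, f"{{{UDDI_V2}}}name").text = business_name
    return ET.tostring(envelope, encoding="unicode")


request_xml = find_business_request("RWAV")
print(request_xml)
```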
The other face of UDDI is the repository itself. A repository implements the API specification with which users can publish or discover services. UDDI repositories are logically centralized and physically distributed. As of this writing, there are four node operators running UDDI registries: Microsoft, IBM, SAP, and HP.
Open-source UDDI repository implementations are also available, so users can run their own in-house UDDI repositories. Though UDDI was initially touted as the technology that would open the gates for dynamic discovery of Web services and dynamic collaboration, it is more and more frequently used for intranet and in-house repository needs.
WS-Inspection
WS-I was a joint offering from IBM and Microsoft released in 2001. While UDDI involves going to a central place to publish and query services, WS-I involves going to a site offering Web services and looking for information about the services offered there.
WS-I defines a simple grammar to aggregate service description documents of various services offered at that site. The service descriptions can be in any format, such as WSDL or UDDI. There can be many service descriptions per service, and many services can be defined in a single WS-I document. WS-I also defines an extended binding grammar for both WSDL and UDDI that provides hints about what may be found in the referred service description documents.
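The aggregation idea is easy to see in a small document. This Python sketch parses a hypothetical inspection document that lists two services, one of them with two WSDL descriptions, and collects the WSDL locations per service. The inspection, service, abstract, and description element names and the referencedNamespace and location attributes reflect my understanding of the WS-Inspection 1.0 grammar; the namespace URIs are assumptions to verify against the specification, and every example.com URL is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs for WS-Inspection 1.0 and WSDL 1.1.
WSIL_NS = "http://schemas.xmlsoap.org/ws/2001/10/inspection/"
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
NS = {"wsil": WSIL_NS}

# Hypothetical inspection document for a site offering two services.
INSPECTION_DOC = """<?xml version="1.0"?>
<inspection xmlns="http://schemas.xmlsoap.org/ws/2001/10/inspection/">
  <service>
    <abstract>Stock quote service</abstract>
    <description referencedNamespace="http://schemas.xmlsoap.org/wsdl/"
                 location="http://example.com/stockquote.wsdl"/>
    <description referencedNamespace="http://schemas.xmlsoap.org/wsdl/"
                 location="http://mirror.example.com/stockquote.wsdl"/>
  </service>
  <service>
    <abstract>Claims service</abstract>
    <description referencedNamespace="http://schemas.xmlsoap.org/wsdl/"
                 location="http://example.com/claims.wsdl"/>
  </service>
</inspection>"""


def descriptions_by_service(doc):
    """Map each service's abstract to the locations of its WSDL descriptions."""
    root = ET.fromstring(doc)
    result = {}
    for service in root.findall("wsil:service", NS):
        title = service.findtext("wsil:abstract", default="", namespaces=NS)
        result[title] = [d.get("location")
                         for d in service.findall("wsil:description", NS)
                         if d.get("referencedNamespace") == WSDL_NS]
    return result


print(descriptions_by_service(INSPECTION_DOC))
```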
WS-I makes some recommendations on how its documents may be made available to users, so they are easily found. WS-Inspection documents may also be placed within a content medium such as HTML.
Conclusion
So far we have looked at technologies in service description, communication protocols, complex payloads, on-site inspection, and general discovery. In the next part of this series, we will look at technologies that deal with enterprise-strength issues, such as transactions and security, and technologies that cover routing and process orchestration.
Resources