
Working with Dynamic XML Documents, by Jon Siegel

XML gets mentioned a lot as an interoperability "platform." By itself, of course, XML can't be a platform because it's a document format. It may be flexible, human-readable, dynamic, popular, and cool because it looks a lot like HTML, but it's still just a document format, and there are a lot of differences between a document format and an interoperability platform.

To interoperate using XML, you either have to build an infrastructure around it or incorporate it into an infrastructure that already exists. While other folks build yet another infrastructure around XML, we show in this column how XML has been incorporated into the enterprise-ready, mature CORBA infrastructure. This is CORBA Corner, after all.

The W3C has supplemented XML with the Document Object Model (DOM), defined in OMG IDL. OMG members used the DOM as the basis for their XML mapping, but made one change along the way: instead of keeping the representation of each node in the XML document tree as a full-blown CORBA object, OMG's version represents a node as a CORBA valuetype. Passable by value but not a first-class CORBA object, the valuetype is the CORBA multilanguage equivalent of the Java serializable. And valuetypes are tailor-made to represent an XML document's structure: graphs of valuetypes, sent over the wire by including their root node (or any node, if the node structure links both up and down the tree) in the argument list of a CORBA call, will be reconstructed properly, in their entirety, at the receiving end. Send an XML document to a remote application and suddenly all navigation up and down its tree is done with local invocations instead of dozens of network round-trips.

In this article we investigate OMG's XML/value mapping and the things it lets us do with our XML documents. More than just a bridge between XML and CORBA - although it certainly is all of that - the valuetypes and their structure provide such an elegant API into the XML document (structure and content alike) that (in our opinion, anyhow) this deserves to be the way everyone works with XML content from a program, even in a non-CORBA environment. Here's what you can do with XML documents using the mapping:

  • Create a new XML document from scratch.
  • Read in an existing XML document from storage or from the network.
  • Parse the document into a multiply linked list of CORBA valuetypes.
      - Parsing can be done dynamically if there's no DTD with structural information about the document.
      - Parsing can take advantage of a DTD if there is one.
      - If the document and DTD versions are out of synchronization, the parsing can take advantage of the DTD as far as it goes.
  • Edit the document, including adding or deleting elements; adding, deleting, or changing attributes; and editing text.
  • Send the document, as a linked list of valuetypes, around the network in CORBA calls with its structure intact. This includes secure, transactional CORBA calls and asynchronous calls using CORBA messaging. This is a great way to send XML data in an invocation.
  • Serialize the in-memory representation, generating a revised version of the Unicode-based XML format document that you're used to.
We don't have space to demonstrate all of these, but we'll look at as many as we can in the form of a programming example. We haven't included the specification details here to save room for example code, which isn't available free off the Web as the specification is. To get the specification, download doc.omg.org/orbos/00-08-10 (the specification document) and doc.omg.org/orbos/00-11-01 (zipped IDL file). Where the two files disagree, the IDL in the zip file supersedes.

Listing 1 is an example XML document that we use throughout this article. If your XML knowledge is a little hazy, point your browser to www.w3.org/XML. And to learn about the DOM, surf to www.w3.org/DOM.

Initializing and Reading the Document
We're not going to list the code that gets us started in XML document processing mode. Instead, we'll just list what we did:

  • Representation of XML documents as strings: Even though both XML and Java use Unicode, CORBA represents XML as a special DOMString type (typedef'd to sequence<short>). Why? Because you can use CORBA to go from any language to any other. Pass a Java string to a C program and you (probably) end up with an array of 8-bit chars; pass it to COBOL (unlikely, we admit, but possible) and the system attempts to translate your Unicode into EBCDIC! Yucko. So we've created a convenience function makeDOMString that converts a Java string into the programming language-independent DOMString type.
  • Reading in the document: We read the document in as a Java string and converted it into a DOMString.
  • Parsing the document: The parser is defined by the specification and supplied with your implementation. After locating the parser (probably via a call to resolve_initial_references), you invoke, for example,

    Document PO_doc = parser.parse(PO_Stream);

  • Error checking: The XML specification requires a parser to return an error, with no partial results, if a document contains even one XML structure/format error. (It doesn't care if you had the price of the bolts wrong, though.) The OMG specification is well prepared for this, with its definition of exception XMLException and 38 specific parsing error codes (numbered 2 through 39, of course). You should definitely check for these errors on return from parse.
On return from parse, if the routine found no errors during parsing, our document is stored in a multiply linked list of valuetypes starting at the root node PO_doc. Now let's do some things with it.
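
The parse-and-check step can be sketched in runnable form if we let the JDK's built-in W3C DOM parser stand in for the XML/value parser. That substitution is our assumption, not part of the OMG specification: here SAXException plays the role of XMLException, and the class name ParseSketch and the method parseOrNull are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: the JDK DOM parser stands in for the XML/value parser.
class ParseSketch {
    /** Returns the parsed Document, or null if the text is not well-formed XML. */
    static Document parseOrNull(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler throws on fatal errors and keeps the parser
            // from printing its own message to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // All or nothing: one structure error means no partial tree comes back.
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling ParseSketch.parseOrNull on a document with a mismatched end tag returns null rather than half a tree, which is the behavior the XML specification asks for.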

Editing the XML Document
If we're the company writing the PO, we need to edit it - adding or deleting items, changing quantities or POitem numbers or names, or whatever. To our programmer, the XML/value mapping structures the PO data to make it all easily available; using these program structures, the programmer will present the data to our clerk for editing via a GUI. The operation getElementsByTagName returns a list of Elements selected by Tag Name (duh!), so we'd probably start by retrieving all (that is, both) of the POitems this way:

DOMString name = makeDOMString("POitem");
// Retrieve items in Purchase Order:
NodeList elms = PO_doc.getElementsByTagName(name);
Now elms, a sequence of Nodes, contains two elements - the two items in our Purchase Order. Each contains four child elements - the POitem_name, POitem_number, POitem_size, and POitem_quantity. We could easily display a POitem in a window for editing, or count the number of POitem nodes that we got back and display the number on the screen, or print it for confirmation when we print the PO.
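
For readers who want to try the retrieval without a CORBA ORB on hand, here is a minimal sketch of the same lookup using the JDK's org.w3c.dom binding. This is an assumption on our part: the JDK binding takes plain Java strings and spells the accessor getLength() rather than length(), and the class ItemLookupSketch, its cut-down PO string, and its helper methods are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch using the JDK DOM, not the XML/value valuetypes.
class ItemLookupSketch {
    // Cut-down version of the purchase order in Listing 1.
    static final String PO =
        "<purchase_order><POitem_list>"
        + "<POitem><POitem_name>bolt</POitem_name></POitem>"
        + "<POitem><POitem_name>nut</POitem_name></POitem>"
        + "</POitem_list></purchase_order>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Collects the POitem_name text of every POitem, in document order. */
    static List<String> itemNames(Document doc) {
        NodeList items = doc.getElementsByTagName("POitem");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            names.add(item.getElementsByTagName("POitem_name")
                          .item(0).getTextContent());
        }
        return names;
    }
}
```

The size of the returned list is the item count you might print on the PO for confirmation.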

Changing the Text in an Element
The specification uses OMG IDL attributes, which aren't the same as XML attributes. Here's a quick review in case you forgot how IDL attributes work: if you declare a variable to be an IDL attribute, the IDL compiler generates a get and set operation for it automatically (unless you declare it read-only, which eliminates the set operation). The get and set operations are mapped to programming languages just like all other operations. The Java mapping overloads the operations on the name of the variable: if you include an input argument, it's a set; leave it out and it's a get.
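
To make the overloading concrete, here is a hand-written sketch of the kind of Java you might get for an IDL declaration such as "attribute DOMString data;" plus a readonly attribute. It is not output from any real IDL compiler: the class name TextNodeSketch is hypothetical, and a plain Java String stands in for DOMString.

```java
// Hand-written illustration of the attribute mapping described above.
// Assumed IDL (for illustration only):
//     attribute DOMString data;
//     readonly attribute DOMString nodeName;
class TextNodeSketch {
    private String data;
    private final String nodeName = "#text";

    TextNodeSketch(String initialData) {
        this.data = initialData;
    }

    // No argument: this is the get operation for attribute data.
    String data() {
        return data;
    }

    // One input argument: this is the set operation for attribute data.
    void data(String newData) {
        this.data = newData;
    }

    // readonly attribute: only the get operation exists.
    String nodeName() {
        return nodeName;
    }
}
```

With this shape, text.data() reads the value and text.data("150gross") changes it, which is the pattern the listings in this article rely on.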

To demonstrate how we can change the text associated with a particular element, let's change the quantity of Bolt POitem_number BO1420 to 150 gross. Listing 2 contains the code in a single block, with a few comments. The rest of this section explains it in more detail.

After defining two DOMStrings for use later, we start our loop over the POitems in elms. elms.item(i) returns the ith Node in NodeList elms. (The operation name item comes from the XML/Value specification and has nothing to do with the fact that we're retrieving a POitem.) elms.item returns a Node; we have to cast the return value to an Element in order to assign it to element poItem.

Each poItem element has four children, tagnamed (from the strings in our XML document) POitem_name, POitem_number, POitem_size, and POitem_quantity. getElementsByTagName returns a list, so we declare ino and iqty to be NodeLists even though we're certain that only one element is going to come back from each call here. After checking that we have a valid poItem (even though we didn't bother to check that we had a valid PO!), we're ready to check and change the number of items we want to buy.

One of these lines of code (at least!) needs a little explanation. It's this one:

if (((Text)(ino.item(0).firstChild())).data().equals(checker))

The four Element valuetype children of poItem that we're working with here don't contain text - they have children that contain the text. Here's how we burrow down to the text itself.

ino is a one-element NodeList containing our POitem_number. item is the operation defined by the specification on NodeList that returns an item in the list by index number. (Once again, the operation name item has nothing to do with its being an item on our PO.) So ino.item(0) returns the first Element in our (one-element!) list.

Fortunately for us, this Element (and its brothers and sisters) has only a single Text Node, so we can retrieve it using the get operation of the readonly attribute Node firstChild defined on the Element. In Java the get operation for an attribute maps to an operation with the attribute's own name, so the operation firstChild gets that node.

The firstChild is a Text Node, so we have to cast it to (Text) in order to retrieve the text from it.

The text that it contains is in attribute data, so we can retrieve it using the get operation for data, which in Java maps to the operation name data. Fortunately it's a DOMString, the same type as checker, so we don't have to do any more casting to do the comparison. Phew!

Naturally, we've strung all of these fetch operations together in a single line of code to show you how elegantly you can program with this specification and Java!

In the next line of code (not counting the comments) we use the set operation of the attribute data of the Text Node of the POitem_quantity Element to set the new quantity. Except for this, the tricks in this line are the same ones as in the line above it.
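
If you'd like to run the same burrow-and-change logic today, the sketch below redoes it against the JDK's org.w3c.dom binding. Treat it as an approximation under stated assumptions: the JDK spells the accessors getFirstChild, getData, and setData instead of the overloaded firstChild and data, it uses plain Strings instead of DOMStrings, and the names QuantityEditSketch, parse, and changeQuantity are ours.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical JDK-DOM rendering of the edit; not the XML/value API itself.
class QuantityEditSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Sets the quantity of every POitem with the given number; returns how many changed. */
    static int changeQuantity(Document doc, String itemNumber, String newQuantity) {
        NodeList items = doc.getElementsByTagName("POitem");
        int changed = 0;
        for (int i = 0; i < items.getLength(); i++) {
            Element poItem = (Element) items.item(i);
            NodeList ino = poItem.getElementsByTagName("POitem_number");
            NodeList iqty = poItem.getElementsByTagName("POitem_quantity");
            if (ino.getLength() != 1 || iqty.getLength() != 1) {
                throw new IllegalArgumentException("Invalid purchase order");
            }
            // Element -> its single Text child -> the text data.
            Text number = (Text) ino.item(0).getFirstChild();
            if (number.getData().equals(itemNumber)) {
                ((Text) iqty.item(0).getFirstChild()).setData(newQuantity);
                changed++;
            }
        }
        return changed;
    }
}
```

The cast to Text only works when the element's first child really is a text node, so this sketch assumes leaf elements with no leading whitespace or comments, as in our purchase order.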

Adding a New Element
We can add a new element easily. Operations to create new Nodes of all types - that is, Node factories - are defined on our root Document node, so we invoke on PO_doc to create Elements and the Text Nodes. When you create an Element, you specify its tagName; when you create a Text Node, you pass in its text data.
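
We didn't have room for a listing here, so the following is only a sketch of the factory pattern, written against the JDK's org.w3c.dom binding rather than the XML/value valuetypes. The factory operations createElement and createTextNode do come from the DOM; everything else (the class AddItemSketch, its helpers, and the sample washer item) is invented for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch: Node factories live on the Document, as in the text.
class AddItemSketch {
    /** Builds an empty purchase order: a root with an empty POitem_list. */
    static Document emptyOrder() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element root = doc.createElement("purchase_order");
            root.appendChild(doc.createElement("POitem_list"));
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Creates an Element with the given tag name holding one Text Node. */
    private static Element leaf(Document doc, String tagName, String text) {
        Element element = doc.createElement(tagName);      // tagName at creation
        element.appendChild(doc.createTextNode(text));     // text data at creation
        return element;
    }

    /** Appends a new POitem with its four children to the POitem_list. */
    static Element addItem(Document doc, String name, String number,
                           String size, String quantity) {
        Element item = doc.createElement("POitem");
        item.appendChild(leaf(doc, "POitem_name", name));
        item.appendChild(leaf(doc, "POitem_number", number));
        item.appendChild(leaf(doc, "POitem_size", size));
        item.appendChild(leaf(doc, "POitem_quantity", quantity));
        Node list = doc.getElementsByTagName("POitem_list").item(0);
        list.appendChild(item);
        return item;
    }
}
```

A call such as AddItemSketch.addItem(doc, "washer", "WA14", "1/4", "10gross") adds a third kind of item; the washer values are made up and do not appear in Listing 1.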

Passing the Document in a CORBA Invocation
To pass our document as a tree of valuetypes, all we have to do is insert the root node as an argument in a CORBA call. For example, suppose our purchasing department runs a server that supports the operation PlaceOrder with this IDL:

interface PurchasingServer {
    boolean PlaceOrder(in dom::Document ThisPO);
};

In this operation ThisPO is a Document valuetype, and is an input argument to the CORBA invocation PlaceOrder. (We're not executing one of the Document methods.) When our client application invokes, in Java, the code in Listing 3, the entire purchase order tree gets sent over the wire to the server where it gets reconstructed exactly as it was in the client application, even though we've only included the root Document node of our purchase order in the argument list of PlaceOrder. This follows from the representation of the document node tree as a multiply linked list.
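
Since the valuetype is the multilanguage cousin of the Java serializable, plain Java serialization gives a feel for what happens on the wire. The sketch below is only an analogy under that assumption: it involves no ORB and no valuetypes, and the TreeNode class and copyByValue method are hypothetical. It shows that handing over just the root of a graph linked both up and down reproduces the whole graph, links included.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Analogy only: Java serialization standing in for CORBA pass-by-value.
class GraphCopySketch {
    /** A node linked both up (parent) and down (children). */
    static class TreeNode implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        TreeNode parent;
        final List<TreeNode> children = new ArrayList<>();

        TreeNode(String name) {
            this.name = name;
        }

        TreeNode add(TreeNode child) {
            child.parent = this;
            children.add(child);
            return child;
        }
    }

    /** Writes only the root to a byte stream and reads back a full copy of the graph. */
    static TreeNode copyByValue(TreeNode root) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(root);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TreeNode) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In the copy, each child's parent link points at the copied root, not the original one, so navigation up and down the received tree stays entirely local.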

Writing Out the New or Revised XML Document
Once your user finishes editing the PO, you may want to write it out as an XML data file in Unicode. The operation to do this, serialize, is parallel in form to the parse operation discussed at the beginning of this article. Also, like the parse operation, serialize doesn't exist in the DOM at either Level 1 or Level 2. DOM Level 3 is supposed to introduce this functionality when it arrives.
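
Until a standard serialize operation reaches the DOM, one way to write a tree back out with nothing but the JDK is an identity transform. This is our stand-in, not the serialize operation defined by the XML/value specification; the class WriteOutSketch and its methods are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical stand-in for serialize, built on the JDK's identity transform.
class WriteOutSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Turns the in-memory tree back into XML text, without the XML declaration. */
    static String toXml(Document doc) {
        try {
            // newTransformer() with no stylesheet copies the source to the result.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Parsing a document, editing it, and passing it to toXml gives you the revised text, ready to store or ship.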

Flyweight Pattern
It's not much of an issue for short XML documents, but long ones that repeat elements many times (and some documents may have hundreds, thousands, or even more instances of a given element) use up many bytes repeating element name text. The XML/value mapping uses the flyweight pattern to conserve this space: one instance of each element name (and other types of repeated text) is saved in an indexed array, and only the index number is saved with each element. The array is another valuetype, included in the structure of the document, so it goes over the wire along with everything else when you ship your valuetype tree around.
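
The idea is easy to see in a few lines of Java. The sketch below is not the valuetype layout from the specification, just a minimal illustration of the pattern; the class NameTableSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal flyweight illustration: each distinct name is stored once,
// and elements keep only a small integer index into the table.
class NameTableSketch {
    private final List<String> names = new ArrayList<>();
    private final Map<String, Integer> indexOf = new HashMap<>();

    /** Returns the index for a name, adding it to the table on first use. */
    int intern(String name) {
        Integer existing = indexOf.get(name);
        if (existing != null) {
            return existing;
        }
        int index = names.size();
        names.add(name);
        indexOf.put(name, index);
        return index;
    }

    /** Recovers the name text from an index stored with an element. */
    String lookup(int index) {
        return names.get(index);
    }

    /** Number of distinct names held, however many elements refer to them. */
    int size() {
        return names.size();
    }
}
```

A thousand POitem elements would call intern("POitem") a thousand times yet leave a single string in the table.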

What About Documents with DTDs?
The specification treats static documents - that is, documents defined by a DTD - very well indeed, generating not only the IDL for a set of document-specific valuetypes but also their implementation. All you have to do is program the editing operations around these elements tailored to your DTD. We think the static mapping will be used a lot more than the dynamic mapping, but we had to present this first because it's the foundation for the static, which is based on the dynamic valuetypes with DTD-specified names. We'll present the static mapping in an upcoming column, so watch for it.

Acknowledgments
I'd like to thank Alan Conway and Darach Ennis of IONA Technologies, who wrote the example code for our sample XML file and answered many questions about the specification as we wrote the book chapter from which this article is excerpted.

Author Bio
Jon Siegel, PhD, is director of technology transfer at OMG where he writes and teaches about OMG's specifications: CORBA, the CORBAservices and CORBAfacilities, and the modeling specifications UML, the MOF, XMI, and the CWM. He is the author of CORBA 3 Fundamentals and Programming and the new book, Quick CORBA 3 (Wiley), from which this article was condensed. Mr. Siegel can be contacted at [email protected]

	


Listing 1

<purchase_order company="Enjay Manufacturing" number="01239876">
  <ship_to_address>
    <street>21 Pine Street</street>
    <city>Cleveland</city>
    <state>OH</state>
    <postcode>44113</postcode>
  </ship_to_address>
  <POitem_list>
    <POitem>
      <POitem_name>bolt</POitem_name>
      <POitem_number>BO1420</POitem_number>
      <POitem_size>1/4X20</POitem_size>
      <POitem_quantity>120gross</POitem_quantity>
    </POitem>
    <POitem>
      <POitem_name>nut</POitem_name>
      <POitem_number>NU14</POitem_number>
      <POitem_size>1/4</POitem_size>
      <POitem_quantity>120gross</POitem_quantity>
    </POitem>
  </POitem_list>
</purchase_order>

Listing 2 

// Modify any Bolt items quantity values to 150 gross
// where their POitem_number is 'BO1420'
DOMString checker = makeDOMString("BO1420");
DOMString change = makeDOMString("150gross");
// Loop over the items in our PO:
for (int i = 0; i < elms.length(); i++)
{
    Element poItem = (Element)elms.item(i);
    // ino is the POitem_number element for this poItem:
    NodeList ino =
        poItem.getElementsByTagName(makeDOMString("POitem_number"));
    // iqty is the POitem_quantity element for this poItem:
    NodeList iqty =
        poItem.getElementsByTagName(makeDOMString("POitem_quantity"));
    if (ino.length() != 1 || iqty.length() != 1)
    {
        System.err.println("Invalid purchase Order");
        System.exit(1);
    }
    // This next line is explained in detail in the text
    if (((Text)(ino.item(0).firstChild())).data().equals(checker))
    {
        // Compare successful: this poItem needs its gross changed
        ((Text)(iqty.item(0).firstChild())).data(change);
    }
}
 
Listing 3 

{
    // Retrieve a purchasing server object reference
    // from the naming service...
    PurchasingServer server = whatever;
    // Set up the document root of our PO tree structure:
    //
    dom.Document thePO = whatever; // set equal to our PO document root
    // Here we go...
    if (server.PlaceOrder(thePO)) // this line sends the entire document
    {
        // Success!
    }
    else
    {
        // Whoops.
    }
}



  
 
 

All Rights Reserved
Copyright ©  2004 SYS-CON Media, Inc.