XML Support in Whidbey
Reading, writing, transforming, and querying XML
The Internet is a large, heterogeneous collection of interconnected systems. To leverage the distributed computing opportunities that the Internet offers, developers had to agree upon a way to represent data that would be exchanged over the Internet. And that way is XML.
Now, any distributed system consists of a bunch of nodes (possibly diverse ones) that create, process, or consume data. After XML was agreed upon as the data format, different distributed computing environments added support to read, represent, manipulate, and serialize (and also persist) XML data. In this article we are going to briefly look at the XML support that is built into the next version of the Microsoft .NET Framework and Visual Studio .NET, codenamed "Whidbey."
XML: As of Now
The .NET Framework 1.0 and 1.1 provide excellent support for XML-related operations, including:
- The ability to read and write an XML document (an XML byte stream) using streaming APIs (System.Xml.XmlReader and System.Xml.XmlWriter). The XmlReader provides a forward-only, read-only, pull-based API (i.e., the client pulls XML nodes from the processor), unlike SAX (Simple API for XML), which is push based (i.e., the processor pushes the XML nodes to the client through events). The XmlReader is the fastest and most memory-efficient way to read XML. XmlReader and XmlWriter are abstract classes that provide extensibility points for developers to plug in their own streaming readers and writers. The .NET Framework provides the XmlTextReader and the XmlTextWriter as implementations.
- The ability to load an XML document into an in-memory cache and expose it via the W3C DOM Level 2 API (System.Xml.XmlDocument). You can update this in-memory cache and then persist it to disk as an XML file. Internally, the XmlDocument uses the XmlTextReader. It's the easiest API to use, but the slowest and least memory-efficient.
- The ability to validate an XML document against a DTD or an XML Schema (System.Xml.XmlValidatingReader). The XmlValidatingReader uses an XmlReader implementation (like the XmlTextReader) to read the XML document.
- An alternative in-memory tree representation of an XML document (System.Xml.XPath.XPathDocument). The XPathDocument class is more efficient than the XmlDocument class. Check out Don Box's MSDN TV episode called "Passing XML Data Inside the CLR."
- The ability to navigate through an XML document using XPath (System.Xml.XPath.XPathNavigator). The XPathNavigator can be created over any XML in-memory representation that implements the IXPathNavigable (System.Xml.XPath.IXPathNavigable) interface.
- The ability to apply an XSL transform to an XML document (System.Xml.Xsl.XslTransform).
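The pull model described in the first item of this list can be sketched as follows. This is a minimal illustration rather than code from the article: the class name PullReaderSketch, the ReadBooks helper, and the inline sample data (modeled on Listing 1) are assumptions made for the example. It uses only the XmlTextReader members Read, NodeType, Name, GetAttribute, ReadString, and Close.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Xml;

public static class PullReaderSketch
{
    // Hypothetical sample data, shaped like the Books.xml listing.
    public const string BooksXml =
        "<Books>" +
        "<Book isbn=\"0735613761\"><Name>Programming Microsoft .NET</Name></Book>" +
        "<Book isbn=\"0201734117\"><Name>Essential .NET, Vol 1</Name></Book>" +
        "</Books>";

    // Pulls nodes one at a time and collects one "isbn: name" line per book.
    public static ArrayList ReadBooks(string xml)
    {
        ArrayList lines = new ArrayList();
        string isbn = null;
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            // The client drives the loop: each Read() call pulls the next node.
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element)
                    continue;

                if (reader.Name == "Book")
                {
                    // Remember the attribute until the Name child is reached.
                    isbn = reader.GetAttribute("isbn");
                }
                else if (reader.Name == "Name")
                {
                    // ReadString returns the element's text content.
                    lines.Add(isbn + ": " + reader.ReadString());
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in ReadBooks(BooksXml))
            Console.WriteLine(line);
    }
}
```

Because the reader never holds more than the current node, memory use stays flat regardless of document size; the price is that the code cannot move backward.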
As you can see, there is a wide variety of options available to the developer when it comes to dealing with XML data. Aaron Skonnard has written a brilliant article on the subject, titled ".NET XML Best Practices" (http://support.softartisans.com/kbview.aspx?ID=673).
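To make the XPath-based option concrete, here is a small sketch that loads a document into an XPathDocument and walks it with an XPathNavigator. The class name XPathNavigatorSketch, the ListBooks helper, and the inline sample data are assumptions made for the example; the calls used (CreateNavigator, Select, MoveNext, Current, GetAttribute, Evaluate) belong to System.Xml.XPath.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Xml.XPath;

public static class XPathNavigatorSketch
{
    // Hypothetical sample data, shaped like the Books.xml listing.
    public const string BooksXml =
        "<Books>" +
        "<Book isbn=\"0735614229\"><Name>Applied Microsoft .NET Framework Programming</Name></Book>" +
        "<Book isbn=\"0321116208\"><Name>Windows Forms Programming in C#</Name></Book>" +
        "</Books>";

    // Returns one "isbn: name" line for every Book element.
    public static ArrayList ListBooks(string xml)
    {
        ArrayList lines = new ArrayList();

        // XPathDocument is a read-only cache that implements IXPathNavigable.
        XPathDocument doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();

        XPathNodeIterator books = nav.Select("/Books/Book");
        while (books.MoveNext())
        {
            XPathNavigator book = books.Current;
            string isbn = book.GetAttribute("isbn", "");
            // Evaluate runs the expression relative to the current Book node.
            string name = (string) book.Evaluate("string(Name)");
            lines.Add(isbn + ": " + name);
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in ListBooks(BooksXml))
            Console.WriteLine(line);
    }
}
```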
XML: Soon to Be
Whidbey brings a bunch of exciting enhancements to the party, some of which Mark Fussell discussed in his PDC session, ".NET Framework: What's new in System.Xml for Whidbey" (ARC380). Chief among these enhancements are:
- The XmlTextReader has become about twice as fast (in the PDC preview).
- The XPathDocument2 class has replaced the XmlDocument class as the preferred in-memory XML cache.
- The XsltProcessor is the new XSLT processor and is based on the XQuery architecture. It is also faster than XslTransform.
- Support for XQuery 1.0. XQuery is an XML query language just as SQL is an RDBMS query language.
Reading XML
There are essentially two ways of reading XML. The first is by using the XmlTextReader, which offers a stream-based API. The XmlTextReader in the PDC preview version of Whidbey is twice as fast as the one that ships with the .NET Framework 1.1, according to Fussell. The second way to read XML is to use an in-memory representation of XML. In Whidbey, the XPathDocument2 takes from the XmlDocument the mantle of the preferred in-memory representation of XML data.
For our example we will be using the Books.xml file shown in Listing 1.
The code in Listing 2 prints the ISBN numbers and names of all books in the Books.xml file. Here the Books.xml file is loaded into an instance of XPathDocument2 and then traversed using XPathNavigator2.
Although this code is largely similar to what you would write using the .NET Framework 1.1 SDK, one of the differences is that the System.Xml.XPath.XPathDocument has been replaced by the System.Xml.XPathDocument2, and the System.Xml.XPath.XPathNavigator has been replaced by the System.Xml.XPathNavigator2 (notice the change in the namespace in both cases).
Also, the XPathDocument2 provides us with various typed Read methods such as ReadInt32Value, ReadDateTimeValue, and ReadBooleanValue, which are not present in the XPathDocument.
Writing XML
Whidbey introduces a new way to manipulate XML. This method uses the XPathEditor and an XmlWriter (like the XmlTextWriter) to modify an XML document that is cached in memory as an XPathDocument2 object. Listing 3 shows how to load Books.xml into an instance of XPathDocument2, add a new book element, and save the changes.
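For contrast, the same kind of change can be sketched with the DOM-based XmlDocument class that the .NET Framework 1.x already provides. This is an illustration under stated assumptions, not the article's Listing 3: the class name DomEditSketch, the AddBook helper, and the inline sample data are invented for the example. PrependChild is used so that the new element becomes the first child, the position that CreateFirstChild targets in the XPathEditor version.

```csharp
using System;
using System.Xml;

public static class DomEditSketch
{
    // Hypothetical sample data, shaped like the Books.xml listing.
    public const string BooksXml =
        "<Books>" +
        "<Book isbn=\"0735613761\"><Name>Programming Microsoft .NET</Name>" +
        "<Author>Jeff Prosise</Author></Book>" +
        "</Books>";

    // Inserts a new Book element as the first child of the root element
    // and returns the modified document as a string.
    public static string AddBook(string xml, string isbn, string name, string author)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlElement book = doc.CreateElement("Book");
        book.SetAttribute("isbn", isbn);

        XmlElement nameElement = doc.CreateElement("Name");
        nameElement.InnerText = name;
        book.AppendChild(nameElement);

        XmlElement authorElement = doc.CreateElement("Author");
        authorElement.InnerText = author;
        book.AppendChild(authorElement);

        // PrependChild places the new node before any existing children.
        doc.DocumentElement.PrependChild(book);
        return doc.OuterXml;
    }

    public static void Main()
    {
        Console.WriteLine(AddBook(BooksXml, "0735620857",
            "Introducing Longhorn for Developers", "Brent Rector"));
    }
}
```

Here every new node is first created as an object and then attached to the tree, whereas the editor approach in Listing 3 streams the new content through an XmlWriter.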
Transforming XML
Whidbey also introduces a new way to transform XML data - by using the XsltProcessor class. The XsltProcessor class uses an XSLT style sheet to convert an XML source tree to an XML target tree. Its architecture is based on XQuery and performance-wise it's better than the XslTransform class.
Listing 4 shows a simple XSLT style sheet that transforms Books.xml into an HTML table. Listing 5 shows the code that applies the XSLT style sheet to the XML.
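As a point of comparison, the following sketch applies a cut-down style sheet with the 1.x XslTransform class that the XsltProcessor is measured against. The class name XslTransformSketch, the Apply helper, and the inline style sheet and data are assumptions made for the example. The simple Load and Transform overloads are used for brevity; some framework versions flag them, or the class itself, as obsolete, so expect compiler warnings.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public static class XslTransformSketch
{
    // Hypothetical sample data, shaped like the Books.xml listing.
    public const string BooksXml =
        "<Books>" +
        "<Book isbn=\"0735613761\"><Name>Programming Microsoft .NET</Name>" +
        "<Author>Jeff Prosise</Author></Book>" +
        "</Books>";

    // A reduced version of the Books.xslt idea: one table row per book.
    public const string Stylesheet =
        "<xsl:stylesheet version=\"1.0\" " +
        "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
        "<xsl:template match=\"/\">" +
        "<table><xsl:for-each select=\"Books/Book\">" +
        "<tr><td><xsl:value-of select=\"@isbn\" /></td>" +
        "<td><xsl:value-of select=\"Name\" /></td></tr>" +
        "</xsl:for-each></table>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    // Loads the style sheet, runs it over the input, and returns the result text.
    public static string Apply(string xml, string stylesheet)
    {
        XslTransform transform = new XslTransform();
        transform.Load(new XmlTextReader(new StringReader(stylesheet)));

        // XPathDocument is the recommended input store for XslTransform.
        XPathDocument input = new XPathDocument(new StringReader(xml));

        StringWriter output = new StringWriter();
        transform.Transform(input, null, output);
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Apply(BooksXml, Stylesheet));
    }
}
```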
Querying XML
Whidbey has built-in support for XQuery 1.0. XQuery allows you to query XML the same way SQL allows you to query a relational data store. XQuery has a very simple but powerful syntax. Its central construct, the FLWOR expression, is built from just five clauses - FOR, LET, WHERE, ORDER BY, and RETURN. To illustrate how to query an XML document loaded in an XPathDocument2 instance, imagine that we want to break the data found in the Books.xml file shown in Listing 1 into two files - Authors.xml (Listing 6) and JustBooks.xml (Listing 7).
We will now try to join these two XML documents and compose the data in them using XQuery. Listing 8 shows the XQuery that does exactly that. Listing 9 shows the code that applies the XQuery in Listing 8 to the documents in Listings 6 and 7.
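To show what this join computes, here is a sketch of the same logic written against the 1.x XPath API instead of XQuery: an outer loop over the books and, for each book, a lookup of the author whose id matches. The class name JoinSketch, the Join helper, the " | " output format, and the shortened sample data are assumptions made for the example, and the lookup assumes that author ids contain no quote characters.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Xml.XPath;

public static class JoinSketch
{
    // Hypothetical sample data, shaped like JustBooks.xml and Authors.xml.
    public const string BooksXml =
        "<Books>" +
        "<Book isbn=\"0321116208\"><Name>Windows Forms Programming in C#</Name>" +
        "<AuthorID>3</AuthorID></Book>" +
        "<Book isbn=\"0735613761\"><Name>Programming Microsoft .NET</Name>" +
        "<AuthorID>1</AuthorID></Book>" +
        "</Books>";

    public const string AuthorsXml =
        "<Authors>" +
        "<Author id=\"1\"><Name>Jeff Prosise</Name></Author>" +
        "<Author id=\"3\"><Name>Chris Sells</Name></Author>" +
        "</Authors>";

    // Returns one "isbn | book name | author name" row per book.
    public static ArrayList Join(string booksXml, string authorsXml)
    {
        XPathNavigator books =
            new XPathDocument(new StringReader(booksXml)).CreateNavigator();
        XPathNavigator authors =
            new XPathDocument(new StringReader(authorsXml)).CreateNavigator();

        ArrayList rows = new ArrayList();
        XPathNodeIterator iterator = books.Select("//Book");   // the outer "for"
        while (iterator.MoveNext())
        {
            XPathNavigator book = iterator.Current;
            string authorId = (string) book.Evaluate("string(AuthorID)");

            // The "where" step: pick the author whose id equals the book's AuthorID.
            string authorName = (string) authors.Evaluate(
                "string(//Author[@id='" + authorId + "']/Name)");

            rows.Add(book.GetAttribute("isbn", "") + " | " +
                     (string) book.Evaluate("string(Name)") + " | " + authorName);
        }
        return rows;
    }

    public static void Main()
    {
        foreach (string row in Join(BooksXml, AuthorsXml))
            Console.WriteLine(row);
    }
}
```

The query language expresses the same nested iteration declaratively and builds the result as XML rather than as strings.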
Conclusion
This article shows how the XML support in the .NET Framework continues to evolve with Whidbey. Microsoft's policy of release early; collect, filter, and apply feedback; and then iterate has elevated the .NET Framework to the position of the best distributed programming platform ever. And a lot of that credit goes to System.Xml and its friends.
About The Author
Mujtaba Syed is a software architect with Marlabs Inc. He is an MCSD in .NET (early achiever certificate) and an MCAD in .NET (charter member). He has six years of software architecting and developing experience, half of which focused on .NET.
mujtaba@marlabs.com
Listing 1: Books.xml
<Books>
  <Book isbn="0735613761">
    <Name>Programming Microsoft .NET</Name>
    <Author>Jeff Prosise</Author>
  </Book>
  <Book isbn="0735614229">
    <Name>Applied Microsoft .NET Framework Programming</Name>
    <Author>Jeffrey Richter</Author>
  </Book>
  <Book isbn="0321116208">
    <Name>Windows Forms Programming in C#</Name>
    <Author>Chris Sells</Author>
  </Book>
  <Book isbn="0201734117">
    <Name>Essential .NET, Vol 1: The Common Language Runtime</Name>
    <Author>Don Box</Author>
  </Book>
</Books>
Listing 2: Using XPathDocument2 and XPathNavigator2 to read Books.xml
// Load Books.xml into the in-memory cache.
XPathDocument2 doc = new XPathDocument2 ();
doc.Load ("Books.xml");
XPathNavigator2 nav = doc.CreateXPathNavigator2 ();
// Select every Name element; each result is a navigator positioned on one.
IEnumerable iter = nav.Select ("/Books/Book/Name");
foreach (XPathNavigator2 node in iter)
{
    Console.WriteLine (node.ReadValue ());
    // Move up to the parent Book element to reach its isbn attribute.
    node.MoveToParent ();
    if (node.HasAttributes)
    {
        node.MoveToFirstAttribute ();
        Console.WriteLine ("\t {0}", node.ReadValue ());
    }
}
Listing 3: Using XPathEditor and XmlTextWriter to modify an XML document
XPathDocument2 doc = new XPathDocument2 ();
doc.Load ("Books.xml");
XPathEditor editor = doc.CreateXPathEditor ();
// Position the editor on the root Books element.
editor.MoveToFirstChild ();
// The writer inserts whatever it writes as the first child of Books.
using (XmlWriter writer = editor.CreateFirstChild ())
{
    writer.WriteStartElement ("Book");
    writer.WriteAttributeString ("isbn", "0735620857");
    writer.WriteElementString ("Name", "",
        "Introducing Longhorn for Developers");
    writer.WriteElementString ("Author", "", "Brent Rector");
    writer.WriteEndElement ();
}
doc.Save ("Books.xml");
Listing 4: Books.xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <table width="100%" border="1">
          <tr bgcolor="gray">
            <td>ISBN</td>
            <td>Name of book</td>
            <td>Author's name</td>
          </tr>
          <xsl:for-each select="Books/Book">
            <tr>
              <td><xsl:value-of select="@isbn" /></td>
              <td><xsl:value-of select="Name" /></td>
              <td><xsl:value-of select="Author" /></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
Listing 5: Applying Books.xslt to Books.xml using the XsltProcessor
XsltProcessor proc = new XsltProcessor ();
proc.Compile ("Books.xslt");
proc.Execute ("Books.xml", "Books.htm");
Listing 6: Authors.xml
<Authors>
  <Author id="1">
    <Name>Jeff Prosise</Name>
  </Author>
  <Author id="2">
    <Name>Jeffrey Richter</Name>
  </Author>
  <Author id="3">
    <Name>Chris Sells</Name>
  </Author>
  <Author id="4">
    <Name>Don Box</Name>
  </Author>
</Authors>
Listing 7: The new JustBooks.xml – captures authors by their id, not name
<Books>
  <Book isbn="0735613761">
    <Name>Programming Microsoft .NET</Name>
    <AuthorID>1</AuthorID>
  </Book>
  <Book isbn="0735614229">
    <Name>Applied Microsoft .NET Framework Programming</Name>
    <AuthorID>2</AuthorID>
  </Book>
  <Book isbn="0321116208">
    <Name>Windows Forms Programming in C#</Name>
    <AuthorID>3</AuthorID>
  </Book>
  <Book isbn="0201734117">
    <Name>Essential .NET, Vol 1: The Common Language Runtime</Name>
    <AuthorID>4</AuthorID>
  </Book>
</Books>
Listing 8: The XQuery that combines the JustBooks.xml and Authors.xml data (GetAllBooks.txt)
<Books>
{
  for $book in document('Books')//Book
  return
    <Book>
      <ISBN> { $book/@isbn } </ISBN>
      <Name> { $book/Name/text() } </Name>
      {
        for $author in document('Authors')//Author
        where ($book/AuthorID = $author/@id)
        return <Author> { $author/Name/text() } </Author>
      }
    </Book>
}
</Books>
Listing 9: Applying the XQuery to Authors.xml and JustBooks.xml
// Map the names used by document() in the query to the actual files.
System.Xml.XmlDataSourceResolver ds =
    new System.Xml.XmlDataSourceResolver ();
ds.Add ("Books", "JustBooks.xml");
ds.Add ("Authors", "Authors.xml");
XQueryProcessor xp = new XQueryProcessor ();
// The using blocks close both files, which flushes Result.xml to disk.
using (StreamReader r = new StreamReader ("GetAllBooks.txt"))
using (StreamWriter w = new StreamWriter ("Result.xml"))
{
    xp.Compile (r);
    xp.Execute (ds, w);
}
All Rights Reserved
Copyright © 2004 SYS-CON Media, Inc.
E-mail:
info@sys-con.com