Apache Cocoon is one of the most interesting, innovative, and powerful platforms for dynamic content generation. A subproject of the Apache XML project, it is also one of the lesser-known offerings from the folks at the all-open-source Apache Software Foundation, having garnered less attention than more popular cousins like Struts. But Cocoon is worth a look.
It's not just Cocoon's use of XML in content generation that makes it so interesting; it's how it uses XML. Cocoon's authors clearly have deep experience with XML and understand what it is and isn't good for, and Cocoon's simple but powerful architecture reflects that experience. XML isn't used here just "because everyone's using it." Rather, Cocoon exploits XML's strength for separating content from presentation. (As we know now, the lack of that separation made it increasingly difficult to do Web page development in straight HTML.) The result is an innovative and powerful tool for content site developers.
This article will familiarize you with Cocoon and some of its related technologies: what it is, what it does, and how to start using it in your own development projects.
A basic understanding of the core concepts behind XML, SAX, and XSL (and, of course, HTML) is helpful when reading this article. Don't worry too much if you haven't worked with these technologies, though. I won't be delving too deeply into them and will try to make any examples easy to understand.
What Is Cocoon?
Although at heart Cocoon is "yet another dynamic content-generation platform" (technically putting it in competition with the many other content-generation technologies out there such as ASP, JSP, PHP, and Struts), Cocoon adds some new twists to this category that make it stand out.
Cocoon's core difference is its use of XML throughout the content-generation process. Each request sent to the Cocoon framework is processed using the same three steps:
- Generate XML content (either statically or dynamically)
- Optionally transform it
- Format it for output
Along with this simple, easy-to-understand architecture comes great power and flexibility for developers. Although Cocoon is most often used for generating Web pages, it's by no means limited to that. It can generate any type of output you desire for any type of client device you like: HTML, XML, text, WML for WAP-enabled devices such as mobile phones, SVG images, Postscript, Adobe PDF, etc. And, of course, you can "roll your own" add-ons for Cocoon to generate any type of custom output format you need.
Another major plus of Cocoon's XML-orientation: it provides for excellent separation of content and presentation - that holy grail of software applications. Content is kept as presentation-free XML data for as long as possible during processing, and then formatted into the appropriate output format just before being returned to the user.
In fact, Cocoon strives for an even greater separation of concerns. Its philosophy is to look upon the process of content generation as three separate realms: content, logic, and style. This type of division makes a great deal of sense, especially when you consider that completely different teams of people are frequently assigned to each of these functions: logic to software developers, content to users and data entry staff, and style to graphic designers.
How Cocoon Was Hatched
Cocoon began its life in 1999 as a far less ambitious endeavor than it is now. Pioneered by Apache developer Stefano Mazzocchi, Cocoon was initially just a "proof of concept" - a servlet that used XML and XSLT transformations to generate its output.
By the time Cocoon made it to version 1.0, it had progressed into a full-fledged framework for XML content generation and was starting to receive a good deal of recognition and use by site developers.
As with all early-version software though, it's often difficult to foresee the potential usability problems that will crop up in practice while the software is still being developed. It's also difficult to envision how popular your application will be at such an early stage. Cocoon was no different. Version 1.0, although functional, had its usability hampered by design decisions made early on, most notably its reliance on the memory-intensive XML DOM architecture. (The SAX model, and the APIs and tools needed to use it, were still in their infancy at that point.)
Since Cocoon was proving to be quite popular, demands for new features and improved performance kept coming in. It soon became clear that the initial architecture was not adequate to address these issues.
Enter Cocoon v2.0. This version (released as alpha in March 2001, with the first production release completed in November) seems to be almost a complete rewrite of the application. It addresses the performance issues of version 1 and has a much cleaner architecture conceptually.
The first notable improvement is the substitution of the event-driven SAX XML standard for the memory-intensive DOM API. In addition to the improvements in memory efficiency and scalability, the SAX model also allows output to be generated incrementally. This provides a faster response time since a response page is returned little by little, rather than waiting until all processing is complete to return a page (as the DOM model required).
The second major improvement concerns the internal architecture of the Cocoon application. The original structure, based on a Reactor design pattern, apparently caused conceptual as well as implementation difficulties. Version 2.0 substitutes a pipeline architecture (described later) that proved far more flexible to code as well as much clearer conceptually.
The result is a solid, well-tested, powerful, and more efficient framework for just about any type of content generation under the sun. At the same time it manages to elegantly achieve true separation of presentation and content.
In short, Cocoon really rocks!
How Does Cocoon Work?
Let's take a closer look at Cocoon and how it works, and see how you can put it to use in your own development.
A Servlet at Heart
Although Cocoon is a powerful framework for XML processing, it's just a servlet at heart. Its job is just like any other servlet's: to receive requests, process them, and then generate a response. Cocoon accomplishes this by taking each request, finding an appropriate "pipeline" to handle it, executing the pipeline, and returning in its response any output that the pipeline generated. The pipeline's function is to generate the response output for a particular request, using XML processing internally to accomplish this task.
The Pipeline Architecture
The pipeline, a simple and elegant paradigm, fits in extremely well with the XML SAX processing model.
A pipeline at its simplest consists of a sequence of the three core Cocoon components - generators, transformers, and serializers - arranged in a chain (see Figure 1). XML data (SAX events) is passed down the chain, with each component performing its own processing on the data as needed. At the end of the chain the events are serialized out to the response's OutputStream and returned to the client making the request.
Generators, Transformers, and Serializers
The first component in the chain is always the generator. The generator's job is to create the stream of XML events that will be fed through the rest of the pipeline. There are prebuilt generators available to create the XML events from a number of possible sources: an XML file on disk, an HTML file (the HTML is tidied up and turned into XHTML in order to be XML-compatible), a JSP page, an XSP page (more about XSP later), etc. In fact, there are over a dozen varieties of generators included with the Cocoon distribution. And you can easily create new ones if you need to generate events from a nonstandard source.
The last component in the chain is always the serializer. The serializer's job is to turn the stream of XML events into some form of output that will be returned in the response. Prebuilt serializers are available to create output in the most popular formats: XML, HTML, text, WML, an SVG image, and more. Again, over a dozen varieties of serializers are included with the Cocoon distribution and again you can easily roll your own to support just about any output format you like.
As an option, a sequence of one or more transformers can lie in between the generator and the serializer. Transformers allow the developer to manipulate the XML events coming down the pipeline - adding, removing, or modifying events as needed - before the serializer finally sends them back in the response.
The XSLT transformer is the most common - and most powerful - transformer. It runs an XSL stylesheet against the stream of XML events coming down the pipeline, allowing the developer to use the powerful XSLT language to transform the XML from pure data into styled output.
You can place multiple transformers in a row in the pipeline; each one operates in turn on the XML events passed along by the one before it. This allows you to style the data incrementally, and can help keep your stylesheets smaller and simpler.
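As a rough sketch of what such a chain can look like, here is a fragment written in Cocoon's sitemap syntax (the sitemap itself is covered later in this article). The URL pattern and file names (report.html, report.xml, extract-summary.xsl, summary-to-html.xsl) are hypothetical, and the fragment assumes that a file generator, an XSLT transformer, and an HTML serializer have been declared as the defaults for their types.

```xml
<!-- Hypothetical pipeline fragment: one generator, two transformers in a
     row, one serializer. In a real sitemap the namespace is declared once
     on the root element rather than here. -->
<map:match pattern="report.html"
           xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <!-- Generator: produces the initial stream of XML events -->
  <map:generate src="report.xml"/>
  <!-- First transformer: reshapes the data, still presentation-free -->
  <map:transform src="extract-summary.xsl"/>
  <!-- Second transformer: turns the reshaped data into HTML markup -->
  <map:transform src="summary-to-html.xsl"/>
  <!-- Serializer: writes the final events to the response -->
  <map:serialize/>
</map:match>
```

Splitting the work this way keeps each stylesheet focused on one job, which is the point the paragraph above makes about smaller, simpler stylesheets.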
Although Cocoon uses several other types of components as well (which are beyond the scope of this article), these three components are the core of its architecture. Pretty simple, huh? But it sure is powerful! By assembling combinations of these core components - along with your own custom-built server pages and stylesheets - you can build pipelines to generate content from any data source you like, styled however you like, and rendered in whatever output format you like.
Putting It All Together
Let's look at a sample "Hello World" pipeline and see how this all ties together in practice.
Our "Hello World" pipeline will work as follows:
- Use the file generator to read XML from a file HelloWorld.xml
- Use the XSLT transformer to run a stylesheet Style.xsl against the XML data and translate it into formatted HTML
- Use the HTML serializer to return the resulting HTML page to the user
Let's start by looking at the files mentioned earlier:
<message text="Hello World"/>
The HelloWorld.xml file is very simple, consisting of a single node (<message>) that contains a single attribute (text).
The Style.xsl stylesheet (see Listing 1) is also very simple, consisting of only two formatting transformations. The first one (xsl:template match="/") is called when the XSLT processor begins processing the document. It generates the skeleton of an HTML page. The body of the page is left empty, however, except for the XSL instruction xsl:apply-templates. This instruction simply commands the XSL processor to begin processing any child nodes here, applying other templates as needed. The net effect of the instruction then is "transform any child nodes here."
In this case there's only one node in the XML file, a <message> node, and only one remaining template in the stylesheet (xsl:template match="message"), which is looking to match <message> nodes. Since the stylesheet's template matches the XML file's node, we'll perform the second transformation:
- Write a <h1> opening tag
- Write the value of the text attribute in the <message> node (in this case "Hello World")
- Write a </h1> closing tag
Note the conceptual separation of concerns between the XML file and the XSL stylesheet. The XML file contains data only - the text of a message - with no indication as to how it should be formatted for output. The stylesheet, on the other hand, contains formatting only - instructions on how a text message should be displayed - that can be applied to any XML data containing text messages. As a Cocoon developer, you'll get the most out of the framework if you structure your sites in this fashion.
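A stylesheet matching the description above might look something like the following sketch. It is not a reproduction of Listing 1, just a minimal illustration that assumes the two templates described in the text and the page title shown in the resulting HTML.

```xml
<?xml version="1.0"?>
<!-- Sketch of a Style.xsl along the lines described in the text -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- First template: matches the document root and writes the page skeleton -->
  <xsl:template match="/">
    <html>
      <head>
        <title>A Message From Cocoon</title>
      </head>
      <body>
        <!-- "Transform any child nodes here" -->
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- Second template: wraps the text attribute of each message in an h1 -->
  <xsl:template match="message">
    <h1><xsl:value-of select="@text"/></h1>
  </xsl:template>

</xsl:stylesheet>
```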
Once Cocoon executes this pipeline and performs this transformation, the result is the following HTML:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>A Message From Cocoon</title>
</head>
<body>
<h1>Hello World</h1>
</body>
</html>
This HTML is then serialized using the HTML serializer (which
takes care of the few incompatibilities between strongly formatted
XML and loosely formatted HTML) and the whole thing is sent back in
the response to end the request.
Great! So how do we write the code for this pipeline?
Cocoon uses a file called the sitemap to define all the pipelines
in your application. The sitemap is just that, a map of your Cocoon
Web site. It defines which pipeline will be run in response to each
site request, and how exactly each pipeline will generate its
output.
The sitemap is written in, guess what? XML, just like
everything else in Cocoon. Let's look at it piece by piece.
First, all sitemaps must contain the <map:sitemap> root element:
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
Then the sitemap lists which Cocoon components your site will
use (see Listing 2). In this case we'll be using only three:
- The file generator
- The XSL transformer
- The HTML serializer
Each of these components is labeled as being the default
component of its type (more about this later).
We'll also have to define an additional component, a matcher,
to get this sitemap to work. A matcher is used to match the URL that
the user enters and route it to the appropriate pipeline. (We won't
discuss matchers in this article though.)
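To give a feel for what these declarations look like, here is a sketch of a components section. It is not the article's Listing 2. The Java class names in the src attributes are written from memory of the Cocoon 2.0.x distribution and should be treated as assumptions; check them against the sample sitemap.xmap that ships with your version.

```xml
<?xml version="1.0"?>
<!-- Sketch only: class names are assumptions to verify against your
     Cocoon distribution's sample sitemap. -->
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <map:components>
    <!-- The "default" attribute names the component used when a pipeline
         step gives no explicit type -->
    <map:generators default="file">
      <map:generator name="file"
                     src="org.apache.cocoon.generation.FileGenerator"/>
    </map:generators>
    <map:transformers default="xslt">
      <map:transformer name="xslt"
                       src="org.apache.cocoon.transformation.TraxTransformer"/>
    </map:transformers>
    <map:serializers default="html">
      <map:serializer name="html" mime-type="text/html"
                      src="org.apache.cocoon.serialization.HTMLSerializer"/>
    </map:serializers>
    <!-- The matcher routes request URLs to pipelines -->
    <map:matchers default="wildcard">
      <map:matcher name="wildcard"
                   src="org.apache.cocoon.matching.WildcardURIMatcher"/>
    </map:matchers>
  </map:components>
  <!-- pipelines would follow here -->
</map:sitemap>
```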
Then we define the pipelines used in the site. In this case
we have only one, our Hello World pipeline, which we will set up to
be executed when a request arrives for page "HelloWorld.html".
The pipeline calls the file generator to read from the
HelloWorld.xml file, then calls the XSL transformer to apply the
Style.xsl stylesheet, and finally calls the HTML serializer to
properly format the XML event stream as HTML.
Since earlier in the sitemap we defined each of these
components to be the default of its type (see Listing 2), we can use
a shortcut and not explicitly write which component we're using;
Cocoon assumes we're using the default. (However, if we were calling
a generator other than the file generator, a JSP page, for example,
we would need to write something like <map:generate type="jsp" ...>.)
The full pipeline reads like this:
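A pipeline definition for the Hello World page might be sketched as follows. This is an illustration based on the description above rather than the article's original listing; it assumes the three components and the matcher have been declared as defaults earlier in the sitemap.

```xml
<!-- Sketch of the Hello World pipeline. In a real sitemap the namespace
     is declared once on the map:sitemap root element. -->
<map:pipelines xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <map:pipeline>
    <!-- Run this pipeline when a request arrives for HelloWorld.html -->
    <map:match pattern="HelloWorld.html">
      <!-- Default (file) generator reads the XML data -->
      <map:generate src="HelloWorld.xml"/>
      <!-- Default (XSLT) transformer applies the stylesheet -->
      <map:transform src="Style.xsl"/>
      <!-- Default (HTML) serializer writes the response -->
      <map:serialize/>
    </map:match>
  </map:pipeline>
</map:pipelines>
```

Because no type attribute is given on any of the three steps, each one falls back to the default component of its kind.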
Finally, we write a closing tag for the root element:
</map:sitemap>
And that's it. Our complete Hello World sitemap reads like Listing 3.
Installing and Running Cocoon
How Do We Run This Site?
As mentioned earlier, Cocoon is just a servlet at heart. It
can be easily run on any servlet engine that supports version 2.2 or
later of the Servlet API. I've used it with Apache Tomcat, but it can
also be run on WebLogic, Resin, and many others, even on Microsoft
IIS (using ServletExec).
Installing Cocoon onto the server is pretty easy for most
servlet engines and usually consists of the following:
- Download one of the Cocoon binaries (e.g.,
  cocoon-2.0.1-bin.zip) from http://xml.apache.org/cocoon/dist/.
- Unzip the archive.
- Grab the cocoon.war file that was extracted from the archive
  and copy it to the appropriate directory on your servlet engine
  (e.g., for Tomcat, the "webapps" directory).
- Restart the servlet engine; it will automatically install
  Cocoon from the .war file when it's restarted.
More details and instructions for specific servlet engines
can be found on the installation page at the Cocoon Web site:
Once Cocoon has been installed, running it is just a matter
of accessing a URL that's handled by the Cocoon servlet. When a
request to such a URL is made, it is routed to the Cocoon servlet.
Cocoon matches the URL against its sitemap and then executes the
matching pipeline.
To run our Hello World site, we first need to take the
sitemap we just wrote and overwrite the sample sitemap.xmap file that
Cocoon provides us by default. Then we just point our browser to
http://localhost:8080/cocoon/HelloWorld.html and - voilà -
Cocoon serves up our dynamically generated "Hello World" page.
The Power of Cocoon
Our "Hello World" pipeline is an extremely simple example.
However, it's not hard to see how applying these concepts can enable
us to create more complex sites with Cocoon.
Since Cocoon provides complete separation of content from
style, you can take the same content and format it in many different
ways. There's no need to create new logic or content in order to
create different looks for your site. Just create a new stylesheet
for each output format, and you can serve up completely
different-looking sites from the same content.
How could this be useful in practice? Imagine the following
possibilities, all of which can be accomplished with ease using
Cocoon. You could create sites that serve out the same content
formatted completely differently based on:
- Various devices that clients use to access the site (a PC
  with a Web browser, a WML-enabled phone, a PDA using the HandWeb
  browser)
- Different browsers that clients use to access the site (e.g.,
  display the site differently if the user is using Netscape, IE, or
  Opera; provide special support for old versions of these browsers;
  take advantage of the advanced features of newer versions of these
  browsers)
- Machine-readability: the site can generate both a
  user-readable version (in HTML) and a machine-readable version (in
  XML)
- The user's language (you could internationalize your site)
- Different security levels
Clearly, Cocoon's ability to do dynamically styled page
generation is a powerful tool for site designers!
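As one illustration of the idea, the sketch below serves the same XML content twice: once styled as HTML for people, once as raw XML for programs. The URL patterns and file names (cases.html, cases.xml, cases-to-html.xsl) are hypothetical, and the fragment assumes that serializers named "html" and "xml" have been declared in the sitemap's components section.

```xml
<!-- Sketch: one content source, two presentations. In a real sitemap the
     namespace is declared once on the map:sitemap root element. -->
<map:pipeline xmlns:map="http://apache.org/cocoon/sitemap/1.0">

  <!-- Human-readable version: style the content, then serialize as HTML -->
  <map:match pattern="cases.html">
    <map:generate src="cases.xml"/>
    <map:transform src="cases-to-html.xsl"/>
    <map:serialize type="html"/>
  </map:match>

  <!-- Machine-readable version: same content, no styling step -->
  <map:match pattern="cases.xml">
    <map:generate src="cases.xml"/>
    <map:serialize type="xml"/>
  </map:match>

</map:pipeline>
```

Adding another output format then comes down to adding another match with its own stylesheet and serializer; the content file stays untouched.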
Another key innovation to come out of the Cocoon project,
which I mentioned briefly above, is Extensible Server Pages (XSP).
Inspired by JSP, XSP provides all the power of JSP while removing one
of that technology's major drawbacks: the intermingling of content
As discussed earlier, Cocoon heavily stresses the separation
of content, logic, and presentation. If there's one place that logic
and presentation are often intermingled it's in server pages. By
definition, both ASP and JSP freely intermix logic and presentation,
i.e., source code and HTML. Although the use of beans and taglibs in
JSP can minimize this to some extent, there's still inherently some
intermingling of logic and presentation, due to the use of HTML.
Cocoon's solution is, once again, elegant: use XML instead of
HTML in your server pages. Unlike HTML, XML is presentation-free;
it's just data. So writing a server page using XML makes a lot more
sense.
An XSP page therefore consists of XML data tags, along with
intermingled logic (Java code). As with JSP, the Java logic (through
the use of either embedded code or calls to external modules)
dynamically creates the page to be output. The difference here is
that, once again, presentation-free XML is what the logic will
generate, not HTML.
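To make this concrete, here is a small sketch of what an XSP page can look like. The element names greeting and message and the variable visitor are made up for illustration, and the namespace URI is the one I recall Cocoon 2 using for XSP, so verify it against your version's documentation.

```xml
<?xml version="1.0"?>
<!-- Sketch of an XSP page: XML data tags with embedded Java logic -->
<xsp:page language="java" xmlns:xsp="http://apache.org/xsp">
  <greeting>
    <xsp:logic>
      // Plain Java, executed each time the page is requested.
      // "visitor" is a hypothetical value; real pages would typically
      // compute it from the request or a database.
      String visitor = "World";
    </xsp:logic>
    <!-- xsp:expr inserts the value of a Java expression into the XML -->
    <message><xsp:expr>"Hello " + visitor</xsp:expr></message>
  </greeting>
</xsp:page>
```

Note that nothing in the page says how the message should look; that decision is left to whatever stylesheet runs later in the pipeline.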
All XSP pages are Cocoon generators - the source of XML
events in a pipeline. Once the XSP page has executed and generated
the appropriate XML stream, the stream is then typically styled and
formatted using a Cocoon transformer (e.g., the XSLT transformer
using an XSL stylesheet) into the appropriate output format (such as
HTML).
Like JSP, XSP pages are compiled into Java code (and then
eventually class files), and like JSP, XSP also provides support for
tag libraries (often referred to as "logicsheets" in Cocoon). As JSP
developers know, calling reusable tag libraries in your pages helps
to keep them from becoming too filled up with Java code. Using tag
libraries with XSP provides the same benefits.
XSP is too big a topic to discuss in more detail here. (It
could easily fill up an entire article on its own.) This should be a
good overview though, and you can refer to the Resources section if
you'd like to read more about XSP.
A Cocoon Case Study
I recently used Cocoon to develop a site for a New York City
law firm. The project, its design, and some of my reasons for
choosing Cocoon are described here to help provide some insight as to
when you might want to choose Cocoon as a development platform on
your own projects.
The law firm was looking for a new piece of software to
replace the ancient and inflexible software they were currently using
and getting increasingly locked into (Microsoft Works...for DOS!). The
system functionality was not terribly complex - a basic CRUD system
(functionality for create, read, update, and delete) that would
provide a user-friendly front end for the legal case files in their
database. The entire application would consist of fewer than a dozen
screens.
Although the head attorney was fairly computer-savvy (he had
recently mocked up a prototype for the new system in MS Access), he
was looking to me for technology recommendations and was happy to
defer to my knowledge and experience.
My first recommendation to him was to choose a Web-based
system over an application in MS Access. This was an easy decision
for me, as there would be several benefits to be gained from a
Web-based system, including ease of development, minimal training
required for the rest of the staff, and no software installation or
upgrade procedures needed.
But which Web development platform to choose was a bigger
question. Off the top of my head JSP and Struts were the leading
candidates, but I also wanted to consider some newer, more
cutting-edge technologies as well. As I had heard of Cocoon before,
and had worked heavily with XML on a recent project, I started
reading up on Cocoon to see if it would be a good fit.
Cocoon's technology was intriguing, and my experience with
XML enabled me to come up with an idea that I realized could save a
good bit of development time. Since the screens were fairly simple
and similar (just rows of fields from the database) I realized that I
could design the GUI extremely quickly and easily by just mocking up
the screens in XML (see Listing 4, a mock-up of a Web page in XML
that will later be rendered using HTML tables).
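A mock-up of this kind might look roughly like the sketch below. Only the page, section, and row element names come from this article; the attribute names (title, name, label, field) and all of the values are invented for illustration and are not taken from Listing 4.

```xml
<?xml version="1.0"?>
<!-- Hypothetical screen mock-up: structure only, no presentation -->
<page title="Case Details">
  <section name="Client">
    <row label="Name" field="client_name"/>
    <row label="Phone" field="client_phone"/>
  </section>
  <section name="Case">
    <row label="Docket number" field="docket_number"/>
    <row label="Date opened" field="date_opened"/>
  </section>
</page>
```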
Then switching hats and putting myself in "style" mode, I
could turn all the screen mock-ups into Web pages just by writing a
single stylesheet that would transform the XML into HTML. Each <page>
tag could be transformed into a skeleton for an HTML page, each
<section> tag could be transformed into an HTML table, and each <row>
tag could become a row in the table. The idea appealed to me.
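The mapping just described could be sketched as a stylesheet like the one below. It assumes a hypothetical input vocabulary in which page carries a title attribute, section a name attribute, and row a label and a field attribute; those attribute names are my own placeholders, not the ones used in the actual project.

```xml
<?xml version="1.0"?>
<!-- Sketch: turn page/section/row mock-ups into HTML tables -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Each page becomes the skeleton of an HTML page -->
  <xsl:template match="page">
    <html>
      <head><title><xsl:value-of select="@title"/></title></head>
      <body><xsl:apply-templates select="section"/></body>
    </html>
  </xsl:template>

  <!-- Each section becomes a heading followed by an HTML table -->
  <xsl:template match="section">
    <h2><xsl:value-of select="@name"/></h2>
    <table><xsl:apply-templates select="row"/></table>
  </xsl:template>

  <!-- Each row becomes a table row: label on the left, field on the right -->
  <xsl:template match="row">
    <tr>
      <td><xsl:value-of select="@label"/></td>
      <td><xsl:value-of select="@field"/></td>
    </tr>
  </xsl:template>

</xsl:stylesheet>
```

One stylesheet of this shape covers every screen, so changing the look of the whole application means editing a single file.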
What finally clinched the decision to use Cocoon, however,
was an additional requirement, one that I initially wasn't sure how
to accomplish. Law firms, as we all know, generate reams of
documents, and one of the reasons this firm had stuck with MS Works
all these years was its ability to mail merge the information from
the database into a document. If they were going to abandon Works,
the new system would need to provide mail-merge functionality as
well. At first, I had no idea how I would provide a mail merge in a
Web-based system. But as I thought it through I began to formulate a
plan.
First of all, I realized the best approach would be for the
site to serve up the merged documents as a download from a Web page.
This would be simple for the users. Most browsers' "open-attachment"
functionality is automatically configured to launch the appropriate
application when an attachment is opened, so each time the user
generated a mail-merge document, the word processor (MS Word) would
automatically launch and open to that document. This would work out
well.
But how to do the mail merge itself? Although MS Word has a
mail-merge utility, I felt it would be clumsy for the users. It would
be much simpler for them if they could just click on a button labeled
"create merged document" and have the document arrive with all the
merge substitutions already done behind the scenes. What would be
required to make this happen?
Having the site retrieve the appropriate data from the
database was certainly easy enough. After that, I reasoned, the
template document would need to be read in, the merge fields
identified, and the actual data substituted in.
Reading an MS Word document was a tall order - I wasn't aware
of any Java libraries that could do that. But what about an RTF (Rich
Text Format) file? RTF, unlike the Word .doc format, was text-based
and would be much easier to read. In fact, I could probably write a
parser to do it. I read through the RTF spec and after spending a
couple of hours with the JavaCC parser generator, I was able to
successfully read RTF documents and find the mail-merge fields in
them. I checked with the head attorney to make sure he didn't mind
using RTF format instead of Word, and he didn't. As long as they
could still use the MS Word application to edit the documents (which
they could), he was fine with it.
That left me with the last bit: How to substitute the data
retrieved from the database for the merge fields? Boy, I thought, it
would be nice if there was some existing code that could already do
this so I wouldn't have to write it from scratch. What type of
software could I use to scan through a document for a particular
piece of content and change the value of that content before
outputting it again? Then it hit me: I could use an XSL stylesheet!
XSL was built to easily handle tasks like this.
Suddenly the whole idea began to come together, and my
decision to use Cocoon was clinched. I would write a new generator
that would parse an RTF file and turn it into a stream of XML events.
I would generate a stylesheet in response to each mail-merge request
that substituted database values for the merge fields and I'd use the
XSL transformer to apply this stylesheet. Then I would use the text
serializer to write out the new mail-merged RTF file data, I'd set
the appropriate MIME type for RTF documents ("application/rtf"), and
the user would get served up a mail-merged RTF document. I put
together another proof-of-concept and, sure enough, it worked.
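The substitution step can be sketched with a standard XSLT idiom: an identity template that copies everything through unchanged, plus one overriding template per merge field. The XML vocabulary here is purely an assumption, since the article does not show what the RTF generator emitted; the mergefield element, its name attribute, and the sample values are all hypothetical.

```xml
<?xml version="1.0"?>
<!-- Sketch of a per-request merge stylesheet. Assumes the RTF generator
     emits a hypothetical <mergefield name="..."/> element for each field. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute as-is -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- One generated template per merge field, carrying the database value -->
  <xsl:template match="mergefield[@name='ClientName']">
    <xsl:text>Jane Doe</xsl:text>
  </xsl:template>

  <xsl:template match="mergefield[@name='DocketNumber']">
    <xsl:text>01-1234</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The more specific mergefield templates take priority over the identity template, so only the merge fields change and the rest of the document passes through untouched.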
Writing the application using Cocoon worked out well, though
it did have its share of challenges. As the application was not
particularly complex, the code was not particularly difficult. The
biggest challenge, however, was getting up to speed in some of the
new technologies I was using, primarily Cocoon and XSL. I wound up
learning them incrementally, as needed, when I hit roadblocks in
various pieces of the development. ("Hmmm. How do I do this in XSL?")
The online documentation I found for XSL and Cocoon was helpful, as
was Michael Kay's book, XSLT Programmer's Reference 2nd Edition.
(And, of course, I also posted my share of "Help me!" messages to the
Cocoon Users mailing list.)
The system was finally completed and installed in December
2001, and got a big thumbs-up from both the users and the head
attorney. It is now used daily by the entire staff.
From my perspective, I give Cocoon a big thumbs-up. I chose
it as the core technology for this project, and it accomplished
everything I needed. I found that using XML and Cocoon on the project
allowed me to deliver it faster, as well as helping out tremendously
in the conception and design phase. While designing I was able to
focus completely on what type of content I was going to display on
each page and how it would be generated, and completely ignore all
presentation and style concerns until a later time. I found this
separation of concerns during design to be quite a refreshing change!
Although I found Cocoon to be an excellent technology for
site development, it's not without some drawbacks. You should be
aware of the following concerns when you're deciding whether to use
Cocoon in your development:
Cocoon is still a fairly new and less established technology.
And there's still a risk that it might not catch on and achieve a
large user base. This translates into several direct risks for
systems built with Cocoon:
- It's still more difficult to find developers experienced in
Cocoon and its related technologies than in more mainstream
technologies like JSP or Struts.
- If it doesn't catch on, it may be difficult to continue
enhancing and supporting applications built with it.
Cocoon, like many new technologies, is still undergoing
development. Although most of the major redesign work appears to be
behind it, the functionality and APIs may continue to change in the
future, forcing developers to update their applications to be
compatible with new releases.
As with many technologies, making life easier for the
developer often comes hand-in-hand with a performance penalty. As we
know, even our choice of using the Java language comes with a
performance hit. But if the hit is minimal enough and the efficiency
gains significant enough, the performance issue is outweighed. Cocoon
is no different - its power comes at a price. Although many Cocoon
components are eventually compiled down to class files (allowing
optimizers like JITs and HotSpot to do their magic) all that XML and
XSL processing does take a performance toll. (Cocoon does utilize
techniques to make this processing as efficient as possible, such as
an extensive caching mechanism.) You need to decide for yourself if
it's worth it. For a law firm's document processing system I felt
that the answer was yes. For building a high-volume site where
performance is critical, like Amazon.com or a stock-trading system,
your answer might very well be no.
The sitemap/pipeline examples I gave here were very simple
and fit cleanly into Cocoon's generator-to-transformer-to-serializer
paradigm. But many sites require more complex functionality than that
to respond to a request, such as conditional processing, loops, etc.
The sitemap does have other capabilities that can accommodate these
needs. (In fact, the sitemap is beginning to resemble a full-fledged
program itself.) As a result, it can sometimes be a bit challenging
for developers to make the sitemap do what they need, and its
simplicity and readability can be sacrificed somewhat. Cocoon's
developers are aware of this problem, however, and working to address
it. They are now deciding how best to redesign the sitemap to handle
these issues, and various ideas about the redesign have been
discussed on the Cocoon Developers mailing list in recent months. As
with all open source software, ideas from the development community
are welcome.
For More Info
We've only scratched the surface of the Cocoon technology
here. I've omitted a great deal of material for brevity's sake.
There's much more to Cocoon, and a detailed discussion could easily
fill a book. (Indeed, the Cocoon Developer's Handbook, by Sue
Spielman, is due out this year.)
If you're intrigued by what you've read here, I'd encourage
you to start using Cocoon. There's no better way to learn than by
hands-on development. The best way to approach Cocoon is to start
with a small, simple site and build it up incrementally from there,
learning as you go.
There's also loads of additional documentation available
about Cocoon online; in fact, probably too much for a novice user.
Again, I'd encourage you to approach it incrementally. Read a little
at a time, learning more about each of the various components and
techniques as you need them.
Resources
Understanding Apache Cocoon:
Cocoon 2 Idiots Guide:
How to develop Web applications with Cocoon:
Cocoon Users mailing list: cocoon-users-subscribe@xml.apache.org.
(Details about acceptable content for the list at
Cocoon Users mailing list archives:
At the Cocoon Web Site
Spielman, S. (June 2002). Cocoon Developer's Handbook. Sams:
David Rosenstrauch, a software developer, has been providing development and
consulting services to Fortune 500 companies and start-ups for over 13 years.
Listing 1: The Style.xsl stylesheet (listing not preserved)
Listing 2: Component declarations in the sitemap (listing not preserved)
Listing 3: The complete Hello World sitemap (listing not preserved)