Historically, content management systems (CMSs) have been notorious for falling short of enterprise expectations. This is because, despite claims to the contrary, most first-generation CMSs were essentially packaged C or C++ applications originally conceived and designed to solve specific problems, such as publishing departmental Web sites out-of-the-box.
As content management problems proliferated through the enterprise and corporate CIOs looked to deploy their CMSs on a larger scale, these first-generation systems came under increasing pressure to morph from simplistic departmental applications into scalable enterprise infrastructures for content management. However, given their proprietary architectures, first-generation CMSs were poor candidates for enterprise-wide deployment. Weak application program interfaces (APIs) made them incapable of responding to emerging content production needs other than the specific scenarios contemplated by the vendor. Furthermore, these systems had a poor track record in terms of enterprise integration; they were expensive to own and difficult to scale in a cost-effective manner.
Second-generation CMSs now on the market are better suited to the challenge of enterprise-wide deployment. On one level, these systems provide application functionality comparable to that of their predecessors - they support standard Web site publishing and document management functions out-of-the-box. However, these systems are radically different in terms of architecture.
Based on open standards such as J2EE, second-generation content management systems have modular designs and expose core content management functions as rich APIs. In other words, the key characteristic of a second-generation CMS is that it's not merely an application, it's a software platform that can be programmed to support emerging business-critical content management applications not contemplated by the system vendor. Furthermore, second-generation systems provide a rich set of service provider interfaces (SPIs) that allow businesses to easily integrate with enterprise IT infrastructure. The net effects are greater flexibility, extensibility, and ease of integration, which reduce the total cost of ownership.
While second-generation systems represent a significant improvement over first-generation systems, much remains to be done to make content management functionality readily accessible to business users. Consider a content production scenario where Jenny Meyers, vice president of marketing at an online brokerage company, decides to launch a weekly e-mail newsletter called Investor Update that will provide affluent customers with personalized market commentary based on their portfolio composition. The objective of the newsletter is to leverage the company's famed research group. Setting up a closed, first-generation system to support this new process is virtually impossible.
Second-generation systems are programmable; within a few weeks, a team of programmers can develop and test a specialized application to support Investor Update.
But what if, for competitive reasons, the business goal is to launch Investor Update within seven days and there are no programmers to help develop a specialized application? Is a third-generation content management system - one that empowers Jenny to configure the system to suit her content production needs - possible? To generalize the problem, what kind of content management system do business users need to engage in self-service content production? Is self-service content production only applicable within an enterprise? How about content extranets where enterprises can invite their business partners and customers to create custom-content products in self-service mode?
Web services hold the key to self-service content management systems and to the rapid development and deployment of applications such as Investor Update by business users like Jenny.
Self-Service Content Management
A self-service content management system is best designed as a collection of loosely coupled, best-of-breed software components, where each component is an autonomous Web service that performs a specialized content management function and is implemented using a Web services framework such as .NET or J2EE. This article uses J2EE to illustrate how Web services may be implemented.
In the loosely coupled model, every specialized component can be a provider of services, a requester of services, or both. Since the software components can have heterogeneous implementations, they must be able to communicate with each other via platform- and data-neutral protocols. Equally important, business users like Jenny must be able to configure their content production processes by assembling these Web services in the correct order via highly intuitive, wizard-like user interfaces.
Elements of Content Management Web Services
A self-service content management system consists of a core set of building-block services, including the library service, workflow service, transformation service, import and export services, publishing service, and categorization service. As shown in Figure 1, these services must coexist within a Web services ecosystem. Specifically, their WSDL interfaces must be registered with an enterprise-wide UDDI registry. This registry can serve as the "yellow pages" for services and applications that need to interact with these services.
Within a J2EE application server, these services can be implemented as stateless session beans. These session beans can be exposed via JAX-RPC as Web services that support synchronous, SOAP-based RPC interfaces. Figure 2 shows the underlying J2EE architecture of content management Web services.
Library Service
The library service is at the heart of a self-service content management system. It's responsible for managing content - both as native files and as XML - and metadata properties. Every content item must have a content type that describes its semantics. Press releases, FAQs, and product datasheets are examples of content types with specific properties, file formats, and creation templates. For example, a press release can have a property "Release Date," and the author of a press release can create it using a specialized MS Word template or HTML form. In each of these cases XML can serve as the intermediate content representation.
The library service offers services related to content types and their properties, content check-in and checkout, versions and renditions, folder hierarchies, access control, and component-level XML manipulation. In terms of implementation, the library service can use a relational database to store content properties and file servers to store content files. A powerful storage framework can allow multiple, disparate databases and storage servers to be integrated with the library service via service provider interfaces. This is advantageous for scaling the repository as well as interfacing with legacy content stores.
The library service supports at least two kinds of searching capabilities. It provides parametric search on content properties and indexed search on the content itself. Parametric search can be based on SQL and leverage the fact that content properties are maintained in a relational database. Indexed search can be based on conventional search engine technology that maintains an index of XML renditions of all text content in the repository.
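To make the check-in, checkout, versioning, and parametric search ideas concrete, here is a minimal in-memory sketch in plain Java. It is not a real library service: the class LibrarySketch and all of its method names are hypothetical, a HashMap stands in for the relational database, and the property match stands in for what would be a SQL query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the library service (illustration only).
class LibrarySketch {
    // One content item: a content type, metadata properties, and a version history.
    static class Item {
        final String id;
        final String contentType;
        final Map<String, String> properties = new HashMap<String, String>();
        final List<String> versions = new ArrayList<String>();
        String checkedOutBy; // null when the item is not checked out

        Item(String id, String contentType, String firstVersion) {
            this.id = id;
            this.contentType = contentType;
            this.versions.add(firstVersion);
        }
    }

    private final Map<String, Item> items = new HashMap<String, Item>();

    Item create(String id, String contentType, String xml) {
        Item item = new Item(id, contentType, xml);
        items.put(id, item);
        return item;
    }

    // Checkout locks the item so that only one user edits it at a time.
    void checkOut(String id, String user) {
        Item item = require(id);
        if (item.checkedOutBy != null) {
            throw new IllegalStateException(id + " is already checked out by " + item.checkedOutBy);
        }
        item.checkedOutBy = user;
    }

    // Check-in appends a new version, releases the lock, and returns the version count.
    int checkIn(String id, String user, String xml) {
        Item item = require(id);
        if (!user.equals(item.checkedOutBy)) {
            throw new IllegalStateException(id + " is not checked out by " + user);
        }
        item.versions.add(xml);
        item.checkedOutBy = null;
        return item.versions.size();
    }

    // Parametric search: match on content type and one property value.
    List<String> findByProperty(String contentType, String name, String value) {
        List<String> ids = new ArrayList<String>();
        for (Item item : items.values()) {
            if (item.contentType.equals(contentType) && value.equals(item.properties.get(name))) {
                ids.add(item.id);
            }
        }
        return ids;
    }

    private Item require(String id) {
        Item item = items.get(id);
        if (item == null) {
            throw new IllegalArgumentException("Unknown item: " + id);
        }
        return item;
    }
}
```

A caller would create a press release, set its "Release Date" property, check it out, and check in a revised XML rendition. Indexed search, renditions, folders, and access control are left out of the sketch.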
Workflow Service
The workflow service is responsible for running active processes and associated tasks that relate to content creation, production, and publishing. It can be used to support simple edit-review-approve workflows as well as complex, collaborative projects. The workflow service supports routing constructs such as task nodes, conditional nodes, parallel split nodes, merge nodes, and independent subprocesses. Workflow instances should be activated from workflow definitions that describe commonly used business processes. Users should be able to create workflow definitions graphically using a visual builder application.
The workflow service is optimized to support collaborative workflows common to content management scenarios, as well as to provide a high degree of flexibility to workflow owners. For example, the workflow service allows users with appropriate permission to modify active workflows midstream, as well as add new tasks and update routing information to reflect a project's changing needs. Such flexibility is critical in content production projects where the precise workflow may not be fully known at the beginning of the project.
The workflow service also assigns tasks and roles to individuals via task lists. It should be possible to automate tasks; they may be performed by scripts or by invocation of external applications. Timers allow overdue tasks to be escalated and notifications to be delivered to appropriate individuals via a separate notification service. Extensive audit trails can be maintained for monitoring and analysis.
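The routing constructs can be pictured with a small sketch. The Java below models only task nodes and conditional nodes and walks a route in a single pass. Parallel splits, merges, subprocesses, task lists, and timers are omitted, and every class and method name is an assumption made for illustration rather than the API of any workflow product.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical routing sketch: names and structure are illustrative only.
class WorkflowSketch {
    // A node knows its own name and how to choose the next node.
    static abstract class Node {
        final String name;
        Node(String name) { this.name = name; }
        // Returns the name of the next node, or null when the workflow ends.
        abstract String next(Map<String, String> context);
    }

    // A task node always routes to one fixed successor.
    static class TaskNode extends Node {
        private final String nextName;
        TaskNode(String name, String nextName) { super(name); this.nextName = nextName; }
        @Override String next(Map<String, String> context) { return nextName; }
    }

    // A conditional node routes on the value of one context entry.
    static class ConditionalNode extends Node {
        private final String key, expected, ifTrue, ifFalse;
        ConditionalNode(String name, String key, String expected, String ifTrue, String ifFalse) {
            super(name);
            this.key = key;
            this.expected = expected;
            this.ifTrue = ifTrue;
            this.ifFalse = ifFalse;
        }
        @Override String next(Map<String, String> context) {
            return expected.equals(context.get(key)) ? ifTrue : ifFalse;
        }
    }

    private final Map<String, Node> nodes = new HashMap<String, Node>();

    // Putting a node under an existing name replaces it, which is how this
    // sketch models rerouting a workflow after it has been defined.
    void put(Node node) { nodes.put(node.name, node); }

    // Walks the route from the start node and returns the task names visited, in order.
    List<String> run(String start, Map<String, String> context) {
        List<String> tasks = new ArrayList<String>();
        String current = start;
        int steps = 0;
        while (current != null) {
            if (++steps > 1000) {
                throw new IllegalStateException("Routing did not terminate");
            }
            Node node = nodes.get(current);
            if (node == null) {
                throw new IllegalStateException("Undefined node: " + current);
            }
            if (node instanceof TaskNode) {
                tasks.add(node.name);
            }
            current = node.next(context);
        }
        return tasks;
    }
}
```

An edit-review-approve process would be three task nodes plus one conditional node that routes on an approval status. Inserting a legal review later amounts to replacing the review node so that it points at a new task.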
Transformation Service
The transformation service specializes in format conversions. As an autonomous service, it allows specific transformation algorithms to be plugged in as transformer objects. It also provides conversion to XML from commonly used document formats, including MS Word, PDF, Excel, and PowerPoint. In addition, the transformation service converts between image formats such as GIF, JPEG, BMP, TIFF, EPS, and SVG. The service can be invoked at any time with a specific transformation request on a particular content file. On receiving such a request, it uses the appropriate transformer to perform the conversion and returns the transformed content to the requester. The transformation service is usually invoked by the import, export, and publishing services and by applications to perform file conversions on demand.
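One way to picture pluggable transformer objects is a registry keyed by source and target format. The sketch below is an assumption-laden illustration: the Transformer interface, the register and transform methods, and the toy TextToXml transformer are all invented here, and real converters for Word, PDF, or image formats would sit behind the same interface.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical transformer registry (illustration only).
class TransformationSketch {
    // The plug-in point: one object per supported conversion.
    interface Transformer {
        byte[] transform(byte[] input);
    }

    // Toy transformer: wraps plain text in one XML element, escaping markup characters.
    static class TextToXml implements Transformer {
        public byte[] transform(byte[] input) {
            String text = new String(input, StandardCharsets.UTF_8)
                    .replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;");
            return ("<text>" + text + "</text>").getBytes(StandardCharsets.UTF_8);
        }
    }

    private final Map<String, Transformer> transformers = new HashMap<String, Transformer>();

    // Format names are compared case-insensitively.
    private static String key(String from, String to) {
        return from.toLowerCase(Locale.ROOT) + "->" + to.toLowerCase(Locale.ROOT);
    }

    void register(String from, String to, Transformer transformer) {
        transformers.put(key(from, to), transformer);
    }

    // Looks up the transformer for the requested conversion and applies it.
    byte[] transform(String from, String to, byte[] content) {
        Transformer transformer = transformers.get(key(from, to));
        if (transformer == null) {
            throw new IllegalArgumentException("No transformer registered for " + from + " to " + to);
        }
        return transformer.transform(content);
    }
}
```

Keeping the registry separate from the transformers means a new format pair can be supported by registering one more object, without touching the services that call transform.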
Import and Export Services
The import service is responsible for importing content from external systems. It needs to contain a polling engine and specialized importer objects that can periodically monitor files on network drives, FTP sites, and HTTP sites. When new files are detected, they are imported and processed according to the business logic specified inside the importer objects and inserted as new content into the repository via the library service. Typically, the importer objects use the transformation service to convert incoming text files to their equivalent XML renditions. Importer objects are configured using XML descriptors. Other Web services or the J2EE Connector API can be used to develop custom importers for importing content from proprietary enterprise information servers such as ERP systems or legacy databases.
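The detection step of a polling engine can be sketched without any network code. In the Java below, the caller supplies the current listing of a monitored location as a map from path to last-modified time, and the poller reports which paths need importing. The class and method names are hypothetical, and treating a newer timestamp as a reason to re-import is an assumption that goes slightly beyond the "new files" case described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical change detector for a polling importer (illustration only).
class ImportPollerSketch {
    // Last-modified time recorded for each path at the previous poll.
    private final Map<String, Long> seen = new HashMap<String, Long>();

    // Takes the current listing (path -> last-modified millis) and returns,
    // in sorted order, the paths that are new or newer than at the last poll.
    List<String> poll(Map<String, Long> listing) {
        List<String> toImport = new ArrayList<String>();
        for (Map.Entry<String, Long> entry : listing.entrySet()) {
            Long previous = seen.get(entry.getKey());
            if (previous == null || previous.longValue() < entry.getValue().longValue()) {
                toImport.add(entry.getKey());
                seen.put(entry.getKey(), entry.getValue());
            }
        }
        Collections.sort(toImport);
        return toImport;
    }
}
```

A real importer would then fetch each reported file, pass it through the transformation service, and store the result via the library service. Those steps are outside this sketch.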
The export service pushes content out to external systems. In addition to acting as a file transfer service, it delivers periodic, incremental updates to destinations by transferring only changed content. In the case of updating a group of destinations that mirror each other, the service can ensure that the updates are part of an atomic transaction. The service may also provide a rollback facility.
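Transferring only changed content requires some record of what was sent last time. The sketch below keeps a SHA-256 digest per path and returns just the entries whose digest differs. All names are hypothetical. The sketch records a digest as soon as the change is detected, whereas a real service would more likely record it only after a successful transfer, and it does not model deletions, atomic updates across mirrors, or rollback.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical incremental-export helper (illustration only).
class IncrementalExportSketch {
    // Digest of each path's content as of the last export.
    private final Map<String, byte[]> lastExported = new HashMap<String, byte[]>();

    // Returns the entries whose content differs from what was last exported,
    // sorted by path, and records their new digests.
    Map<String, byte[]> changedSince(Map<String, byte[]> current) {
        Map<String, byte[]> delta = new TreeMap<String, byte[]>();
        for (Map.Entry<String, byte[]> entry : current.entrySet()) {
            byte[] digest = sha256(entry.getValue());
            if (!Arrays.equals(digest, lastExported.get(entry.getKey()))) {
                delta.put(entry.getKey(), entry.getValue());
                lastExported.put(entry.getKey(), digest);
            }
        }
        return delta;
    }

    private static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a standard algorithm that Java platforms are required to provide.
            throw new IllegalStateException(e);
        }
    }
}
```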
Publishing Service
The publishing service is a collection of individual services responsible for repurposing content to a variety of channels. A channel is a logical representation of a publishable entity such as a Web site, print publication, syndication feed, CD-ROM, or wireless broadcast. A channel publisher is usually a Web producer or print editor. Once original content has been created and reviewed, the producer can use it to publish new pages by assigning it to specific presentation templates. Examples of presentation templates include XSL stylesheets, HTML pages with embedded tags, JavaServer Pages, and QuarkXPress page layouts.
An important principle is single-sourcing of content across multiple channels. Multiple Web sites can refer to the same image file, and changes to the file are reflected within every channel either immediately or as soon as the channel is republished. Assigning content to a channel is an important event; it should trigger a set of rules that cause channel-specific transformations to occur via invocations of the transformation service. For example, a rule can indicate that every TIFF image assigned to a Web channel must be transformed into a JPEG image.
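The TIFF-to-JPEG rule can be expressed as data rather than code, which is what lets a channel owner change it without a programmer. The following sketch shows one possible shape for such rules. The class, the rule fields, and the channel type names "web" and "print" are assumptions. In a full system the returned target format would be handed to the transformation service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical channel-assignment rules (illustration only).
class ChannelRulesSketch {
    // "Files of sourceFormat assigned to channelType are delivered as targetFormat."
    static class Rule {
        final String channelType, sourceFormat, targetFormat;
        Rule(String channelType, String sourceFormat, String targetFormat) {
            this.channelType = channelType;
            this.sourceFormat = sourceFormat.toLowerCase(Locale.ROOT);
            this.targetFormat = targetFormat.toLowerCase(Locale.ROOT);
        }
    }

    private final List<Rule> rules = new ArrayList<Rule>();

    void addRule(String channelType, String sourceFormat, String targetFormat) {
        rules.add(new Rule(channelType, sourceFormat, targetFormat));
    }

    // Returns the format a file should take on the given channel type.
    // The first matching rule wins; with no match the file keeps its format.
    String targetFormat(String channelType, String fileName) {
        String extension = extensionOf(fileName);
        for (Rule rule : rules) {
            if (rule.channelType.equals(channelType) && rule.sourceFormat.equals(extension)) {
                return rule.targetFormat;
            }
        }
        return extension;
    }

    // Lower-cased text after the last dot, or an empty string if there is no dot.
    private static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```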
Categorization Service
The categorization service classifies content into meaningful categories. For example, a mutual fund prospectus may be classified as belonging to both Equity Fund and Foreign Fund categories. Content may be classified by human experts or by auto-categorization tools, or a combination of the two. A central facet of the categorization service is a taxonomy hierarchy that describes content categories and subcategories available for classification. Industry-specific taxonomy hierarchies are becoming available from industry consortia; these can serve as a basis for content categorization.
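A taxonomy hierarchy is essentially a tree of categories plus a many-to-many mapping from content to categories. The sketch below illustrates that structure with the prospectus example. The parent category "Funds" and all class and method names are assumptions introduced for the illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical taxonomy with multi-category classification (illustration only).
class TaxonomySketch {
    // Category name -> parent category name (null for a top-level category).
    private final Map<String, String> parentOf = new HashMap<String, String>();
    // Content id -> categories assigned to it.
    private final Map<String, Set<String>> categoriesOf = new HashMap<String, Set<String>>();

    // A parent must exist before its children, so the hierarchy cannot contain cycles.
    void addCategory(String name, String parent) {
        if (parentOf.containsKey(name)) {
            throw new IllegalArgumentException("Duplicate category: " + name);
        }
        if (parent != null && !parentOf.containsKey(parent)) {
            throw new IllegalArgumentException("Unknown parent: " + parent);
        }
        parentOf.put(name, parent);
    }

    void classify(String contentId, String category) {
        if (!parentOf.containsKey(category)) {
            throw new IllegalArgumentException("Unknown category: " + category);
        }
        Set<String> assigned = categoriesOf.get(contentId);
        if (assigned == null) {
            assigned = new HashSet<String>();
            categoriesOf.put(contentId, assigned);
        }
        assigned.add(category);
    }

    // True if the content is classified under the category or any of its subcategories.
    boolean isUnder(String contentId, String category) {
        Set<String> assigned = categoriesOf.get(contentId);
        if (assigned == null) {
            return false;
        }
        for (String start : assigned) {
            for (String current = start; current != null; current = parentOf.get(current)) {
                if (current.equals(category)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Walking up the parent chain is what lets a search for the broad category also find content that was classified only under one of its subcategories.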
Infrastructure Web Services
To function effectively, content management Web services need to interact with infrastructure Web services that serve content management Web services as well as other Web services in the enterprise. Two such infrastructure Web services are the user management and authentication service and the event notification service. These are Web services that act as shared enterprise resources.
User Management and Authentication Service
This Web service handles users and their groups as well as authentication. Users and groups can be created, updated, and deleted and user entitlements can be managed via this service. For example, user Joe Smith may be assigned a role called "Author," which has the right to create content and perform collaborative tasks but doesn't have the right to publish channels. To implement authentication and user management, the service can use a delegation model. For simple usage, the service can provide mechanisms to batch import user account and rights information from external directory and policy servers. For advanced needs, the service supports integration with security realms such as LDAP, Windows NT domains, or proprietary security realms.
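The "Author" example boils down to a role that bundles some rights and withholds others. A minimal sketch of that check follows. The Right enum values and the method names are invented for illustration, and the sketch keeps everything in memory rather than delegating to LDAP or another security realm.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical role-based entitlement check (illustration only).
class EntitlementSketch {
    enum Right { CREATE_CONTENT, PERFORM_TASKS, PUBLISH_CHANNEL }

    private final Map<String, Set<Right>> rightsOfRole = new HashMap<String, Set<Right>>();
    private final Map<String, String> roleOfUser = new HashMap<String, String>();

    // Defines (or redefines) a role as a bundle of rights.
    void defineRole(String role, Right... rights) {
        Set<Right> granted = EnumSet.noneOf(Right.class);
        granted.addAll(Arrays.asList(rights));
        rightsOfRole.put(role, granted);
    }

    void assignRole(String user, String role) {
        if (!rightsOfRole.containsKey(role)) {
            throw new IllegalArgumentException("Unknown role: " + role);
        }
        roleOfUser.put(user, role);
    }

    // A user is allowed an action only if the user's role grants that right.
    boolean isAllowed(String user, Right right) {
        String role = roleOfUser.get(user);
        return role != null && rightsOfRole.get(role).contains(right);
    }
}
```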
Event Notification Service
This Web service is responsible for handling notification events generated by services and delivering them to their intended recipients. This is particularly useful for multicasting actions and events such as content submission, content assignment, and channel publication. The intended recipients of these events may be other services, end users, or system administrators. In the J2EE world, the event notification service can be implemented using the Java Message Service (JMS). When notifications must be delivered to human recipients, the JavaMail API can deliver e-mails via an SMTP server. For other forms of delivery, the e-mail delivery system can be hooked to pagers or wireless gateways to deliver notifications to handheld devices and personal digital assistants.
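The multicasting behavior follows the publish-subscribe pattern. The sketch below shows the pattern in a single process using plain Java; it is not JMS code, and the class, the Listener interface, and the topic name used in the usage note are assumptions. In a J2EE deployment a JMS topic would play the role of this in-memory map, with delivery across processes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-process publish-subscribe hub (illustration only, not JMS).
class NotificationSketch {
    interface Listener {
        void onEvent(String topic, String payload);
    }

    private final Map<String, List<Listener>> subscribers = new HashMap<String, List<Listener>>();

    void subscribe(String topic, Listener listener) {
        List<Listener> listeners = subscribers.get(topic);
        if (listeners == null) {
            listeners = new ArrayList<Listener>();
            subscribers.put(topic, listeners);
        }
        listeners.add(listener);
    }

    // Delivers the event to every subscriber of the topic, in subscription order,
    // and returns how many subscribers were notified.
    int publish(String topic, String payload) {
        List<Listener> listeners = subscribers.get(topic);
        if (listeners == null) {
            return 0;
        }
        for (Listener listener : listeners) {
            listener.onEvent(topic, payload);
        }
        return listeners.size();
    }
}
```

For example, the workflow service and an e-mail notifier could both subscribe to a topic such as "content.submitted", and the library service would publish to it on check-in without knowing who is listening.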
Real-World Application
How do content management Web services, if implemented as described above, benefit Jenny Meyers, vice president of marketing at an online brokerage company? How do they help her solve the Investor Update publication problem?
In the new world, Jenny launches a wizard that allows her to set up and configure her Investor Update publication process. The wizard guides her through a series of high-level choices and end-user actions pertinent to the publication of Investor Update. Once she's completed her choices and actions, a script is automatically generated that ties together invocations to the Web services required to support the Investor Update publication process. This script is then saved as "Jenny's Investor Update application." Jenny can invoke it at any time, assign it new content, and automatically generate personalized Investor Updates.
To create the application, the wizard guides Jenny through the following steps. First, she enters information about the project, such as goals, budget, human resource allocations, and timelines. Next, she indicates the channel format - in this case, rich e-mail. Based on her selection, the wizard presents a set of HTML e-mail templates with different layouts, and she selects one (she can also request that a new template be created to her specifications). Figure 3 shows an example of a template Jenny can select. Once she's chosen a template, the wizard lets her make simple adjustments to its look and feel. She can also associate content containers within the template with rules that dynamically extract content from the repository. As the last step, the wizard guides her in selecting a production workflow for Investor Update. Once Jenny's done, she saves her application. Under the covers, the wizard generates a script that stitches together the appropriate invocations of the relevant Web services. To verify that the application works correctly, Jenny can then test it with sample content.
To generate e-mail newsletters each week, Jenny simply launches her Investor Update application. The application displays a list of recipients, and she selects the ones to whom she wants to send the newsletter. Next, Jenny is asked to assign a feature story for the standard feature story spot in Investor Update. She uses the library service to find a feature article. After collecting all the relevant information, the application generates a set of e-mail newsletters, one for each intended recipient. Each Investor Update has a personalized greeting at the top, a standard feature article and graphic in the middle, and three top news items personalized per recipient based on their portfolios. Jenny can inspect these e-mail newsletters and, if satisfied, direct that they be sent. Figure 4 shows the workflow orchestrated by Jenny's Investor Update application as the newsletters are produced. It involves multiple interactions with the user as well as various Web services.
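The per-recipient assembly step might look like the following sketch, which picks up to three news items that match a recipient's portfolio and renders a plain-text newsletter body. Everything here is hypothetical: the class and method names, the use of ticker symbols as the matching key, and the plain-text layout are assumptions, and in the scenario above the content would come from the library service and the layout from Jenny's chosen HTML template.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical per-recipient newsletter assembly (illustration only).
class InvestorUpdateSketch {
    static class NewsItem {
        final String ticker;
        final String headline;
        NewsItem(String ticker, String headline) {
            this.ticker = ticker;
            this.headline = headline;
        }
    }

    // Picks, in input order, up to 'limit' headlines whose ticker is in the portfolio.
    static List<String> topNews(List<NewsItem> news, Set<String> portfolio, int limit) {
        List<String> picked = new ArrayList<String>();
        for (NewsItem item : news) {
            if (picked.size() == limit) {
                break;
            }
            if (portfolio.contains(item.ticker)) {
                picked.add(item.headline);
            }
        }
        return picked;
    }

    // Renders a personalized greeting, the shared feature story, and the chosen headlines.
    static String render(String recipient, String feature, List<String> headlines) {
        StringBuilder body = new StringBuilder();
        body.append("Dear ").append(recipient).append(",\n\n");
        body.append(feature).append("\n\n");
        for (String headline : headlines) {
            body.append("- ").append(headline).append('\n');
        }
        return body.toString();
    }
}
```

The generated application would call these two steps once per selected recipient, then hold the results for Jenny's inspection before sending.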
Conclusion
Web services hold the key to self-service content management systems. These systems will enable business users to take control of their content production needs and engage in the rapid development and deployment of applications, without depending on software developers. A self-service content management system is best designed as a cooperating collection of loosely coupled software components. In this model each software component is an autonomous Web service that performs a specialized content management function.
Even though Web services frameworks make it increasingly feasible to develop such content management systems, first-generation systems conceived as packaged applications are unlikely to make the transition to self-service content management easily. Second-generation systems that are based on open platforms such as J2EE, or that are modular enough to take advantage of .NET, are better positioned to benefit from the Web services revolution. Among them are the harbingers of self-service content management.
Author Bio
Santanu Paul is the chief technology officer at Openpages, a leading provider of enterprise content production systems. With more than 10 years of industry and R&D experience, Paul is an expert on designing and developing state-of-the-art content management and workflow systems.
All Rights Reserved
Copyright © 2004 SYS-CON Media, Inc.