There's an old rule in software engineering: "Building to scale requires prior intent." Many applications delivered today fail to address scalability; they get deployed fast and sink faster as the load cripples them.
The advent of J2EE 1.3 goes part way toward providing an environment built to scale. The adoption of JMS- and message-driven beans, as a mandatory addition to J2EE, solves part of the puzzle, but the marriage of JMS with JCACHE (JSR 107) makes life much more interesting.
John Bentley, author of Programming Pearls, coined a phrase called the "Ah-Ha!" moment - the moment when you fully understand the dimensions of the problem and, as a consequence, the solution becomes obvious. Those of us who have read the famous Gang of Four book on programming design patterns, Design Patterns: Elements of Reusable Object-Oriented Software, will appreciate the elegance and no doubt say, "Of course, I used that pattern on my last project." The marriage of JMS and JCACHE is an architectural recipe for distributed systems and gives rise to further patterns as we understand the dimensions of the problems we're trying to solve.
In this article we look at how these technologies can be used to provide scalable architectures for delivering J2EE-compliant, Web-based applications today and support what we need to deliver tomorrow by exposing some architectural patterns based on JMS and JCACHE.
A History of Distributed Systems
The history of distributed systems development has its genesis in the days of database management systems. These first systems were centralized monolithic beasts and, soon after, the first commercial implementations of distributed database research and development efforts yielded up much of what we now see as distributed databases complete with replication. These distributed databases reflected the geographic dispersal and management of the information base that defined an enterprise. The techniques employed were synchronous and utilized a simple request/response paradigm for data distribution between database management systems as well as between clients and database servers.
Meanwhile the real-time computing community was also looking at more asynchronous forms of communication. The techniques used in the late 1970s and early '80s were largely event-driven and used communication mechanisms for message exchange in multithreaded and multiprocessor environments. Indeed, much of what was done in those days is reflected in the black art that underpins the building of device drivers.
The building of multitier applications and the rise of the Internet has focused a lot of attention on the techniques for building distributed systems. When Java was born it had remote method invocation built into the language. It supported sockets to enable interprocess and intraprocess communication. While the device drivers for socket handling were asynchronous, the standard mechanism for implementing multitier applications remained rooted in a request/response synchronous paradigm.
In the 10 years that preceded the birth of Java, a lot of research and products came onto the market to define a new space called Message-Oriented Middleware (MOM) for communicating between processes and applications. The early adopters of this new technology were financial service companies such as Goldman Sachs, Lehman Brothers, and Nomura. In fact, many banks were either trying to co-
develop a messaging solution or build it in-house.
Why was MOM so important to the major financial institutions? It enabled them to decouple their systems and provide a flexible solution that reflected their working practices. Furthermore, it enabled them to achieve higher trading throughput by managing business transactions as event flows rather than as database transactions. The fundamental shift to business transactions obviated the need to wait for the database to process the entire life cycle of a trade, and allowed it to be decomposed into a number of database transactions that reflected the natural flow involved in recording, accounting, and settling trades.
Web services are the latest and greatest Internet craze. They support the focus for much of the J2EE 1.3 development (with the JAX pack) and will have a huge influence in what is in J2EE 1.4. Web services are service-centric, not document-centric the way the Internet is used today. Components can be defined as a Web service and so promote reuse. Service definition is through the Web Service Description Language (WSDL) and registration through the UDDI (the repository). Communication is encoded using the Simple Object Access Protocol (SOAP), which is an XML synchronous remote procedure call.
Where does all this history leave us? And where does JMS and JCACHE take us? Well, we can connect applications pretty effectively. We can even reuse legacy-messaging investments through a more open JMS framework approach and so provide standards-based, multiplug asynchronous connectivity. On the other hand, the development of SOAP with JMS gives us a standard synchronous functional protocol over an asynchronous communication mechanism in a Web services context. Now JCACHE provides us with a set of interfaces that, coupled with JMS, provide the bedrock for elaborating standard architectural patterns that are geared to alleviating the problem of server bottlenecks.
The technologies we described thus far fall into two camps: those that are based on a (request/response) synchronous paradigm and those that fall into a (publish/subscribe) asynchronous paradigm. Messaging is generally asynchronous. It has to be to promote fire-and-forget messaging and so support a decoupled solution. But applications are generally built using synchronous mechanisms. The challenge is how to combine these paradigms into an architecture that engenders scalability. To do this we need to understand the problems. In part they're a consequence of history, which is why history is important.
In the next section we examine the problems and present some architectural design patterns that are the basis for building scalable, distributed systems.
The Role of MOM and JMS
The Java Message Service API was born in late 1997, and the first commercial implementation came out in the spring of 1998 from SpiritSoft. Since then a market has grown up around the standard as Fiorano and Progress (now Sonic Software) entered the fray. IBM offered a JMS interface to MQSeries shortly after, and finally Tibco Software joined the club late in 2001.
The adoption of JMS as the first vendor-independent MOM standard is a testament to the power of asynchronous messaging as a fundamental technology for delivering distributed solutions. The move toward message-driven beans and the mandatory status of JMS in J2EE 1.3 is a huge step forward. The ability of many JMS products to offer a scalable load-balanced solution to the distribution, marshaling, and management of requests to an application server supports this. However, it's only the beginning. Life gets much more interesting when you start to apply MOM technology through a JMS standard to caching and so start to understand the basic architectural patterns that it promotes.
What Does JMS Do for Us?
What is JMS and what does it do for us? (For more detailed information check out the following two books: Java Message Service by Richard Monson-Haefel and David A. Chappell, and Professional JMS Programming by Paul Giotta, et al.) The Java Message Service API is an application programming interface. It's defined as a set of interfaces for producing and consuming messages based on publish and subscribe as well as point-to-point queuing models. The JMS specification enables subscribers or receivers to specify a quality of service for messages so they can be reliably delivered or guaranteed to be delivered. It's similar to guaranteed versus registered delivery by the postal service. The JMS specification also describes some semantics that JMS vendors must conform to, such as how a message is received. Thus a queue allows only one receiver to receive a particular message (i.e., it's delivered once and only once), whereas many subscribers to a topic might receive the same message.
JCACHE and the Role of Caching
Although JCACHE (JSR107) is still making its way through the Java Community Process, several vendors have announced or will be announcing caching products based on it. What JCACHE will provide is a standard set of APIs and some semantics that are the basis for most caching behavior. According to its functional requirements, JCACHE "allows applications to share objects across requests, across users, and coordinates the life cycle of the objects across processes."
JCACHE provides a fairly rich set of APIs that let you exercise control over cache loading, cache eviction, and cache validation. It deals with basic caching patterns in which caches can be fed by other caches. A consequence of this is that it allows implementers to take advantage of distribution technology, such as that offered by JMS, as well as any other distribution technology deemed appropriate.
According to JCACHE, a cache is "specific to each process." This seemingly simple statement has profound implications and is one of the reasons why the proposed solution is flexible. Each process can implement its own mechanisms for cache loading and replace or share the necessary functionality as required. When an object is invalidated or updated in one process, the name and associated information can be broadcast to all other instances of the cache; this is where JMS comes to the fore, by acting as the standard distribution interface for a cache. This in turn allows the entire system of processes to stay synchronized without the overhead of centralized control.
The long and the short of it is, JCACHE is a smart approach for read-only data.
Understanding a problem is all about understanding its form or shape. When we fully appreciate the shape of a problem we can find a solution that truly fits. It's the difference between effecting a cure and relieving symptoms. All of us have been pressed for time on some software project; a bug has been holding us up for days and we decide to change this or that based on intuition and the problem goes away. We may have fixed the problem, but since we don't understand the shape we can't be sure. On the other hand, a bug holds us up for days and suddenly, perhaps because we weren't looking too hard, we have further insight and understand the shape of the problem. It's an "Ah-Ha" moment; when we truly understand the nature of a problem so the solution hits us in the face.
The problem with the synchronous and asynchronous paradigms is that enterprises are not totally synchronous or asynchronous. They reflect both paradigms in specific ways that support their business models. The shape is there and the architecture needs to reflect it. In this way we can build systems that reflect and support the business rather than the business being a reflection of the computer systems that support it. A synchronous solution would necessitate all components of an enterprise to participate in some form of distributed transaction - a legacy from history, encompassed by the old mantra that "all solutions start with a database," whereas solutions are bounded by a problem.
Request/response systems are by their very nature pull-based and passive. The applications pull data from the server. Many of these applications manipulate the same logical data numerous times within the same period and across many transactions. As the client population increases, so does the number of requests to the application or DBMS server. The server becomes the bottleneck, swamped with too many spurious requests. Application server, DBMS, and hardware vendors suggest using replication to alleviate the problem and thus reduce the client-to-server ratio. This just relieves the symptoms and does nothing about effecting a cure. As the enterprise grows, so does the need for more replication and more servers, and so the costs escalate.
Web services doesn't fare any better because it's synchronous. It does help deliver systems, getting you there fast, because it offers standard protocols and standard interfaces, but it does nothing to alleviate the inevitable problems of server bottlenecks in large-scale Web services within and between enterprises.
The first pattern is a reflection of how many organizations can be broken down into logical units. In banking this was always based on splitting the enterprise into front-, middle-, and back-office functions. Each functional unit owns its own transactional space. Each transaction space is then connected by a JMS bus. Depending on the semantics of the messages published on the bus, different qualities of service (QoS) may be used. Thus for an order we would want to ensure it always gets delivered, but for a price change we might be willing to accept the last price published (see Figure 1). In this way each functional unit of the business is responsible for its own scalability issues. The use of JMS enables them to proceed at the rate they need to. The only functional unit that dictates overall performance is the order entry system, which is now free to enter orders without worrying about the other functional units of the enterprise. The flow of messages (or events) can then be controlled as a system-to-system workflow and so provide a better controlling and monitoring environment.
The next architectural nuance is to recognize that many systems spend a high percentage of time reading and then having flurries of activity as they transact and data is changed, added, or deleted. During the mid-1990s, when financial services took up the challenge of decoupled architectures, many also used MOM to provide a distribution mechanism for data. It was a natural step to try to cache the data nearest the application that needed it, and further reduce the server bottlenecks.
This next pattern simply recognizes that a typical layer between a consumer of messages from a JMS bus and the bus itself is a read-only cache (see Figure 2). The read-only nature of a cache is important here because the cache is decoupled from the underlying data source. Updating the objects doesn't change the source; updatability is a more advanced pattern. What this pattern requires is a container that understands what we want, how to get it, and how to propagate the changes that it receives. This is the basis for most caching behavior.
Once we take the view that the applications can have a cache of information from the application server or DBMS, and that the cache can be up-to-date or active, then the application no longer needs to pull data that it has already requested. Applications in an active architecture pull once for any given object (or objects), and the server simply pushes thereafter. This seemingly simple change has a major impact on performance and scalability and a major impact on the design of the applications. Applications now need to become event-driven and so, themselves, become active.
A more complex form of caching can be derived from Figure 2. If caches can be connected in a hierarchy with each cache subscribing to a JMS bus (see Figure 3), we can use the natural hierarchical shape of an enterprise and some counterintuitive logic to model a hierarchical caching solution across an n-tier architecture.
In an n -tier architecture, the nearer you are to the data source, the finer the granularity you need to identify the data you want. The further away you are, the more localized the data becomes. Consider this: as all the requests for data converge onto a data source, the data source needs to identify what each requestor needs. But further out toward the edge, each requestor already knows what it needs, be it through object identity or predicate definition. Thus the counterintuitive principle is to have fine-grain subscription closer to the data source and coarse-grain subscription the further away you go from the source. The fact that JMS-based solutions offer topic hierarchies lends itself to this form of traffic shaping and complementary hierarchical caching.
The more the active cache is decoupled from the server, the more scalable the solution becomes. Some systems like HOODINI (a distributed investment banking architecture at Nomura Research Institute) implemented a hybrid (push and pull) approach. In HOODINI, applications registered interest through predicates (active queries) and so enabled the cache to be filled based on a very flexible declarative approach. Although the approach in this case was a hybrid, it has considerable merit as it not only used MOM to distribute change, but also provided a flexible way of describing what you want in your cache through the use of predicates.
Active queries and the use of predicates to define what we want in a cache changes the way in which we build applications. Subscription to a cache and the cache's subscription to a data source are based on predicates. Notification of change is based on specific objects or entities and the results set that they're in or have moved to (see Figure 4).
Decoupled caches don't lend themselves to conventional methods of updating the information. The fact that they're decoupled means that the transaction consistency of the cache is sacrificed. Conflicts that arise from inconsistency need to be dealt with as exceptions in the business flow that underpins an enterprise. This is what many financial institutions do anyway and what transactional islands recognize. What we can do is provide patterns that ensure that the window of opportunity in which these problems may arise is as small as possible. This way we can create an architecture in which exceptions are exceptional and most of the investment in software deals with the bread and butter. So how do we close this window? There are two basic mechanisms.
The first uses the concept of a proxy to provide a synchronous update mechanism. If the business object or entity is a proxy within the cache, and that proxy has all it needs to update the information source, then the proxy can update the information source directly (in standard RMI-type fashion). When the synchronous update call returns, the proxy can publish the update through the cache to other interested parties (see Figure 5).
This pattern is fine for low frequency updates. The synchronous calls to the information source guarantee that all updates are synchronized and so are transactionally safe; however, the price to be paid is the latency of the synchronous call and the attendant server bottleneck. While the read frequency remains high and the update frequency low, this pattern has considerable merit. Indeed this was how the HOODINI cache worked.
The second uses a totally decoupled cache for read and update. When caches are loaded, the information source is no longer the source but is simply treated as a subscriber to change. Changes occur in the network by publishing the changes onto the network, but the information source uses a durable subscription to ensure that it updates the information source (see Figure 6). This carries with it a need to perform conflict resolution when updates to the same object overlap; this can be easily detected by carrying the pre- and postimages of the object, but the conflict resolution is application-defined.
This pattern works for both higher read and high update frequencies, but it does incur the cost of conflict resolution. As long as the number of conflicts is low, then the conflicts can be costed as part of the overall exception handling that systems generally have to perform. If the update profile is segmented so that different parts of an organization update different data, then the pattern is very effective and increasing overall throughput in the face of change.
Raising the Debate
How does this fit into the real world? All the patterns mentioned, from transaction islands to decoupled updatable caches, have relevance to today's Web applications as well as to the Web services of tomorrow. The inability of Web servers to cache information effectively suggests that caching based on a JMS distribution model is highly applicable. The same would be true of a SOAP-based Web service as well as for any internal application in which reading is a large part of what an application is used for.
What JMS has done is raise the debate. We are now able to identify patterns, expose further APIs, and develop the related JMS tools market. It moved the debate beyond JMS in the same way that the introduction of SQL gave rise to database tools in the 1980s.
The Network Is the Database
The adoption of JMS as part of J2EE 1.3 and the widespread use of JMS as a key component in building distributed systems is starting to give rise to a "beyond JMS" debate. What do we need on top of JMS to more effectively deliver systems? What we have argued so far is that JMS and JCACHE go together and that decomposing a business into transactional islands is a natural fit with JMS. We've also pointed out that business transactions are a natural consequence.
Scott McNealy said, many JavaOnes ago, that the network is the computer, and in a sense it's true. We can go further and contend that the network is the database. Certainly when we navigate the Web we're not aware of any single data source. We simply state our criteria and hope to get a match that has meaning. The Web provides a static view of data; caching and JMS- and predicate-based subscriptions enable us to apply the same mechanism for Web navigation to changing data. Not only does the network become the database, but it also becomes an agent for managing change.
In Search of Greater Flexibility
The marriage of JMS and JCACHE and the use of topics and selectors to provide an open standards-based solution to a common architectural problem offer an "Ah-Ha" moment. It allows us to abstract out some of the architectural patterns that support the construction of large-scale distributed systems. For example, the JMS selectors are what enables the messaging and caching to deliver what we need based on a predicate. It's simply the ability to distribute change events, apply a condition to the change, decide what to change in the cache, and then inform an application of the change to the cache. It's a case of managing events, conditions, and then actions and applying it to JMS and JCACHE. Flexibility, the provision of runtime change to such approaches, is what we all seek.
We can encapsulate this flexibility by using event-condition-action rules to externalize those parts of the system that need to be on the surface so they can be changed while the system runs. ECA rules enable this to happen. The ECA rules as espoused by RuleML, also known as reactive rules, can be generative (e.g., compiled into Java) and can be small enough to coexist with a cache. Put them together and you have a flexible dynamic mechanism for active caching based on predicates.
The openness of JCACHE with its policy-driven architecture for cache loading, validation, and replacement can be made totally dynamic by using ECA rules to manage the policy externally to the cache. JMS offers a mechanism to update the cache, providing the data as events to which the ECA rules are applied, and also provides the distribution mechanism for rules to be updated remotely so that caching and rules can be changed wherever they are. In this way it's possible to build robust, scalable, and highly dynamic distributed architectures that can reflect the needs of an enterprise on day one and continue to reflect their needs in the future.
Nigel is director of product management at SpiritSoft
with over 20 years' experience in the industry,
specializing in distributed systems architecture
and audit. [email protected]