J2EE applications are characterized by the continuous creation, consumption, and destruction of various types of application objects.
These objects may be product objects in e-commerce applications, session objects, or user profile objects, to name a few common examples. Creating and destroying these objects is expensive - object creation usually requires accessing persistent storage in back-end systems (e.g., DBMSs and file systems), while object destruction requires releasing the resources the object holds (e.g., memory, database connections, etc.).
A very popular solution to address the costs of object creation and destruction is to store these objects in the application process memory, often referred to as in-process caching. One of the greatest advantages of in-process caching is that it provides fast access to application objects, which can improve application performance. Unfortunately, there are drawbacks associated with in-process caching. By far, the greatest drawback is its impact on garbage collection (GC) overhead in the application process where the cache resides. In certain situations, in-process caching can significantly increase the costs of GC, resulting in severely increased CPU utilization and response times.
In this article, I demonstrate that in-process caching can be harmful to application performance and scale in memory-constrained application environments due to its adverse impact on GC performance. I then discuss an alternative caching approach, an external caching architecture, and demonstrate that it can provide significant performance benefits for such applications when compared to in-process caching. Before delving into the details, I first provide an overview of two key concepts that are central to understanding this article: object caching and Java memory management.
Object Caching Basics
Object caching refers to storing an object that has been generated for a particular request so it can be used to serve subsequent requests for the object. A commonly used type of object caching is in-process caching, in which objects are stored in the application process memory. By keeping objects in local memory, subsequent requests for the object can be satisfied directly from memory, reducing the overhead of object creation and destruction.
Figure 1 depicts an in-process caching architecture. Nearly all enterprise software systems (whether coded from scratch or running packaged applications) have multiple caches that run inside each application process. Most J2EE application servers offer in-process caching features. In fact, in-process caching has become so widely used that it is considered the de facto standard approach for optimizing application performance and scale. JCACHE (JSR #107), which proposes a standard Java API for in-process caching, is evidence of this.
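As a concrete sketch of the idea, the following is a minimal in-process cache: a bounded, LRU-evicting map held in the application's own heap. The class and parameter names are illustrative and not taken from any particular application server's caching API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of an in-process object cache: an LRU map bounded by
// entry count, living entirely in the application's heap.
public class InProcessCache<K, V> {
    private final Map<K, V> map;

    public InProcessCache(final int maxEntries) {
        // accessOrder=true gives LRU ordering; removeEldestEntry evicts
        // the least recently used entry once the bound is exceeded.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public synchronized void put(K key, V value) { map.put(key, value); }

    public synchronized V get(K key) { return map.get(key); }

    public synchronized int size() { return map.size(); }
}
```

Because the cached objects are ordinary heap objects, every entry this cache holds is memory the garbage collector must account for - the crux of the problem examined below.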
Overview of Java Memory Management
This article is concerned with two key concepts in Java memory management: garbage collection and reference objects.
Garbage collection is the method by which memory is automatically reclaimed from unused objects. An object is eligible for GC when it can no longer be reached from any pointer in the running program. Most JVMs use a generational collection model, which takes advantage of the fact that, in most programs, the vast majority of objects are very short-lived (e.g., temporary data structures). In a two-generational collection scheme (as used in the HotSpot and JRockit JVMs), a young generation is maintained for short-lived objects and an old generation for long-lived objects. When the young generation fills up, a minor collection is invoked. When the old generation fills up, a major collection is invoked.
Since major collections require iterating over all live objects, they take orders of magnitude longer to complete than minor collections. This leads to two key conclusions regarding GC performance:
1. The longer an object survives, the more collections it must endure, and thus the more expensive GC becomes.
2. By arranging for most objects to be collected via minor collections, GC can be made very efficient.
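On HotSpot-family JVMs, this behavior can be observed directly with the standard verbose GC flag. The heap and young-generation sizes below are illustrative choices (the 64MB heap mirrors the test configuration described later), and MyApp is a placeholder class name:

```shell
# Sketch: HotSpot flags for observing generational GC behavior.
# -Xms/-Xmx fix the total heap; -Xmn sets the young generation size.
java -verbose:gc -Xms64m -Xmx64m -Xmn16m MyApp
# -verbose:gc prints one line per collection; lines marked "Full GC"
# are major collections, the rest are minor collections.
```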
A reference object encapsulates a reference to some other object so that the reference itself may be examined and manipulated like any other object. Two kinds of references, soft and hard, are of interest here, since both are often used for in-process caching. A hard reference object is an application object that can be reached directly, i.e., without traversing any reference objects, by a pointer in the running program. A soft reference holds a reference to a single other object, its referent - objects reachable only through soft references are called soft reference objects. Figure 2 depicts these two types of reference objects in application memory.
A unique characteristic of soft reference objects is that they can be reclaimed by the garbage collector if additional application memory is needed. For this reason, soft references are typically used to cache objects that are "not critical" to the application. As an example, consider a product object in an e-commerce application. If the product object is not available in memory when requested, the cost to the application is the cost to re-create the object from scratch. Hard references, on the other hand, are usually used to cache "critical" objects, i.e., objects whose loss would render the application unresponsive (or, at best, severely delayed) to requests. An example of such a critical object would be a session object.
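A minimal sketch of a soft-reference cache follows, with illustrative class and method names. The key point is that get() may return null even for a previously cached key, because the collector is free to clear a SoftReference under memory pressure; the caller must then re-create the object, as with the product object example above.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Sketch of a soft-reference cache for "non-critical" objects. The
// garbage collector may clear any SoftReference when memory runs low,
// so callers must be prepared to rebuild objects on a null result.
public class SoftRefCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();

    public synchronized void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    public synchronized V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;          // never cached
        }
        V value = ref.get();
        if (value == null) {
            map.remove(key);      // referent was reclaimed by GC
        }
        return value;
    }
}
```

A hard-reference cache would simply store V directly instead of SoftReference&lt;V&gt;; its entries can never be reclaimed by the collector, which is exactly the failure mode examined in the tests below.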
Impact of In-Process Caching on GC Performance
A set of tests was conducted to examine the impact of in-process caching on GC performance. Both soft and hard references are examined in these tests.
The test application is a Java-based shopping portal that maintains a user profile for each user in order to serve personalized content. Three test cases are considered initially:
1. No Cache: This is the baseline case, in which no caching is used. A request is served by instantiating the profile object (requiring a database access), generating the requested page, and then writing the updated profile information to the database.
2. Soft Reference: The profile object is kept in memory for the duration of the session, or until the object is evicted from the cache due to the replacement policy, whichever occurs first. The cache is implemented using soft references.
3. Hard Reference: Same as the Soft Reference case, except that the cache is implemented using hard references.
The basic test configuration consists of an application server, a profile database, and a cluster of clients, similar to the architecture shown in Figure 1. The application server is WebLogic, the profile database is Oracle, and the client load simulation software is LoadRunner. Detailed hardware and software specifications are shown in Table 1. All modules reside on the same local area network and communicate via sockets.
The key test parameters include the average size of a profile object (100KB), JVM memory size (64MB), and total available memory size (51.5MB). A complete list of test parameters is displayed in Table 2. In the two caching cases, the cache is initially empty. Measurements are recorded once the cache is 50% full to ensure that the system is in steady state. The system load of 275 requests per second is a sufficient load to create resource contention for the experiments.
Figure 3 shows CPU utilization versus time (the curve labeled External will be discussed subsequently). In the No Cache case (see Figure 3A), the CPU utilization remains fairly constant at about 70%, with several peaks at various points. These peaks represent the points when the GC process runs. The smaller peaks (reaching about 75%) represent minor collections, while the taller peaks (reaching about 90%) represent major collections.
The system behavior is quite different when in-process caching is used. In the Soft Reference case (see Figure 3B), the CPU utilization is relatively low initially, around 45%, with periodic peaks representing minor and major collections. At time 350 seconds, a sudden spike in CPU utilization occurs. This spike indicates the point at which the system reclaims memory from the soft reference objects, since it's unable to reclaim sufficient memory through minor or major collections. This reclamation empties the cache. As a result, each subsequent request requires that the profile object be generated. This work, along with the corresponding cache operations (e.g., insertion, lookup), translates into the dramatic increase in CPU utilization shown. As the cache fills again, CPU utilization decreases.
In the Hard Reference case (see Figure 3C), the behavior is initially similar to the Soft Reference case. However, in the Hard Reference case, the system is unable to collect the needed memory, and thus, a system crash ultimately results.
Figure 4 shows the average response times, which mirror the CPU utilization results. The response time for the No Cache case remains fairly constant at about 145 milliseconds (ms). For the Soft Reference case, the response time is initially at 70ms and decreases to about 10ms as the cache fills. The response time then jumps to about 200ms, indicating the point when the soft reference objects are reclaimed. The Hard Reference case follows a similar pattern, except that the response time continues to increase until the system crashes.
A comparison of the three cases provides some interesting observations. Table 3 provides a comparison of the in-process caching cases with the No Cache case for selected experimental ranges. The Soft Reference case can provide up to a 36% reduction in CPU utilization and up to a 14x reduction in response times when compared to the No Cache case. However, these improvements are only possible when the system has sufficient memory so that soft reference objects don't need to be reclaimed (e.g., during time 0-349 seconds). When soft reference objects need to be collected, the Soft Reference case actually degrades performance - up to a 43% increase in CPU utilization and a 38% increase in response times (e.g., during time 350-680 seconds).
The use of in-process caching increases the frequency of major collections. With the Soft Reference case, major collections occur about three times as often as in the No Cache case (about once every 30 seconds for the Soft Reference case and about once every 100 seconds for the No Cache case).
The Hard Reference case provides improvements similar to the Soft Reference case as long as sufficient memory is available. However, once memory becomes scarce, the Hard Reference case also degrades performance when compared to the No Cache case - up to a 43% increase in CPU utilization and a 79% increase in response times (e.g., after time 350 seconds). Furthermore, in the Hard Reference case, major collections occur about four times as often as in the No Cache case.
As these results indicate, in-process caching can be an effective solution, provided that the system has sufficient memory available. In memory-constrained application environments, however, in-process caching can actually be detrimental to application performance. The primary reason for this is the impact of in-process caching on GC performance. It impacts GC performance in two important ways:
1. Increases the frequency of GC: Consider an application that has M bytes of memory available and consumes m bytes of memory per second when processing a certain number of requests. Suppose that the GC process runs whenever the available process memory falls below N bytes. Then GC runs once every (M-N)/m seconds. Now suppose that this application uses in-process caching, with C bytes of memory allocated to the cache. In this case, GC runs once every (M-N-C)/m seconds. Thus, the use of application memory for caching decreases the effective available memory, which, in turn, causes GC to run more frequently.
2. Increases the frequency of major collections: Due to the high cost of major collections, it's preferable to have as few of them as possible. However, a major cost of in-process caching is a marked increase in the frequency of major collections. Objects remain in the cache until they are evicted by either a replacement policy or an invalidation policy, at which point they become eligible for GC. Since cached objects tend to be long-lived, by the time they are evicted they have been promoted to the old generation, so they can only be reclaimed by a major collection. As a result, major collections occur more often.
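The arithmetic in point 1 can be sketched with a few assumed numbers (heap sizes in MB, allocation rate in MB/second; all values are illustrative, not measurements from the tests):

```java
// Worked sketch of the GC-interval arithmetic from point 1 above.
public class GcInterval {
    // Seconds between GC runs for an app with M MB of heap, a GC
    // trigger threshold of N MB, C MB reserved for an in-process
    // cache, and an allocation rate of m MB/second.
    static double secondsBetweenGcRuns(double M, double N, double C, double m) {
        return (M - N - C) / m;
    }

    public static void main(String[] args) {
        // With no cache (C = 0), GC runs every 12 seconds for these
        // assumed numbers; a 24MB in-process cache halves the interval,
        // doubling GC frequency.
        System.out.println(secondsBetweenGcRuns(64, 16, 0, 4));   // 12.0
        System.out.println(secondsBetweenGcRuns(64, 16, 24, 4));  // 6.0
    }
}
```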
To summarize, an in-process caching system can be an effective optimization solution for applications having sufficient memory available. However, in memory-constrained application environments, the in-process caching approach significantly increases GC costs. This, in turn, causes significant increases in application costs in terms of CPU utilization and response times.
Table 4 summarizes the advantages and disadvantages of in-process caching. Thus, the question that arises is whether it's possible to design a caching architecture that can still provide some of the advantages of in-process caching, but without the disadvantages.
An External Object Storage Solution
An external, centralized caching model for application objects is emerging as an alternative approach for optimizing applications. For example, the external servlet containers (SCs) architectural approach was described in a recent JDJ article by Mikhail Skorik ("External SC Architecture and VO Cache," [Vol. 7, issue 10]). Figure 5 depicts an external application object caching architecture, in which the cache is maintained and managed by a separate dedicated process, referred to as the cache server. The cache server is a single logical cache instance that can be shared by multiple application processes. Each application process communicates with the cache server through a lightweight communication library (shown as the cache client module in Figure 5) that is integrated with the application.
At runtime, this system works as follows. When an application creates an object instance that needs to be cached, it uses the communication library to serialize the object and store it in the cache server. When a subsequent request arrives for the same object instance, the application (through the communication library) checks the cache server for the object instance. If it's found, it's served from the cache; otherwise, the application creates the object instance and stores it in the cache server. All cache management functionality, such as invalidation and cache replacement, is handled by the cache server.
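The flow just described is the familiar cache-aside pattern. The sketch below assumes a hypothetical CacheClient interface standing in for the communication library; a real library would serialize objects and ship them to the cache server over a socket, but the lookup-then-populate logic is the same.

```java
// Sketch of the cache-aside flow described above. CacheClient and its
// get/put methods are illustrative stand-ins for a real product's
// communication library, not an actual API.
public class ExternalCacheExample {

    public interface CacheClient {
        byte[] get(String key);             // null on a cache miss
        void put(String key, byte[] value);
    }

    // Load a profile: check the external cache first; on a miss,
    // build the object from the backing store and cache it.
    static byte[] loadProfile(CacheClient cache, String userId) {
        String key = "profile:" + userId;
        byte[] cached = cache.get(key);
        if (cached != null) {
            return cached;                  // served from the cache server
        }
        byte[] fresh = createProfileFromDatabase(userId);
        cache.put(key, fresh);              // store for subsequent requests
        return fresh;
    }

    // Stand-in for the expensive database access and serialization.
    static byte[] createProfileFromDatabase(String userId) {
        return ("profile-for-" + userId).getBytes();
    }
}
```

Note that the application process never holds the cached bytes beyond the current request, so the external cache's contents add nothing to the application heap that GC must traverse.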
To compare the performance of the external and in-process caching architectures, the same experiment described previously was run using Chutney Technologies' Apptimizer to store the profile objects externally. The Apptimizer runs on a separate Windows 2000 machine having a single 900MHz processor and 256MB RAM.
Figure 3D shows the CPU utilization for the External case. The CPU utilization is initially about 65%, and decreases over time until it reaches about 55%, where it remains for the remainder of the experiment. The response time results follow the same general pattern as the CPU utilization results. As shown in Figure 4, response time for the External cache is initially around 70ms. As the cache fills, less time is spent creating the profile objects, causing response time to decrease to about 40ms. At this point, the cache is full, so response time remains constant for the remainder of the experiment.
Table 5 provides a comparison of the External case with the other three cases for selected experimental ranges. With respect to the No Cache case, External caching provides up to a 21% reduction in CPU utilization, up to a 71% reduction in response times, and results in no change in the frequency of major collections. These effects hold over the entire experimental range.
When compared to the in-process caching cases, the External case degrades performance initially (e.g., during time 0-349 seconds) in terms of CPU utilization (up to a 30% increase) and response times (up to a 5x increase). However, most of the time, the External case outperforms the in-process caching cases, providing up to a 45% reduction in CPU utilization and an 80% reduction in response times. This is somewhat surprising, given that the External case requires a network call for each access to the cache, whereas the in-process caching cases require only an in-memory lookup.
The improved performance of the External cache is due primarily to the fact that the External cache does not consume application memory for caching. Since the JVM memory utilization is low, the frequency of GC - and of major collections in particular - is quite low. Notably, the External case provides a 70% reduction in major collection frequency when compared to the Soft Reference case and a 76% reduction when compared to the Hard Reference case. The end result is that the External case spends less CPU time in GC and, therefore, spends more CPU time in application processing.
To summarize, the external caching architecture has several advantages over the in-process caching architecture, as shown in Table 6. By storing objects out-of-process, external caching allows application memory to be dedicated for application processing. As a result, GC is invoked less frequently, reducing the CPU time spent in GC for the application. Moreover, the number of long-lived objects in application memory is reduced, which reduces major collection frequency.
There are also disadvantages associated with external caching (see Table 6). Perhaps the greatest challenge in building an effective external caching system is that it requires an interprocess call (and perhaps a network call) between the cache server and the application process for each access to the cache server. Commercial solutions, such as main-memory databases (e.g., TimesTen), object databases (e.g., Excelon's ObjectStore), and other object storage solutions (e.g., Chutney's Apptimizer), are designed to meet the stringent throughput and response time requirements for an external caching architecture.
In this article, I have shown that the widely used in-process caching approach is not always the ideal optimization solution for enterprise applications. For applications having modest memory requirements, an in-process caching system may be an effective optimization solution. Many enterprise applications, however, do not have such modest memory requirements. For these types of applications, in-process caching can significantly increase GC costs, which, in turn, causes significant increases in application costs in terms of CPU utilization and response times.
An alternative optimization solution that deserves consideration is an external caching architecture. An external cache can increase the effective memory available to the application, thereby reducing the frequency of GC and of major collections. This reduction in GC costs can lead to significant improvements in application performance in terms of CPU utilization and response times over an in-process caching solution.
References
BEA Systems. WebLogic JRockit:
Chutney Technologies. Chutney Apptimizer:
Excelon Corp. ObjectStore: www.exln.com/products/objectstore/
JSR #107: JCACHE - Java Temporary Caching API:
Pawlan, M. (1998). Reference Objects and Garbage Collection:
Sun Microsystems. "Tuning Garbage Collection with the 1.3.1 Java Virtual Machine":
Sun Microsystems. Java HotSpot Technology:
Helen Thomas is a cofounder of Chutney Technologies and a technical expert in the area of decision-support databases. She is also an assistant professor of information systems at Carnegie Mellon University. Helen received her PhD in information technology management from Georgia Tech, an MSE in operations research/industrial engineering from the University of Texas at Austin, and a BS in decision and information sciences from the University of Maryland at College Park.