Java brought garbage collection to mainstream programming. Never before have commercial software developers been so aware of the need and benefit of using a collector. Notwithstanding, the benefits of garbage collection in Java are far from being completely realized. As larger and more complex applications are built in Java, it's becoming apparent that some very flexible memory management schemes are both needed and possible.
In this article I explore a cache you can build on JDK 1.2 beta 3 that uses WeakReferences to cooperate with Java's garbage collector. The cache uses WeakReferences to take advantage of available system memory without clogging it with unfreeable objects. When memory is reclaimed, the cache will free those objects that are not in use, resulting in a better-performing application. Objects can be cached in memory but don't have to trigger memory thrashing.
References and WeakReferences
Before talking about the cache in detail, we need to explore JDK 1.2's Reference objects. Introduced in the java.lang.ref package, the Reference class stands as the fulcrum for a body of memory management classes in JDK 1.2. The intention behind these classes is to allow an application developer to interact with the memory management policies of Java in a type-safe and extensible way.
The basic idea is to put data inside a java.lang.ref.Reference class (or subclass) instead of referring to the data directly through a variable. The data can be retrieved as needed from the Reference class, but the application doesn't keep a permanent finger on the data. Under certain conditions, if the garbage collector determines that only the Reference object is currently using the data, it may free the data. This can happen even if the application is still using the Reference object that wraps the data.
To illustrate, consider the following code:
Reference myReference = new WeakReference( new String("some string") );
In this example the application has created a WeakReference, a subclass of the base Reference class. Inside the WeakReference is stored the string "some string." The application refers to the WeakReference through the variable myReference, but has no direct access to the string.
Whenever the string is needed, the application calls the get() method on the Reference object as follows:
String myString = (String) myReference.get(); // returns "some string"
The application can then use the string. For References to do their job, however, the application must release its own reference to the string object, either by exiting the scope where "myString" exists or by explicitly setting the variable to null:
myString = null;
When memory gets tight, the garbage collector may determine that the only reference to the string "some string" is through the Reference object "myReference". When this occurs, the string may be freed even though the Reference object that has stored the string is still in use. This is in stark contrast to the way memory management happens for any other class in Java, where transitive references are sufficient to keep an object from being freed. Reference objects are special, and are specifically handled by the garbage collector.
The garbage collector doesn't free a Reference's interior object directly, but instead invokes the clear() method on the Reference. Invoking this method is the signal to the Reference that its interior data will be freed. Once cleared, the Reference object will return null from the get() method. An application can detect that the interior data has been freed as seen here:
String myString = (String)myReference.get();
if ( myString == null )
{
    // do something since
    // the interior data was freed
}
It's worth noting some awkwardness in the Reference terminology. The Java language has a language construct, called a reference, that should not be confused with the Reference class. The language construct is the way a variable "refers" to its data, while the class is a first-order entity in the system. The Reference class is used to "wrap up" and manipulate the concept of a language reference, a process known as reification. The interior data managed by the Reference class is called the referent of the class, meaning the thing to which the Reference class refers.
The specific conditions under which a Reference class is cleared vary from one subclass of Reference to another. Some subclasses are silently cleared while others are not, allowing an application to take action. The basic idea, however, stays the same.
References and ReferenceQueues
Another aspect of the Reference class hierarchy is the use of ReferenceQueues for monitoring state transitions. By registering a Reference with a ReferenceQueue, an application indicates that the Reference should be put into the queue when a significant state transition occurs. The application then pulls the References out of the queue. What constitutes a "significant state transition" varies from one subclass of Reference to another, but is commonly defined as clearing the Reference. That is, after the Reference is cleared, it is put into the queue.
The ReferenceQueue itself is monitored in two different ways. An application can poll for items in the queue or can block waiting for something to enter the queue. The latter is particularly useful in multithreaded applications when there are auxiliary system resources associated with Reference objects. An application dedicates a thread to monitoring the ReferenceQueue; when an object is placed into the queue, it frees up whatever resources are associated with the reference.
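The dedicated-thread pattern can be sketched as follows. This is an illustrative example, not code from the article's cache; the class name QueueMonitor and the cleanup comment are my own.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

public class QueueMonitor {
    public static void main(String[] args) {
        final ReferenceQueue queue = new ReferenceQueue();

        // Register a WeakReference with the queue; when the collector
        // clears it, the Reference is eventually placed in the queue.
        Reference ref = new WeakReference(new Object(), queue);

        Thread cleaner = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        // remove() blocks until a Reference is enqueued
                        Reference cleared = queue.remove();
                        // free whatever auxiliary resources are
                        // associated with 'cleared' here
                    }
                } catch (InterruptedException e) {
                    // shut down the monitor thread
                }
            }
        });
        cleaner.setDaemon(true);
        cleaner.start();
    }
}
```

The daemon flag ensures the monitor thread does not keep the application alive on its own.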
Before designing our Reference-based cache, let's put down a few requirements. In a typical cache, arbitrary objects are stored, identified by a unique key value and retrieved later. Each key/value pair is a cache entry. The entries are kept in memory to improve performance, but there is no requirement that any given object be in the cache.
To minimize memory consumption, a simple cache will limit the number of entries it can hold. A sophisticated cache will take advantage of available memory by using a variable number of cache entries. When memory gets tight, items are not in use or memory is being reclaimed, the sophisticated cache releases some (or all) of its cache items. In this way the cache can use available memory without creating sandbars that the memory manager has to work around. Our cache will release items when they are not in use and the garbage collector is reclaiming memory.
We will base our cache on JDK 1.2's collection classes. Our cache will implement the java.util.Map interface, which offers a simple put()/get() interface. An object is added via put() and retrieved via get(). We don't want the user of the cache to see any use of Reference objects, so we define the put/get methods to accept the cached objects directly.
Listing 1 shows the skeletal implementation of the cache. This simple implementation will store and retrieve cached objects under an application's control. As you can see from the listing, the class is a trivial subclass of Hashtable.
This simple cache does not cooperate with the memory manager. All cached objects are kept in memory until the cache is explicitly cleared. Worse yet, the cache grows in size with each new item put in it. There is no size limit. To correct these deficiencies, we introduce WeakReferences into our cache implementation. When an application stores an item in the cache, we won't store it directly. Instead, we put it inside a WeakReference and store that instead. When the garbage collector runs, it is free to collect the cached data if it is not in use.
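The wrapping idea can be shown in miniature. This sketch is mine, not the cache implementation itself; the names map, key and value are illustrative.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;
import java.util.Hashtable;

public class WrapSketch {
    public static void main(String[] args) {
        Hashtable map = new Hashtable();
        String key = "key";
        String value = "value";

        // Store a WeakReference wrapper instead of the value itself,
        // so the collector is free to reclaim the value when unused.
        map.put(key, new WeakReference(value));

        // Strip the wrapper off again on retrieval.
        Object o = ((Reference) map.get(key)).get();
        // o is the cached value while the application still holds
        // a direct reference to it
    }
}
```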
Listing 2 shows the new implementation. As you can see, introducing References into the design has changed the implementation substantially. The class is no longer a subclass of Hashtable, keeps a private copy of all cached data, and does some Reference manipulation. These changes are necessary because of our requirement that users of the cache be shielded from the use of Reference objects.
To ensure that no Reference objects surface outside the cache, we have to guarantee that all of the access points into and out of the cache are protected. Objects passed into the cache are immediately wrapped by a WeakReference. The WeakReference is stored, but is always stripped off before an object is handed out of the cache. To accomplish this, the new cache implementation makes four core changes:
- The Cache class is changed to inherit from AbstractMap. This class allows us to supply the Map data on demand, stripping off WeakReference wrappers as we go.
- The put() method is adapted to wrap the incoming object with a WeakReference. Since the put() method is supposed to return the previous item that corresponds to the cache key, some clerical work is done to strip off any old WeakReference layers.
- Items put into the cache are stored in a private Hashtable. This Hashtable contains WeakReferences indexed by the cache keys passed to the Cache.put() method.
- The cache implements the AbstractMap.entries() method. This is the method by which the Map data is supplied on demand, and where the cache removes the WeakReference layer. When the layer is removed, an additional check is done to determine whether the WeakReference has been cleared. If it has, the WeakReference is removed from the internal Hashtable and is not passed back as a result of the entries() method.
That's all we need for the basics. The AbstractMap class drives all other operations from the values returned by the entries() method. With that in place, consumers of the cache see a simple put/get interface and can call the other abstract Map methods such as containsKey and containsValue. Additionally, the cache can be enumerated over, compared for equality with other caches, searched, etc. The only operations not supported by the cache are the collection-based deletion operations. For clarity, I've left that code out of these examples, but implementing them is a small extension; we simply change the collection returned from the entries() call to forward its modifications to the private Hashtable of the cache. All in all, we inherit a pretty complete system from the base collection classes.
As you examine the entries() code, you may question whether the cache accounts for potential race conditions with the garbage collector. After all, if the garbage collector is running at the instant the entries() method is trying to determine which items are still available, isn't that an error? The answer is no. To understand why, you should first know that Reference objects are cleared atomically. That is, if the Reference.get() method returns a non-null value that the application immediately uses, the garbage collector will not clear out the object. Put another way, the garbage collector performs the test and clear operations on the Reference as an indivisible, uninterruptible operation.
In our cache, once the entries() code has determined that a given Reference contains a non-null referent, a direct Java reference to that interior object is maintained and stored. This prevents the garbage collector from freeing the interior object.
To clarify this just a little, consider the following code:
if ( myReference.get() != null )
myHashtable.put( "some key", myReference.get() );
This example has a bug. The problem is that the garbage collector may run between the first call to myReference.get() and the second. It's possible that the Reference is cleared in the process. The bug won't happen often, but when it does, it will be hard to find. The example can be corrected as follows:
Object o = myReference.get();
if ( o != null )
myHashtable.put( "some key", o );
In this example the object managed by myReference is directly referred to. This prevents any race conditions from freeing the data before it is put into myHashtable.
To Queue or Not to Queue
The attentive reader may wonder why the WeakReferences are explicitly tested for null rather than using a ReferenceQueue for the same purpose. The answer lies in the way Reference objects are queued. The only guarantee provided for WeakReferences is that each instance that has been cleared will eventually be put in the queue. There is no guarantee that it will be enqueued at the moment it is cleared, or even in a timely fashion. Since we want to ensure that the user of our cache never sees a cache item suddenly become null, we can't rely on a ReferenceQueue to remove our cache items fast enough.
Testing the Cache
To test the cache, we want to populate the cache with objects, simulate the irregular use of items in the cache, occasionally run the garbage collector and see if the right things remain cached. To achieve this, I wrote the CacheTest program.
This program is implemented by spinning off threads for each of several tasks. A thread that periodically adds items to the cache is started. As each item is added to the cache, another thread keeps a direct reference to the item for a brief time. Another thread periodically runs the garbage collector and prints out whatever is still left in the cache. The implementation of these tasks is found in the class files CacheItemGenerator.java, TimedReference.java and related files.
The user interface for CacheTest is shown in Figure 1. The main screen presents several options. The first pulldown lets you select what type of item should be added to the cache. The second is used to determine how often a new item is added to the cache. The third determines how long a direct reference to the cached item is maintained, and the last pulldown determines how often the garbage collector is run. Try selecting different combinations.
After choosing a set of parameters, press the start button and the machine will be set in motion. After each run of the garbage collector, CacheTest dumps the contents of the cache to standard output. You can identify which items are still in the cache by their names. Cache items are named sequentially, "Key 1, Key 2," etc. Listing 3 shows a sample run where all timeouts are set to one second.
The CacheTest threads can be halted at any time by pressing the stop button. This allows you to reconfigure the cache parameters and run another test.
You should adapt the test program to try caching your own classes. This is easy to do by changing the options in the first pulldown menu. When CacheTest is running, it uses the default class loader to load whatever class is identified in the pulldown. The only requirement is that the class has a default constructor.
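The loading step described above can be sketched in a couple of lines. This is my own illustration of the technique, not code taken from CacheTest; the class name and variable names are assumptions.

```java
public class LoadSketch {
    public static void main(String[] args) throws Exception {
        // The class name as it might be selected in the pulldown
        String className = "java.lang.String";

        // Load the class through the default class loader and
        // instantiate it; this requires a public default constructor.
        Object item = Class.forName(className).newInstance();
    }
}
```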
Other Kinds of References
The cache presented here could be adapted in several ways. In this implementation I've used WeakReferences for the cache items. This has the advantage of freeing items in the cache when they are not in use and when memory is being reclaimed. For many applications this is desirable. For others, however, items in the cache should be freed only if system memory is running low. In those environments it is inappropriate to free items in the cache merely because they don't happen to be in use.
To address that need, the cache can switch from using WeakReferences to SoftReferences. SoftReferences are cleared by the garbage collector only when their interior objects are not in use and memory is running low. Additionally, SoftReferences are subject to a least-recently-used algorithm. This meets the above objective. With beta 3 of JDK 1.2, however, I've seen some erratic behavior and problems with SoftReferences. These could be my own bugs, but because SoftReferences can be debugged only in low-memory situations, I haven't yet tracked down the problem. Caveat emptor.
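In principle the switch is a one-line change to the wrapping step. The sketch below is mine; it only illustrates swapping the wrapper type, not the cache's full put() method.

```java
import java.lang.ref.Reference;
import java.lang.ref.SoftReference;

public class SoftSketch {
    public static void main(String[] args) {
        String value = "cached value";

        // Instead of: new WeakReference( value )
        // A SoftReference is cleared only under memory pressure,
        // not at every collection of an unused referent.
        Reference ref = new SoftReference(value);

        // get() returns the value until the collector clears it
        Object o = ref.get();
    }
}
```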
If you have very particular cache needs, you may also want to investigate using a GuardedReference. A GuardedReference is not cleared by the garbage collector. Instead, the GuardedReferences are put into a ReferenceQueue when the garbage collector sees that the interior object is not in use. It's up to the application developer to pull the objects from the queue and clear them.
Once you take a garbage collection facility as a given in your programming, you can begin to consider new types of memory management that weren't open to you before. The Reference class hierarchy introduced in JDK 1.2 offers the facilities for building these new memory management structures. In particular, by careful crafting of caching algorithms, we can begin to tailor the runtime memory management behaviors in our applications without giving up the type safety and memory protections given to us by Java. With that in hand, we can begin to build better commercial applications than we have ever built before.
About the Author
Lynn Monson is a software architect at Novell, Inc. His eclectic interests include distributed collaboration, machine learning, and pattern-based, object-oriented, and Internet architectures. Lynn can be reached at [email protected]
Listing 1: SimpleCache

public class SimpleCache extends Hashtable
{
}
Listing 2: Cache

public class Cache extends AbstractMap
{
    private Map map = new Hashtable();

    public synchronized Set entries()
    {
        Map newMap = new Hashtable();
        Iterator iter = map.entries().iterator();
        while( iter.hasNext() ) {
            Map.Entry me = (Map.Entry)iter.next();
            Reference ref = (Reference)me.getValue();
            Object o = ref.get();
            if ( o == null ) {
                // Delete cleared reference
                iter.remove();
            } else {
                // Copy out interior object
                newMap.put( me.getKey(), o );
            }
        }
        // Return set of interior objects
        return newMap.entries();
    }

    public synchronized Object put( Object key, Object value )
    {
        Reference ref = new WeakReference( value );
        ref = (Reference)map.put( key, ref );
        return ( ref == null ) ? null : ref.get();
    }
}
Listing 3: Sample CacheTest output (all timeouts set to one second)

Added class java.lang.String
Cache still holds key 0
Added class java.lang.String
Releasing direct reference
Cache still holds key 1
Cache still holds key 0
Releasing direct reference
Added class java.lang.String
Cache still holds key 2
Releasing direct reference