As the capabilities of our distributed applications
increased, so did our consumption of bandwidth. In 1998, our server
sent objects no larger than 50K to a group of users on a local
network. By 2002, we were passing an average of 500K per object, with
some as large as 1.5MB.
More important, our user base grew from
50 to over 1,500, with some users based across the country from the
server. Add in a group of users roaming on their modem connections
and the full scale of our bandwidth issues becomes clear. We were
presented with a problem faced by many developers of distributed
systems: reduce bandwidth usage and client wait time without
removing any functionality. This article shares our solution to this
problem, providing you with the simple code that helped us eliminate
over 80% of our network traffic.
Evaluating bandwidth is quite simple. The developer has two
options: get more of it or use less of it. Given the magnitude and
expense of expanding bandwidth on a nationally distributed
application, it was clear we had to find ways to reduce the amount of
bandwidth required by our systems. It's important to note the
wording: reduce the bandwidth usage, not the amount of data passed
over the network. To preserve the functionality of the systems, we
needed all the data being passed over the line. In the end, there was
one conclusion: the data needed to be compressed.
As I researched compression in Java, I was looking for a way
to pass in an object and receive a compressed object back. I found
that there are a number of ways to compress sockets or build zip
files on the disk, but not the object-level solution I was seeking.
We needed an API that could be selectively implemented and used for
the largest data objects and most critical applications without
impacting other parts of the system. We also wanted the ability to
compress an object one time, and use that same object for multiple
downloads to client machines, essentially caching a compressed object.
During this research, I found an article on compression on
the Java developer's Web site that laid out all the pieces to our
solution (see the Resources section). Using just a few of the classes in
the java.io and java.util.zip packages, we were able to build an API
to compress any serializable Java object. Being the kind of developer
who prefers simplicity, I was excited at the ease of use and
performance of the underlying Java classes as well as the API we
built. We were able to develop and integrate our solution in just
under two days, resulting in more than an 80% reduction in network
traffic and astounding improvements in client wait times.
A Compression Factory for Serialized Objects
The Java compression functions are located in the
java.util.zip package, where the Deflater class compresses byte
arrays and the Inflater class decompresses byte arrays. As you may
have noted, both of these classes perform compression routines on
byte arrays. Therefore, to compress an object, the first step is to
translate it into a representation of bytes, which begins with the
Serializable interface.
When an object implements the Serializable interface, it can
be represented as a stream of bytes. This byte stream can be written
using the ObjectOutputStream.writeObject() method and reconstituted
using the ObjectInputStream.readObject() method, allowing for a simple translation of a
byte stream to and from an object. This ability to serialize an
object, capturing the resulting byte stream into a byte array,
provides a usable input for the compression methods available in the
java.util.zip classes.
Using this approach, we will accept a serializable object,
write the object into a byte array, and then compress the array. The
array of compressed bytes, along with a few other key variables, will
be stored in a new object, cZipObject, which is shown in its entirety
in Listing 1. The cZipObject will encapsulate the compressed version
of the input object. The cZipObject can then be serialized to
transfer across the network. On the receiving end, the byte array
will be extracted from the cZipObject, decompressed, input to a byte
stream, and then reconstituted into an object. This process is not
truly compressing the object, but compressing the serialized
representation of the object and its data.
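Listing 1 is not reproduced here, so the following is only a minimal sketch of what a holder such as cZipObject might look like. The setData(byte[], int) signature mirrors the call used later in this article; the class name ZipObjectSketch, the field names, and the getter names are assumptions made for illustration.

```java
import java.io.Serializable;

// Hypothetical stand-in for cZipObject: carries the compressed bytes plus
// the size of the original serialized form so the receiver can decompress.
class ZipObjectSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private byte[] compressedData;  // output of the Deflater
    private int originalSize;       // length of the uncompressed serialized bytes

    // Mirrors the setData(bytes, size) call used in this article.
    public void setData(byte[] compressedData, int originalSize) {
        this.compressedData = compressedData;
        this.originalSize = originalSize;
    }

    // Getter names are assumptions; the original listing may differ.
    public byte[] getData() {
        return compressedData;
    }

    public int getOriginalSize() {
        return originalSize;
    }
}
```

Because the holder is itself Serializable, it can be written to an ObjectOutputStream like any other object.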
To easily integrate these compression routines on both the
server and client side, we'll create a cZipFactory class that will
contain all the methods for compressing and decompressing objects.
We'll create a number of methods along the way that can be of direct
use, such as a byte compression method. By encapsulating both the
compress and decompress functions into a single class, we can add the
functionality to both the client and server by creating a single
object. This will allow us to compress objects sent from the server
to the client as well as from the client back up to the server.
The first step is to convert the Serializable object into a
byte array. This can be achieved by using the ObjectOutputStream
with an underlying ByteArrayOutputStream from the
java.io package. First, we'll create a new ByteArrayOutputStream
that will capture the byte stream when the object is written.
We'll then create a new ObjectOutputStream, write the serialized
object, and then extract a byte array from the ByteArrayOutputStream.
byte[] DataArray = null;
try {
    ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
    ObjectOutputStream objOut = new ObjectOutputStream(byteOut);
    objOut.writeObject(inObj);
    objOut.close(); // flush any buffered data before reading the bytes
    // Capture the serialized form of inObj
    DataArray = byteOut.toByteArray();
} catch (Exception e) {
    System.out.println(e.getMessage());
}
With this code, we now have the ability to translate any
object that implements the Serializable interface into a byte array
capable of compression. The resulting byte array contains the details
of the object as well as the object's data. The array contains the
essential structural and data attributes to replicate the object and
all its content. The next step is to compress the data contained in
the byte array, thereby compressing the serialized representation of
the object.
There are a few simple steps to compressing byte arrays using
the Deflater class from the java.util.zip package. First, we'll
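create a new array for the compressed bytes. Without a method to
accurately predict or estimate the size of the byte array resulting
from compression, it's advisable to create an array of equal size to
the noncompressed bytes and then shrink the array once the
compression is complete and the true size can be determined. Keep in
mind that data that's already compressed, or very small, can deflate
to slightly more than its original size, so it's worth confirming
with the Deflater's finished() method that all of the output fit.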
The next step is to create a new instance of the Deflater
class, passing in the desired compression level in the constructor.
There are a few options for compression level, each with benefits and
drawbacks. The best compression option provides the greatest
reduction in byte size at the expense of increased processing time.
The best speed option provides a good compression level, usually 80%
or better, in the shortest possible time. I usually opt for best
compression, finding the extra milliseconds in processing time worth
the decreased object size. For more information on the available
compression levels, refer to the JavaDocs for java.util.zip.Deflater.
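To get a feel for the trade-off between levels, a small comparison like the one below can help. It is only a sketch: the class and method names are hypothetical, the sample data is a stand-in for a real serialized object, and the sizes it prints will depend on your data.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class CompressionLevelSketch {
    // Compresses input at the given level and returns the compressed length.
    static int compressedLength(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[1024];
            int total = 0;
            while (!deflater.finished()) {
                // The buffer is reused; only the byte count matters here.
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end(); // release native resources
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("Client # ").append(i).append('\n');
        }
        byte[] input = sb.toString().getBytes(StandardCharsets.UTF_8);

        System.out.println("original:         " + input.length);
        System.out.println("BEST_SPEED:       "
                + compressedLength(input, Deflater.BEST_SPEED));
        System.out.println("BEST_COMPRESSION: "
                + compressedLength(input, Deflater.BEST_COMPRESSION));
    }
}
```

Wrapping each call in System.nanoTime() measurements would show the processing-time side of the same trade-off.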
Once the Deflater object has been created, call the
setInput(byte[]) method providing the byte array we extracted from
the object serialization. Invoke the finish() method to inform the
Deflater class that all inputs have been defined. Next, call the
deflate(byte[]) method, providing the byte array to house the
compressed data. When this method completes its execution, the data
has been compressed and populated in the output byte array. The
getTotalOut() method in the Deflater class will return the total
number of bytes that were written in the output byte array. Using the
new array size, we'll create a byte array to the exact size of the
compressed output. We'll then use the System.arraycopy function to
copy the bytes from the temporary array into the exact size array.
For ease of use, we'll encapsulate these steps into a single
method named CompressBytes in the cZipFactory object (see Listing 2).
Now, when we need to compress a byte array, we can invoke a single
method:
byte[] bytesCompress = ZipFactory.CompressBytes(DataArray);
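Listing 2 is not reproduced here, so the following sketch shows one way a method like CompressBytes could be written from the steps described above. The class name and the static, level-taking signature are assumptions. It starts with a temporary array the size of the input, as described, but also grows that array if the Deflater has not finished, which covers input that does not shrink.

```java
import java.util.Arrays;
import java.util.zip.Deflater;

class CompressBytesSketch {
    // Hypothetical equivalent of CompressBytes; level is a Deflater constant.
    static byte[] compressBytes(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish(); // no more input will be supplied

            // Temporary array sized like the input (with a small minimum).
            byte[] temp = new byte[Math.max(input.length, 64)];
            int total = 0;
            while (!deflater.finished()) {
                if (total == temp.length) {
                    // Output did not fit: the data barely compressed, so grow.
                    temp = Arrays.copyOf(temp, temp.length * 2);
                }
                total += deflater.deflate(temp, total, temp.length - total);
            }

            // Shrink to the exact compressed size.
            byte[] exact = new byte[total];
            System.arraycopy(temp, 0, exact, 0, total);
            return exact;
        } finally {
            deflater.end(); // release native resources
        }
    }
}
```

For typical serialized objects the loop body runs once, so the behavior matches the single deflate() call described in the text.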
There are two key pieces of data required to quickly and
accurately decompress the object: the byte array containing the
compressed data and the original size of the serialized byte array.
When the byte array is decompressed, it will be written into another
byte array. Knowing the size of the decompressed array will not only
make the decompression more efficient, it will also ensure accuracy.
To save the byte array and original size easily, we will encapsulate
them in a new instance of the cZipObject class.
cZipObject cZipObj = new cZipObject();
cZipObj.setData(bytesCompress, iOrigSize);
By combining all these steps, we can now create a method that
accepts any Serializable object and returns a cZipObject. This is the
Compress method in the cZipFactory class, shown in its entirety in
Listing 3. Using the new method in cZipFactory greatly simplifies the
integration of object compression functions. First, we create an
instance of the cZipFactory class, providing the desired compression
level during object creation.
cZipFactory ZipFactory = new cZipFactory(java.util.zip.Deflater.BEST_COMPRESSION);
Using the new cZipFactory class, we can compress a
serializable object using a single line of code:
cZipObject newZObject = ZipFactory.Compress(inObject);
When the client or receiving machine obtains the cZipObject,
it needs to be decompressed and reconstituted into an object. To
achieve this, we'll create another method in cZipFactory to handle
the Decompress operation. This method will extract the byte array
from the provided cZipObject, decompress the array, and then
translate the bytes into an object. The Decompress method in
cZipFactory will return a Serializable object, which can be cast into
the original type of object.
Using the java.util.zip.Inflater class, we can easily
decompress the byte array in a few lines of code. Given the
compressed byte array and the original size of the byte array, the
Inflater class can be used to decompress the byte array. As this
function could be useful in a variety of situations, we'll create a
method in the cZipFactory class named DecompressBytes. The method
will accept a byte array containing the compressed bytes and a
primitive integer for the size of the decompressed array. At this
point, it's very important that we know the original size of the byte
array (see Listing 4). Without this information, it wouldn't be
possible to accurately predict the total size of the decompressed
bytes without extracting the data in a loop. Knowing the original
size of the byte array makes the decompression code easier and more
efficient.
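Listing 4 is not reproduced here; the sketch below shows how a method like DecompressBytes might use the known original size. The class name and the exception handling are assumptions. Because the output array is allocated up front, a size mismatch can be reported as an error.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class DecompressBytesSketch {
    // Hypothetical equivalent of DecompressBytes.
    static byte[] decompressBytes(byte[] compressed, int originalSize)
            throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] result = new byte[originalSize]; // exact size known in advance
            int total = 0;
            while (total < originalSize && !inflater.finished()) {
                int n = inflater.inflate(result, total, originalSize - total);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input
                }
                total += n;
            }
            if (total != originalSize) {
                throw new DataFormatException(
                        "expected " + originalSize + " bytes, got " + total);
            }
            return result;
        } finally {
            inflater.end(); // release native resources
        }
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] original = "Client # 1, Client # 2, Client # 3, Client # 4"
                .getBytes(StandardCharsets.UTF_8);

        // Compress a small sample so there is something to decompress.
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(original);
        deflater.finish();
        byte[] temp = new byte[original.length + 64];
        int length = deflater.deflate(temp);
        deflater.end();

        byte[] restored = decompressBytes(Arrays.copyOf(temp, length), original.length);
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```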
With the ability to decompress a byte array in place, we then
move to the process of converting the bytes back into a usable object
using an instance of ObjectInputStream. First, we'll create a
ByteArrayInputStream using the decompressed byte array. Using the byte
stream, we'll construct a new ObjectInputStream to reconstitute the
object. By invoking the readObject method, the ObjectInputStream will
translate the byte stream into a usable object. To simplify our
coding, we'll place this code in a method named ConvertByteToObject
in the cZipFactory class (see Listing 5).
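Listing 5 is not reproduced here. As a rough sketch, the conversion in both directions could look like the following; the class and method names are stand-ins for ConvertByteToObject and its counterpart, and the exception signatures are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ByteObjectConversionSketch {
    // Serializes an object into a byte array.
    static byte[] convertObjectToBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
        try (ObjectOutputStream objOut = new ObjectOutputStream(byteOut)) {
            objOut.writeObject(obj);
        } // closing the stream flushes everything into byteOut
        return byteOut.toByteArray();
    }

    // Reconstitutes an object from its serialized bytes.
    static Serializable convertBytesToObject(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream objIn =
                     new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Serializable) objIn.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = convertObjectToBytes("Client # 42");
        String restored = (String) convertBytesToObject(bytes);
        System.out.println(restored);
    }
}
```

The class of the serialized object must be on the receiving side's classpath, otherwise readObject() throws ClassNotFoundException.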
The final step is to create a Decompress method in the
cZipFactory class that will accept a cZipObject and return a
Serializable object. The completed Decompress method is shown in the
cZipFactory class in Listing 3.
Using the cZipFactory class, we can now decompress a
cZipObject using a single line of code:
Serializable retObject = ZipFactory.Decompress(newZObject);
The Serializable object can then be cast to its original
type, either afterward or in the same line of code as the call to Decompress:
Vector vClientList = (Vector)ZipFactory.Decompress(newZObject);
In the end, the cZipFactory provides easy-to-use methods that
translate serializable objects to and from compressed representations
of objects. The entire compression API can be quickly implemented in
just a few lines of code. Another important feature is the ability to
use the function selectively rather than a system-wide change, such
as compressing a socket. The resulting cZipObject can be extended or
expanded to meet the requirements of an application or can be treated
like any other Java object. This also allows for the reuse of a
cZipObject, allowing the developer to cache a compressed object,
effectively eliminating the need to redundantly perform compressions.
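Putting the pieces together, a compact factory might look like the sketch below. It is not the cZipFactory from Listing 3: the class name, the nested Packed holder (a stand-in for cZipObject), and the method signatures are assumptions. The main method compresses a list once and reuses the same compressed holder for two simulated downloads, which is the caching idea described above.

```java
import java.io.*;
import java.util.zip.*;

class ZipFactorySketch {
    // Hypothetical holder standing in for cZipObject.
    static class Packed implements Serializable {
        private static final long serialVersionUID = 1L;
        final byte[] data;       // compressed serialized form
        final int originalSize;  // size before compression

        Packed(byte[] data, int originalSize) {
            this.data = data;
            this.originalSize = originalSize;
        }
    }

    private final int level;

    ZipFactorySketch(int level) {
        this.level = level;
    }

    Packed compress(Serializable obj) throws IOException {
        // Step 1: serialize the object into a byte array.
        ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
        try (ObjectOutputStream objOut = new ObjectOutputStream(byteOut)) {
            objOut.writeObject(obj);
        }
        byte[] raw = byteOut.toByteArray();

        // Step 2: deflate the serialized bytes.
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream packed = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                packed.write(buffer, 0, n);
            }
            return new Packed(packed.toByteArray(), raw.length);
        } finally {
            deflater.end();
        }
    }

    Serializable decompress(Packed packed)
            throws IOException, ClassNotFoundException, DataFormatException {
        // Step 1: inflate into an array of the known original size.
        byte[] raw = new byte[packed.originalSize];
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(packed.data);
            int total = 0;
            while (total < raw.length && !inflater.finished()) {
                int n = inflater.inflate(raw, total, raw.length - total);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input
                }
                total += n;
            }
            if (total != raw.length) {
                throw new DataFormatException("unexpected decompressed size");
            }
        } finally {
            inflater.end();
        }

        // Step 2: reconstitute the object from the serialized bytes.
        try (ObjectInputStream objIn =
                     new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (Serializable) objIn.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ZipFactorySketch factory = new ZipFactorySketch(Deflater.BEST_COMPRESSION);
        java.util.ArrayList<String> clients = new java.util.ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            clients.add("Client # " + i);
        }

        Packed cached = factory.compress(clients);    // compress once
        Object first = factory.decompress(cached);    // first simulated download
        Object second = factory.decompress(cached);   // second reuses the cache
        System.out.println(first.equals(clients) && second.equals(clients));
    }
}
```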
A Simple Client List Example
Now that we've built the classes to compress Serializable
objects, we'll work through an example using the new objects. To
begin, let's create a vector of client names. For our example, we'll
create a vector with generic content, but you could imagine this list
of clients being derived from a database call, an XML document, or
some other data source.
Vector vClients = new Vector(1000);
for (int i = 0; i < 1000; i++)
vClients.add("Client # " + i);
The resulting vector, vClients, contains 1,000 entries and,
when serialized, is 14,046 bytes. If the client machine connects using
a 28.8 modem, it will retrieve this vector at approximately 3.33
KB per second. At this throughput rate, it'll take the client machine
approximately 4,200 milliseconds to download this list of 1,000
clients. If we wanted to add in compression, we'd add this line of
code on the server:
//Using a pre-existing cZipFactory class instance
cZipObject zoClients = ZipFactory.Compress(vClients);
On the client machine, we add this line of code to decompress
the cZipObject:
//Using a pre-existing cZipFactory class instance
Vector vClients = (Vector)ZipFactory.Decompress(zoClients);
Using this example, the Compress method executes in
approximately 40 milliseconds. We would then transmit the zoClients
object to the client machine, which when serialized is 2,296 bytes.
At 28.8 modem speed, the cZipObject instance is downloaded to the
client in approximately 690 milliseconds. The client then
decompresses the cZipObject, casting the contents into a vector. The
Decompression operation on the client takes an additional 30
milliseconds. The total time using compression was 40 + 690 + 30 =
760 milliseconds. When compared to the original download time of
4,200 milliseconds, the compression technique saved 3,440
milliseconds of client wait time and reduced the total object size by
11,750 bytes, resulting in 83.6% less bandwidth consumption. This is
more than five times faster and is achieved with a few simple lines
of code on the server and client.
Listing 6 provides a simple testing class that was used for
this example and the benchmarks quoted in this article. By using this
simple testing class, you can see that when applied to larger data
structures, the compression functions make a more profound impact on
bandwidth reduction and client wait times.
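Listing 6 is not reproduced here. A comparable test class might look like the sketch below, which measures the serialized and deflated sizes of the client list and estimates download times at the 3.33 KB per second figure used above. The class and method names are hypothetical, 1 KB is assumed to mean 1,000 bytes, and the compressed size it reports is for the raw deflated bytes, so a serialized cZipObject would be somewhat larger. Sizes and timings will vary by JVM and hardware.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Vector;
import java.util.zip.Deflater;

class ClientListBenchmarkSketch {
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
        try (ObjectOutputStream objOut = new ObjectOutputStream(byteOut)) {
            objOut.writeObject(obj);
        }
        return byteOut.toByteArray();
    }

    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Transfer time in ms at an effective throughput in bytes per second.
    static long transferMillis(int bytes, double bytesPerSecond) {
        return Math.round(bytes / bytesPerSecond * 1000.0);
    }

    public static void main(String[] args) throws IOException {
        Vector<String> vClients = new Vector<>(1000);
        for (int i = 0; i < 1000; i++) {
            vClients.add("Client # " + i);
        }

        byte[] raw = serialize(vClients);
        long start = System.nanoTime();
        byte[] packed = deflate(raw);
        long compressMillis = (System.nanoTime() - start) / 1_000_000;

        double modem = 3330.0; // 3.33 KB/s, assuming 1 KB = 1,000 bytes
        System.out.println("serialized bytes:     " + raw.length);
        System.out.println("compressed bytes:     " + packed.length);
        System.out.println("compress time (ms):   " + compressMillis);
        System.out.println("download ms (raw):    " + transferMillis(raw.length, modem));
        System.out.println("download ms (packed): " + transferMillis(packed.length, modem));
    }
}
```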
Expense of Compression
There are two primary expenses to this compression technique:
increased memory usage and CPU cycles. This approach is compressing
the serialized representation of an object, which requires that the
object be serialized into an array that's then compressed and
included in another serializable object. In addition to the increase
in memory usage, there will be an increase in CPU utilization. The
compression routines consist of arithmetic operations, which
will result in increased CPU usage during deflation and inflation
processing. For larger installations of these compression routines,
it would be reasonable to expect notable increases in server CPU
usage, which would need to be analyzed in terms of frequency and the
size of the objects being compressed. As a benchmark, in one
installation the server processed approximately 10,000 compressions
an hour on objects ranging from 10K to 350K. The addition of
compression functions resulted in approximately a 3% increase in CPU
usage.
Another important factor to remember is that the client
machines will also have increases in memory usage and CPU utilization
to decompress the objects, or compress objects being sent to the
server. The speed of these decompression routines will depend on the
client machine hardware.
Conclusion
If you are writing distributed Java applications, whether
they're EJB systems or custom RMI solutions, the introduction of
compression routines can provide tremendous improvements to the
response time and bandwidth consumption of your programs. One of the
primary advantages to the approach presented here is its simplicity,
allowing the developer to continually work with objects and avoid the
compression functions. Using the cZipFactory also allows the
developer to avoid socket-level operations or the creation of disk
files, retaining the structure of existing programs and making it
possible to selectively implement the functions. Another benefit of
the cZipFactory is the use of standard Java libraries, making the
compression function available in both J2SE and J2EE applications.
For our applications, the performance of the compression
routines has been excellent, with minimal server impact and network
usage down by 85%. Today, of the approximately 3,000 client machines
using the compression classes, there have been no reports of problems
with CPU utilization or memory usage. Overall, the introduction of
compression was the single largest performance improvement made in
our five-year development effort.
Resources
"Compressing and Decompressing Data Using Java APIs":
http://developer.java.sun.com/developer/technicalArticles/Programming/compression/
Object Serialization in Java:
http://java.sun.com/j2se/1.4.2/docs/guide/serialization/
Java Documentation for java.util.zip package:
http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/package-summary.html
Java Documentation for java.io package:
http://java.sun.com/j2se/1.4.2/docs/api/java/io/package-summary.html
SIDEBAR
Calculating the Benefits of Compression
There are a number of benefits to using serialized object
compression, most notably the reduction in the size of the serialized
output. The performance gain is directly related to the average
object size, the bandwidth of the client connections, and the CPU
processing power of the server and client machines. When determining
whether to implement a compression function, these factors should be
projected in order to ensure a positive gain. Consider this simple
equation to determine if compression routines would be beneficial:
[(Object Size bytes) × 8] ÷ [Line Speed kbs] = Avg. Download Time (ms)
[10000 × 8] ÷ 128 = 625 ms
Now, reduce the average object size by 80% and recalculate
the download time; this time add an additional 100 milliseconds for
processing time.
{[(Object Size bytes × 8) × 0.2] ÷ [Line Speed kbs]} + 100 = Compressed Download Time (ms)
{[(10000 × 8) × 0.2] ÷ 128 } + 100 = 225 ms
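The two equations above translate directly into code. The sketch below reproduces the sidebar's arithmetic; the class, method, and parameter names are hypothetical, and remainingFraction is the share of the object left after compression (0.2 for an 80% reduction).

```java
class DownloadTimeSketch {
    // Bits divided by kilobits per second yields milliseconds.
    static double downloadMillis(double objectBytes, double lineSpeedKbps) {
        return objectBytes * 8.0 / lineSpeedKbps;
    }

    // Same calculation on the compressed size, plus fixed processing time.
    static double compressedDownloadMillis(double objectBytes, double lineSpeedKbps,
                                           double remainingFraction,
                                           double overheadMillis) {
        return downloadMillis(objectBytes * remainingFraction, lineSpeedKbps)
                + overheadMillis;
    }

    public static void main(String[] args) {
        // 10,000-byte object on a 128 kbps line, as in the sidebar.
        System.out.printf("%.0f ms%n", downloadMillis(10000, 128));
        System.out.printf("%.0f ms%n", compressedDownloadMillis(10000, 128, 0.2, 100));
    }
}
```

Run as written, this prints 625 ms and 225 ms, matching the worked figures.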
In the chart in Figure 1 we see how the slight increase in
processing time required for compression can create tremendous gains
in download time.
Regardless of the bandwidth from the server to the client,
compression routines will have a definite impact on network usage
(see Figure 2).
It's important to remember that at some point the law of
diminishing returns takes effect. For example, if the average
size of the object before compression is 5,000 bytes, then
compression could reduce this to as little as 1,000 bytes. The total
expense of this compression would be about 100 milliseconds. If the
client machines were on 28.8 modems, the compression would have a
positive impact, reducing client wait time by about 1,100
milliseconds. However, if the client machines were on 512K
connections, downloading the original 5,000 bytes would only take
about 90 milliseconds. Even though the 1,000 bytes would take 17
milliseconds, we have now added additional processing time for the
compress and decompress operations, potentially creating a negative
return, and not significantly impacting download time.
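The break-even point can be estimated from the sidebar's equations: compression pays off only while the transfer time saved exceeds the processing overhead. The sketch below is one way to work that out, with hypothetical names throughout. It uses the idealized equations, so its figures differ somewhat from those quoted above, which appear to assume a lower effective throughput.

```java
class BreakEvenSketch {
    // Net client time saved in ms; negative means compression cost more
    // than it saved.
    static double netSavingsMillis(double objectBytes, double lineSpeedKbps,
                                   double remainingFraction, double overheadMillis) {
        double plain = objectBytes * 8.0 / lineSpeedKbps;
        double packed = objectBytes * remainingFraction * 8.0 / lineSpeedKbps
                + overheadMillis;
        return plain - packed;
    }

    // Line speed (kbps) at which the savings drop to zero. Found by setting
    // plain == packed in the method above and solving for the line speed.
    static double breakEvenKbps(double objectBytes, double remainingFraction,
                                double overheadMillis) {
        return objectBytes * 8.0 * (1.0 - remainingFraction) / overheadMillis;
    }

    public static void main(String[] args) {
        // 5,000-byte object, compressed to 20%, 100 ms of processing.
        System.out.printf("break-even: %.0f kbps%n", breakEvenKbps(5000, 0.2, 100));
        System.out.printf("28.8 kbps:  %.1f ms saved%n",
                netSavingsMillis(5000, 28.8, 0.2, 100));
        System.out.printf("512 kbps:   %.1f ms saved%n",
                netSavingsMillis(5000, 512, 0.2, 100));
    }
}
```

Under these assumptions the break-even speed is about 320 kbps, so the modem user gains roughly a second while the 512K user ends up waiting slightly longer.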
The chart in Figure 3 helps to illustrate how the benefits of
compression on client wait time can be quickly reduced in higher
bandwidth environments. It's important to note that while client wait
time may not be significantly reduced by compression, network traffic
will always be reduced. Even though the end user may not notice
improvements, the network will always benefit from the reduction in
traffic.
About The Author
Robert Beckett is the chief architect for The Software Development
Cooperative (www.thesdc.com), where he leads their efforts in
building high-performance, scalable Java tools and solutions.
rbeckett@thesdc.com