
There are many articles about basic performance tuning of Java applications. They all discuss simple techniques such as using a StringBuffer instead of a String, and the overhead of the synchronized keyword.

This article doesn't cover any of that. Instead, we focus on tips that can help make your Web-based application faster and more scalable. Some tips are detailed, others brief, but all should be useful. I end with some recommendations that you can present to your manager.

I was inspired to write this article when a co-worker and I were reminiscing about our dot-com days - how we designed systems that could support thousands of users, kept the code tight, and hit aggressive deadlines. Sometimes there's a trade-off between designing for reuse and designing for performance. Based on my background, performance wins every time. Your business customers understand fast-performing systems even if they don't necessarily understand code reuse. Let's get started on our tips.

How to Use Exceptions
Exceptions degrade performance. Throwing an exception first requires the creation of a new object. The constructor of the Throwable class calls a native method named fillInStackTrace(), which walks the stack frames to collect trace information. Then, whenever an exception is thrown, the VM must fix up the call stack, since a new object was created in the middle of it.

Exceptions should be used for error conditions only, not for control flow. I had the opportunity to see code at a site that specializes in marketplaces for wireless content (name intentionally withheld) where the developer could have used a simple comparison to see whether an object was null. Instead, he or she skipped this check and simply let a NullPointerException be thrown.
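A minimal sketch of the difference: compare to null directly rather than letting the NullPointerException do the work.

public void printLength(String text) {
    // Costly: letting a NullPointerException drive the control flow
    // try { System.out.println(text.length()); } catch (NullPointerException e) { /* ignore */ }

    // Cheaper: a simple comparison
    if (text != null) {
        System.out.println(text.length());
    }
}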

Don't Initialize Variables Twice
Java initializes instance variables to known default values when an object is created: object references are set to null, integer types (byte, short, int, long) to 0, float and double to 0.0, and booleans to false. Explicitly assigning those same defaults in the constructor just repeats work that has already been done. This is especially important if the class extends another class, because the whole chain of constructors is called automatically when an object is created with the new keyword.
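For example (a minimal sketch), the assignments in the constructor below merely repeat the defaults the fields already hold and can be removed:

public class Customer {
    private int orderCount;     // already 0
    private String name;        // already null
    private boolean active;     // already false

    public Customer() {
        // Redundant second initialization: these fields already hold these values
        orderCount = 0;
        name = null;
        active = false;
    }
}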

Use Alternatives to the New Keyword
As previously mentioned, creating an instance of a class with the new keyword calls all constructors in the chain. If you need a new instance of a class, you can instead use the clone() method of an existing object whose class implements the Cloneable interface. The clone() method doesn't invoke any constructors.

If you've used design patterns as part of your architecture and use the factory pattern to create objects, the change will be simple. Listed below is the typical implementation of the factory pattern.

public static Account getNewAccount() {
    return new Account();
}
The refactored code using the clone method may look something like this:
private static final Account baseAccount = new Account();

public static Account getNewAccount() {
    // Account must implement Cloneable and override clone() as public for this to compile
    return (Account) baseAccount.clone();
}
The same thought process also applies to arrays (see the sketch below). If you're not using design patterns within your application, I recommend that you stop reading this article and run (don't walk) to the bookstore to pick up a copy of Design Patterns by the Gang of Four.
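For arrays, a minimal sketch of the idea (the names here are hypothetical) is to clone a pre-initialized template array rather than building and filling a new one on every call:

private static final int[] DEFAULT_SCORES = {100, 100, 100, 100};

public static int[] getDefaultScores() {
    // Array clone() copies the elements without re-running any initialization logic
    return (int[]) DEFAULT_SCORES.clone();
}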

Make Classes Final Whenever Possible
Classes that are tagged as final can't be extended. There are many examples of this technique in the core Java APIs, such as java.lang.String. Tagging the String class as final prevents developers from subclassing it and supplying their own implementation of the length method.

Furthermore, if a class is final, all the methods of the class are also final. The Java compiler may take the opportunity to inline all final methods (this depends upon the compiler's implementation). In my testing I've seen performance increase by an average of 50%.
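A minimal sketch of the idea:

// Because the class is final it can't be subclassed, so every method is
// effectively final and becomes a candidate for inlining.
public final class Money {
    private final long cents;

    public Money(long cents) {
        this.cents = cents;
    }

    public long getCents() {
        return cents;
    }
}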

Use Local Variables Whenever Possible
Arguments passed into a method and temporary variables declared within it are stored on the stack, which is fast. Static variables, instance variables, and new objects live on the heap, which is slower. Local variables may be optimized further, depending on which compiler/VM you're using.
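A sketch of the idea: copy a frequently used instance field into a local variable before a tight loop and write the result back once afterward.

public class Totaler {
    private int total;   // instance field: lives on the heap

    public void addAll(int[] values) {
        int localTotal = total;              // local variable: lives on the stack
        for (int i = 0; i < values.length; i++) {
            localTotal += values[i];
        }
        total = localTotal;                  // write back to the heap once
    }
}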

Use Nonblocking I/O
Versions of the JDK prior to 1.4 don't provide nonblocking I/O APIs. Many applications attempt to avoid blocking by creating a large number of threads (hopefully managed in a pool). As mentioned previously, there's significant overhead in creating threads in Java. You'll typically see this thread-per-stream approach in applications that need to support many concurrent I/O streams, such as Web servers and quote and auction components.

JDK 1.4 introduces a nonblocking I/O library (java.nio). If you must remain on an earlier version of the JDK, there are third-party packages that have added support for nonblocking I/O: www.cs.berkeley.edu/~mdw/proj/java-nbio/download.html.
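As a rough sketch of what the java.nio approach looks like (JDK 1.4 or later), a single thread can service many connections through a Selector instead of dedicating a thread to each stream:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class NonBlockingEchoServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);                      // nonblocking mode
        server.socket().bind(new InetSocketAddress(8080));
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select();                                // wait until at least one channel is ready
            Iterator keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = (SelectionKey) keys.next();
                keys.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buffer = ByteBuffer.allocate(1024);
                    if (client.read(buffer) == -1) {          // peer closed the connection
                        client.close();
                    } else {
                        buffer.flip();
                        client.write(buffer);                 // echo the bytes back
                    }
                }
            }
        }
    }
}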

Stop Being Clever
Many developers code with reuse and flexibility in mind and sometimes introduce additional overhead into their programs. At one time or another they've written code similar to:

public void doSomething(File file) throws IOException {
    FileInputStream fileIn = new FileInputStream(file);
    // do something
}
It's good to be flexible, but in this scenario they've created unnecessary overhead. The idea behind doSomething is to manipulate an InputStream, not a File, so it should be refactored as follows:
public void doSomething(InputStream inputStream) {
    // do something
}
Multiplication and Division
Too many of my peers count on Moore's Law, which predicts that processing power will roughly double every 18 months. The "McGovern Law" states that the amount of bad code being written by developers triples every year, canceling out any benefit from Moore's Law. Consider the following code:
for (val = 0; val < 100000; val += 5) {
    shiftX = val * 8;
    myRaise = val * 2;
}
If we were to utilize bit shifting, performance would increase up to six times. Here's the refactored code:
for (val = 0; val < 100000; val += 5) {
    shiftX = val << 3;
    myRaise = val << 1;
}
Instead of multiplying by 8, we shift to the left (<<) by 3. Each left shift by one bit multiplies the value by 2, so shifting by 3 multiplies by 8; the variable myRaise demonstrates the same idea for multiplying by 2. Shifting bits to the right (>>) divides by powers of 2 (for non-negative values). This makes execution faster, but it may be harder for your peers to understand later, so it should be commented.

Choosing a VM Based on Its Garbage Collection Implementation
Many people would be surprised to learn that the Java specification doesn't require the implementation of a garbage collector. Imagine the day when we all have computers with infinite memory. Until then, the garbage collector routines are responsible for finding and throwing away (hence "garbage") objects that are no longer needed. The garbage collector must determine which objects are no longer referenced by the program and free the heap memory those objects consume. It's also responsible for running any finalizers on objects being freed.

While garbage collection helps ensure program integrity by intentionally not allowing you to free memory you didn't allocate, the process also incurs overhead, since the JVM decides when to schedule CPU time for the collector and when it runs. Garbage collectors take one of two broad approaches to performing their job.

Garbage collectors that implement reference counting keep a count for each object on the heap. When an object is created and a reference to it is assigned to a variable, the count is incremented; when a reference goes out of scope or is reassigned, the count is decremented. Once the count drops to zero, the object can be garbage collected. This approach lets the collector run in small time increments interleaved with the execution of the program. However, reference counting doesn't work well when a parent and child hold references to each other, and there's also the overhead of incrementing and decrementing the count every time an object is referenced.
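A small sketch of the parent/child problem: once the two objects below reference each other, a pure reference-counting collector can never reclaim them, while a tracing collector can.

class Parent {
    Child child;
}

class Child {
    Parent parent;
}

public class CycleDemo {
    public static void main(String[] args) {
        Parent p = new Parent();
        Child c = new Child();
        p.child = c;       // parent references child
        c.parent = p;      // child references parent
        p = null;
        c = null;
        // Both objects are now unreachable from the program, but each still holds a
        // reference to the other, so neither count ever drops to zero under pure
        // reference counting. A tracing (mark-and-sweep) collector reclaims both.
    }
}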

Garbage collectors that implement tracing start from the root nodes and trace out the chains of references. Objects found during the trace are marked, either by setting a flag in the object or by recording it in a bitmap. After this process is complete, any unmarked objects are known to be unreachable and can be garbage collected. This technique is referred to as "mark and sweep."

Recommendations for Your Manager
Other approaches can be used to make your Web-based application faster and more scalable. The easiest technology to implement is usually a strategy that supports clustering. With a cluster, a group of servers can work together to transparently provide services. Most application servers allow you to gain clustering support without having to change your application - a big win. Of course you may need to consider additional licensing charges from your application server vendor before taking this approach.

When looking at clustering strategies, there are many additional things to consider. One architectural flaw that's frequently made is relying on heavyweight stateful sessions. If a server or process in the cluster crashes, the cluster will usually fail the application over; for that to work, the cluster has to continually replicate the session state to the other members. Make sure you limit the size and number of objects stored in the session, since all of them will need to be replicated.
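As an illustration (the attribute name and the load-on-demand step are hypothetical), store a small serializable key in the session and load the heavy data on demand, so only the key has to be replicated across the cluster:

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

public class AccountServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response) {
        HttpSession session = request.getSession();
        // Good: a small, serializable key is cheap to replicate to every cluster member
        session.setAttribute("accountId", new Long(42L));

        // Bad (avoid): session.setAttribute("account", largeAccountObjectGraph);

        Long accountId = (Long) session.getAttribute("accountId");
        // Load the full account from the database or a cache using accountId,
        // rather than parking the whole object graph in the session.
    }
}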

Clusters also allow you to scale portions of your Web site in increments. If you need to scale static portions, you can add Web servers. If you need to scale dynamically generated parts, you can add application servers.

After you've put your system in a cluster, the next recommended approach to making your application run faster is choosing a better VM. Look at the HotSpot VM or other VMs that perform optimization on the fly. Along with the VM, it's also worth looking at a better compiler.

If you've employed several industry techniques plus the ones mentioned here and still can't gain the scalability and high availability you seek, then I recommend a solid tuning strategy. The first step in this strategy is to examine the overall architecture for potential bottlenecks. Usually this is easily recognized in your UML diagrams as single-threaded components or components with many connecting lines attached.

The final step is to conduct a detailed performance assessment of all code. Make sure your management has set aside at least 20% of the total project time for this undertaking; otherwise insufficient time may not only compromise your overall success but also cause you to introduce new defects into the system.

Many organizations are also guilty of not having proper test beds in place because of cost considerations. Make sure your QA environment mirrors your production environment and that your QA tests exercise the application at different loads, from a low load up to a fully scaled load based on the maximum anticipated number of concurrent users. Tests intended to gauge the stability of a system may require running different scenarios over the course of days, even weeks.

Under no circumstances should you undertake tuning an application without a profiler. We use Optimizeit, but Sitraka's JProbe and NuMega's profiler are also good. These tools will show you bottlenecks in your code, such as threads blocked by other threads, unused objects that survive garbage collection, and excessive object creation. Once you've captured the output of these tools, make simple changes and limit the scope of those changes to things that will make your code faster. Don't worry about reuse, style issues, or anything other than performance. Usually the easily identifiable bottlenecks are contained within loops and algorithms.

Author Bio
James McGovern is an enterprise architect with Hartford Technology Services Company L.L.C., an information technology consulting and services firm dedicated to helping businesses gain competitive advantage through the use of technology. His focus is on the architecture of single sign-on solutions. [email protected]
