Every chance I get, I lobby for performance tools for software developers, because performance tuning is hard. This is especially true in modern object-oriented languages like Java, as opposed to older languages like C where the programming model was much closer to the hardware model. Furthermore, performance can affect the user's perception of your software. Hence, for any serious software-development project it's critical to have good performance tools available to assist the tuning efforts. OptimizeIt 2.0 Professional from Intuitive Systems is one such tool.
OptimizeIt claims to work with most Java 1.1 VMs, including the following:
- Sun Microsystems JDK 1.1.1 through 1.1.6
- Borland JBuilder 1.0
- Symantec Café 2.5
I tested OptimizeIt on a Windows NT 4.0 system with a 200 MHz Pentium, 32 MB of memory (the minimum supported configuration) and the standard Sun JDK 1.1.6 installed. I ran OptimizeIt on several Java programs including one 200,000-line application.
Installation and Documentation
The installation was extremely smooth and uncomplicated. It required a mere 8 MB of disk space for the install. No hard-copy documentation was provided; however, the online documentation included a user manual (36 pages when printed) and a short tutorial. Both are in HTML format and viewed through your Web browser.
Two Tools in One
OptimizeIt actually contains two tools, a CPU time profiler and a memory profiler, integrated into one GUI. Often, time and memory profilers are offered separately, but either one by itself gives you only part of the picture of your program's performance. By including both in a single program, OptimizeIt allows you to see both sides of the coin.
Using the Product
After specifying the program to run, along with settings such as CLASSPATH, source path(s) and arguments, press the Start button in OptimizeIt to run the target program. The program can be paused at any time to study the performance data up to that point and then resumed later. By default the memory profiler is always enabled and collects memory data continuously as the program runs, although it can be explicitly disabled if you wish. While the memory profiler is collecting data, the data can be viewed in real time on the memory profile screen.
For the CPU profiler you explicitly start and stop recording performance data. Generally, you would run the program up to the point where you want to begin measuring and press the Start CPU Profiler button. Then you wait until the program has finished the operation you want to measure and press the Stop CPU Profiler button. An option is provided to pause automatically before the start of the program in case you want to profile it from the very beginning. Unlike memory data, CPU data can't be viewed in real time as it's being collected. The CPU data becomes viewable when you stop recording.
OptimizeIt is an interactive, live analysis tool. As long as the program and the JVM are live, you can get performance information. Once the program exits, however, the performance data is no longer available. Hence, an option is provided to disable exits so that even if the program calls System.exit(), the JVM is forced to remain alive so you can obtain the data.
CPU Profiling Features
The CPU profiler uses statistical sampling to determine how much time is spent in different parts of your program. The sampling interval is 25 milliseconds by default, but can be set as low as 5 milliseconds. Timing results are presented in a sorted hierarchical tree format that shows the top-level methods (e.g., the main() method in an application), their cumulative time (time spent in those methods plus all of their descendants) and percentage of total time (see Figure 1). You can then click on a node to cause it to expand and show all of the methods it calls directly, as well as their associated times. In this way it's easy to "drill down" the dynamic call graph of the program until you get to a leaf method. There is an option to invert the tree so that the bottom-most leaf methods are shown first and you can "drill up" to the top-level methods.
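To make the idea of statistical sampling concrete, here is a minimal sketch of a sampler that wakes up at a fixed interval and records which method a target thread is executing. It is not OptimizeIt's implementation; the class `SamplingSketch`, its methods and the `busyWork` workload are hypothetical names invented for illustration, and it relies only on the standard `Thread.getStackTrace()` call.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sampling profiler; an assumption-laden sketch, not OptimizeIt's code. */
class SamplingSketch {
    // Number of samples in which each method was the innermost stack frame.
    private final Map<String, Integer> topFrameCounts = new HashMap<>();
    private int totalSamples = 0;

    /** Samples the target thread every intervalMillis until it terminates. */
    void profile(Thread target, long intervalMillis) {
        while (target.isAlive()) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                // Element 0 is the method executing at the moment of the sample.
                String method = stack[0].getClassName() + "." + stack[0].getMethodName();
                topFrameCounts.merge(method, 1, Integer::sum);
                totalSamples++;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    int totalSamples() { return totalSamples; }

    Map<String, Integer> topFrameCounts() { return topFrameCounts; }

    /** Hypothetical workload: burns CPU for roughly the given number of milliseconds. */
    static void busyWork(long millis) {
        long end = System.nanoTime() + millis * 1_000_000L;
        double x = 0;
        while (System.nanoTime() < end) {
            x += Math.sqrt(x + 1);
        }
        if (x < 0) System.out.println(x); // keeps the loop from being optimized away
    }

    public static void main(String[] args) {
        Thread worker = new Thread(() -> busyWork(500));
        worker.start();
        SamplingSketch profiler = new SamplingSketch();
        profiler.profile(worker, 25); // 25 ms matches the default interval quoted above
        profiler.topFrameCounts().forEach((method, n) ->
            System.out.printf("%-50s %5.1f%%%n", method, 100.0 * n / profiler.totalSamples()));
    }
}
```

A shorter interval gives finer resolution at the cost of more overhead, which is presumably why a tool would make it adjustable. The percentages vary from run to run because the samples land at arbitrary moments.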
This tree display is intuitive and easy to use. Some things, however, are difficult to see in this display. For example, it's not possible to see all the callers and callees simultaneously for a given method. In order to do this, you must switch back and forth between normal and inverted modes.
Also shown on the CPU profile screen is a "hotspot" display, which lists methods sorted by the total time spent in each one, regardless of who called it. This is the display that clearly tells you which methods to focus your optimization efforts on.
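The difference between the cumulative figures in the tree and a per-method hotspot ranking can be sketched with a few stack samples. The code below is an illustration only: the sample data is made up, the class `HotspotSketch` is hypothetical, and it assumes that a hotspot list ranks methods by the samples in which the method itself was executing, summed over all callers.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Illustrative aggregation of stack samples; names and data are invented. */
class HotspotSketch {
    /** Self count: the method was the innermost (last) frame of the sample. */
    static Map<String, Integer> selfCounts(List<List<String>> samples) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> stack : samples) {
            if (!stack.isEmpty()) {
                counts.merge(stack.get(stack.size() - 1), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Cumulative count: the method appeared anywhere on the sampled stack. */
    static Map<String, Integer> cumulativeCounts(List<List<String>> samples) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> stack : samples) {
            Set<String> seen = new HashSet<>(stack); // count a recursive method once per sample
            for (String method : seen) {
                counts.merge(method, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Made-up samples, outermost frame first. */
    static List<List<String>> exampleSamples() {
        return Arrays.asList(
            Arrays.asList("main", "parse", "readLine"),
            Arrays.asList("main", "parse", "readLine"),
            Arrays.asList("main", "parse"),
            Arrays.asList("main", "render", "readLine"));
    }

    public static void main(String[] args) {
        List<List<String>> samples = exampleSamples();
        System.out.println("self:       " + selfCounts(samples));
        System.out.println("cumulative: " + cumulativeCounts(samples));
    }
}
```

With this data, `readLine` has a self count of 3 even though it is reached through two different callers, while `main` has a cumulative count of 4 and no self count at all. The first kind of number points at the code worth optimizing; the second shows which call paths lead there.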
The really great feature of the CPU profiler is that data can be viewed for a particular thread or thread group. There's a hierarchical list of the threads in the program, including system threads (such as the AWT event thread), and a time line for each thread that shows when it was in a running or suspended state. This alone is a useful display, especially if your program has several threads. You can then select a thread or thread group from the list, and the tree and hotspot displays will show only data for that particular selection.
By default, the display shows real time (CPU plus wait time); however, you can opt to see CPU time only. This is useful for tuning algorithms as it relates directly to the number of instructions executed.
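The gap between real time and CPU time is easy to demonstrate with the standard `java.lang.management.ThreadMXBean` interface. This is a sketch and says nothing about how OptimizeIt obtains its numbers; `TimeKindsSketch` and `mostlyWaiting` are hypothetical names, and it assumes the JVM supports per-thread CPU time measurement, which most do.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Illustrative comparison of real (wall-clock) time and CPU time for one task. */
class TimeKindsSketch {
    /** Returns {realNanos, cpuNanos} for running the task on the current thread. */
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long realStart = System.nanoTime();
        long cpuStart = threads.getCurrentThreadCpuTime();
        task.run();
        long cpu = threads.getCurrentThreadCpuTime() - cpuStart;
        long real = System.nanoTime() - realStart;
        return new long[] { real, cpu };
    }

    /** Hypothetical task that mostly waits: little CPU time, plenty of real time. */
    static void mostlyWaiting() {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long[] t = measure(TimeKindsSketch::mostlyWaiting);
        System.out.printf("real: %d ms, CPU: %d ms%n", t[0] / 1_000_000, t[1] / 1_000_000);
    }
}
```

A method that sleeps or blocks on I/O looks expensive in real time and nearly free in CPU time. That is why the CPU-only view suits algorithm tuning, while the real-time view is closer to what the user experiences.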
Memory Profiling Features
The memory profiler shows object allocations aggregated by type. As previously mentioned, this display is continuously updated as the program runs, showing the current instance counts (and, optionally, sizes) for every object type. It would have been nice if it computed totals for you as well. There's also a checkpoint feature that allows you to set a baseline for the instance counts. Subsequent measurements will then show the counts relative to the baseline. They'll go up as new objects are allocated, and go down again when the garbage collector frees them. There's an option to disable the garbage collector, which is useful if you wish to see the total allocations over the life of the program.
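The checkpoint idea amounts to taking a snapshot of the per-type instance counts and reporting later counts as differences from it. The sketch below models that bookkeeping with plain maps; `CheckpointSketch` and its methods are hypothetical, and the allocation and free events are fed in by hand rather than observed from a running JVM.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative per-type instance counter with a baseline; not a real profiler. */
class CheckpointSketch {
    private final Map<String, Integer> live = new HashMap<>();
    private Map<String, Integer> baseline = new HashMap<>();

    void allocated(String type) { live.merge(type, 1, Integer::sum); }

    void freed(String type) { live.merge(type, -1, Integer::sum); }

    /** Records the current counts as the baseline for later comparisons. */
    void checkpoint() { baseline = new HashMap<>(live); }

    /** Current instance counts relative to the last checkpoint. */
    Map<String, Integer> relativeCounts() {
        Map<String, Integer> delta = new TreeMap<>();
        for (Map.Entry<String, Integer> e : live.entrySet()) {
            delta.put(e.getKey(), e.getValue() - baseline.getOrDefault(e.getKey(), 0));
        }
        return delta;
    }

    public static void main(String[] args) {
        CheckpointSketch counts = new CheckpointSketch();
        for (int i = 0; i < 3; i++) counts.allocated("Node");
        counts.allocated("Edge");
        counts.checkpoint();                 // baseline: Node=3, Edge=1
        counts.allocated("Node");
        counts.allocated("Node");
        counts.freed("Edge");                // e.g. reclaimed by the garbage collector
        System.out.println(counts.relativeCounts()); // prints {Edge=-1, Node=2}
    }
}
```

A type whose relative count keeps climbing across repeated operations is a natural candidate for closer inspection.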
Unfortunately, this display doesn't include arrays. Since arrays can account for a majority of heap space in many programs, I found this to be a significant omission, although Intuitive promises to include this feature in the next version. Another feature I'd love to see is the ability to show memory profiles by thread or thread group, just as the CPU profiler currently allows.
If you select an object type, you can then go to a second screen that shows, in a hierarchical tree format, the stack backtraces of all the places in the program where that type was allocated. You can go to a third screen that shows all instances of that type along with their values (the value shown is the result of calling the toString() method on the object). The instance screen also includes a unique and powerful heap analysis feature that shows you all incoming or outgoing references (but not both at the same time) to a given object instance. This is valuable because, although Java has an automatic garbage collector, it won't collect an object if you unknowingly keep a reference to it, even though the object is no longer needed. A heap analysis tool such as this is the only practical way to find such memory leaks. Apart from performance reasons, it can even be used as a debugging aid to inspect the connections between your data structures and find possible errors.
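The kind of leak described here takes only a few lines to reproduce. In this sketch a hypothetical static `CACHE` list keeps a reference to a buffer the rest of the program has forgotten about; a `WeakReference` is used purely to observe whether the object is still reachable. All names are invented for illustration.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

/** Illustrative memory leak caused by an unintentionally retained reference. */
class LeakSketch {
    // Hypothetical cache that is never cleared: every entry stays reachable.
    static final List<byte[]> CACHE = new ArrayList<>();

    /** Allocates a buffer, drops the local variable, but leaves a reference in CACHE. */
    static WeakReference<byte[]> loadAndForget() {
        byte[] buffer = new byte[1024];
        CACHE.add(buffer);                   // the forgotten incoming reference
        return new WeakReference<>(buffer);  // observes reachability without extending it
    }

    public static void main(String[] args) {
        WeakReference<byte[]> ref = loadAndForget();
        System.gc(); // only a request, but a strongly reachable object is never collected
        System.out.println("still reachable: " + (ref.get() != null)); // prints true
    }
}
```

Listing the incoming references to the buffer would point straight at `CACHE`, which is exactly the information needed to decide where the reference should have been removed.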
Other Features
- A source viewer is provided to display source code whenever you click on a method name, provided you compiled the code with the -g flag.
- Printing is not supported directly, but you can export the data into HTML format and print from your browser. You can also export into ASCII or "importable" ASCII formats.
- Advanced users will appreciate that a set of API methods is provided to enable and disable the profilers at the program level so as to gain even finer control over the parts of the program to be measured.
Performance
OptimizeIt generally performed well throughout my tests, a fact made more impressive considering it's written almost entirely in Java. Clearly the OptimizeIt engineers must have used the tool on itself! I did find switching between screens a bit sluggish on my small-memory (32 MB) machine, but presumably this would be less of a problem on a system with more memory. I ran a benchmark program to measure the overhead of profiling. With the CPU profiler by itself, I measured a 1.2x slowdown; with the memory profiler by itself, a 2.2x slowdown. This is quite reasonable, and I think most users are willing to tolerate it in return for quality performance data. With both profilers running simultaneously, I measured a slowdown of 2.4x. While OptimizeIt allows it, I don't recommend running both at the same time because the overhead of the memory profiler will affect the CPU profiler and can skew the results.
Summary
OptimizeIt provides a CPU profiler and a memory profiler, both of which are needed as part of a full performance-tuning effort. It's easy to install and operate. The outstanding feature of the CPU profiler is its ability to partition the timing data by thread or thread group; for the memory profiler it's a heap-analysis capability that allows you to track all references to and from a given object instance. OptimizeIt is a welcome addition to any Java programmer's arsenal of development tools.
About the Author
Achut Reddy is a staff engineer at Sun Microsystems in the authoring and development tools group, which is currently working on Java performance issues.