The so-called duopoly of Intel and Microsoft brought one great advantage to personal computing - a uniform runtime platform for application software. Application software developers need only code the application using the Win32 API and compile for Pentium to be assured that their programs will run on a majority of desktops.
In contrast, Internet appliance devices are nonhomogeneous, using many different types and variants of microprocessors and many different operating systems. As a result, software developed for one device will most likely not run unchanged on another device because of OS and/or binary incompatibilities. (Note: For the purposes of this article, a platform means the combination of a microprocessor and an operating system that includes event management and an API that supports a consistent GUI.)
You should also consider the ubiquitous expansion of the Internet beyond the realm of PCs. No longer are PCs (including Apple and Unix systems) the sole conduit for Internet access. Portable devices that include Web tablets, PDAs, and wireless handsets are increasingly enabled with Internet-access capability. Following the trend established with PCs, content delivered via the Internet will be a combination of static information, dynamic personalized information, and interactive content that includes applications.
Consider the case of Short Messaging Services (SMS) on wireless handsets. Each wireless services operator has a different protocol and capability for piggybacking SMS over its existing digital network. This requires a carefully developed SMS client integrated into any wireless handset that matches the protocol and capabilities of the carrier. Imagine the value to a wireless services operator if it had to develop only one SMS client program that could be downloaded into any wireless handset. Further, imagine that the SMS client program could be upgraded as the capabilities of the network expand, enabling subscribers to readily take advantage of the expanded services.
As interactivity and application software become an ever more important aspect of Internet connectivity and services, the necessity for a universal runtime platform becomes quite apparent. However, unless the runtime platform is based on exact standard components, as in the case of "Wintel" PCs, a resource, cost, and performance impact accompanies the inclusion of a universal intermediate runtime platform. Therefore, an ideal universal runtime environment must possess the following properties:
1. The platform must offer binary compatibility. Executable software is delivered as "binary object code" or "binaries." For example, software developed for a Macintosh computer is compiled to PowerPC binary object code, whereas software developed for an IBM-compatible PC is compiled to Pentium binary object code. As the PowerPC and Pentium microprocessors have different architectures and instruction sets, binaries for one are not interchangeable with the other. Hence the creation of an abstract runtime environment is needed to provide a bridge between an underlying microprocessor and application software binaries.
2. The platform must include a microprocessor, or an emulation of one, that includes a universal machine code that's fully separated from the underlying microprocessor's machine code.
3. The platform must include an operating system with an API and corresponding capabilities that support a consistent GUI. An API is a set of software functions that perform typical operations such as opening files, reading/writing data, allocating and managing memory, handling events, and displaying text and graphics. For application software to be truly portable, a device must possess a common set of capabilities as well as extend those capabilities to software developers via a consistent API.
4. The platform must not impose heavy resource requirements on a system. Portable devices are price-sensitive and must meet specific price points. Therefore it's necessary that the runtime environment doesn't cause a significant increase in the bill of materials and hence the overall cost of the device. Also, the leaner the requirements necessary to support a runtime environment, the broader the range of devices that can be supported.
5. The platform must be energy efficient, which is an absolute necessity for battery-powered portable devices.
6. The platform must be secure, while being Internet-aware.
Java with its "write once, run anywhere" promise possesses such properties and was designed by Sun Microsystems to address the related requirements. However, for embedded applications such as portable devices, Java has on the whole been slow to catch on until now, primarily due to performance issues and the costly system resources required for integrating Java technology.
Wireless telephone handsets are now incorporating Java technology to accept Java applications through a wireless network and run them on the mobile telephone. PDA manufacturers are beginning to utilize the Java platform as their standard and sole operating environment, allowing the manufacturer to deliver a highly differentiated and extensible product to market. Television set-top unit manufacturers have also embraced Java technology as a means to enable the delivery of interactive services (such as electronic program guides) and application software, as well as to add new functionality to their devices over time.
The value of a universal runtime platform, such as Java's, is clear, and the trend to incorporate such platforms into portable Internet-enabled devices is also clear. However, two primary issues remain before the floodgates open: the performance with which the system executes Java software and the cost to include the platform.
The Java Runtime Environment
The Java Runtime Environment (JRE) has been designed to provide runtime environment capabilities similar to those of, for example, a PC, an Apple Mac, a PocketPC device, or a PalmPilot. The Java 2 Micro Edition (J2ME) is the specific JRE for portable devices, Internet appliances, and other miscellaneous embedded applications. Figure 1 shows the functional blocks of a JRE, which is made up of two primary components - a Java Virtual Machine (JVM) and a specific set of configuration and profile class libraries.
The primary components of the JVM that impact performance are:
1. Java bytecode interpreter: The heart of the JVM, and consequently its main performance bottleneck. The interpreter translates Java bytecode instructions into native microprocessor instructions, and this is where the biggest performance cost is incurred, because each bytecode must be interpreted one-by-one into the instructions of the underlying microprocessor.
Each Java bytecode instruction typically interprets to several native microprocessor instructions. It should be noted that the JVM and its corresponding instruction set are based on a stack architecture. The implications of a stack-based architecture will be discussed later.
2. Class manager: Determines which class files to load and when. It also scans class files and individual bytecodes for malicious code (such as viruses) as part of the intrinsic security of a JVM. Once a class file has been loaded and verified, its bytecodes are passed on to the interpreter for execution.
3. Garbage collector: Reclaims memory that's no longer used. Because the Java programming language is object-oriented with an integrated garbage collection mechanism, programmers are freed from directly managing memory resources. Instead, objects are created and use memory on the fly as they're instantiated, and the memory that they use is released when the garbage collection process determines that the object is no longer needed. The garbage collection process runs concurrently with any Java software that's running, and therefore directly impacts runtime performance.
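The fetch-decode-dispatch loop at the heart of the interpreter (point 1 above) can be sketched in a few lines. This is a deliberately tiny model: the opcode values and the four-instruction set below are simplified stand-ins for real JVM bytecodes, chosen for illustration only.

```java
// A minimal sketch of a stack-based bytecode interpreter loop.
// The opcodes here are hypothetical, not real JVM bytecodes.
public class TinyInterpreter {
    static final int PUSH = 0; // push the next code word as a value
    static final int ADD  = 1; // pop two values, push their sum
    static final int MUL  = 2; // pop two values, push their product
    static final int HALT = 3; // stop; top of stack is the result

    static int run(int[] code) {
        int[] stack = new int[16]; // the operand stack lives in memory
        int sp = 0;                // stack pointer
        int pc = 0;                // program counter
        while (true) {
            switch (code[pc++]) {  // fetch, decode, dispatch - per bytecode
                case PUSH: stack[sp++] = code[pc++]; break;
                case ADD:  { int b = stack[--sp], a = stack[--sp]; stack[sp++] = a + b; break; }
                case MUL:  { int b = stack[--sp], a = stack[--sp]; stack[sp++] = a * b; break; }
                case HALT: return stack[sp - 1];
                default:   throw new IllegalStateException("bad opcode");
            }
        }
    }

    public static void main(String[] args) {
        // (2 + 3) * 4 expressed as stack code
        int[] program = { PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT };
        System.out.println(run(program)); // prints 20
    }
}
```

Note that every operand passes through the in-memory stack, which is exactly the memory-traffic problem discussed later in this article.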
Java Software Performance Issues
Programs developed in the high-level Java programming language are compiled to the universal binary intermediate language referred to as Java bytecode instructions. When Java software is executed, the Java bytecode instructions are interpreted to the native instruction set of the microprocessor on which the JRE has been ported and runs. This makes the bytecodes an intermediate language and the JRE an intermediate runtime platform.
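To make the intermediate-language idea concrete, here is a trivial method together with the stack-oriented bytecode that javac produces for it. The disassembly shown in the comments is typical `javap -c` output for a static method; exact listings can vary by compiler version.

```java
// How javac translates a simple method into stack-oriented bytecode.
public class AddExample {
    static int add(int a, int b) {
        return a + b;
        // Corresponding bytecode (as shown by javap -c):
        //   iload_0   // push local variable 0 (a) onto the operand stack
        //   iload_1   // push local variable 1 (b)
        //   iadd      // pop both values, push their sum
        //   ireturn   // pop the sum and return it
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // prints 5
    }
}
```

At runtime it is these four bytecodes, not the line `return a + b;`, that the JVM interprets into the native instructions of whatever microprocessor it happens to be running on.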
An intermediate runtime platform such as the J2ME typically suffers from a performance disadvantage (as compared to natively compiled software), because the underlying system consisting of the microprocessor plus operating system must simultaneously run the intermediate platform as well as the application software. Put another way, Java software execution is handicapped because a system must execute two programs simultaneously to run a Java program - the JVM and the Java program itself. In addition to more heavily utilizing the CPU to execute Java software, the Java platform requires additional memory for its footprint and runtime needs. This gives rise to the clear goals of improving Java software execution performance and minimizing memory requirements.
Before talking about increasing Java software performance, it's helpful to clearly understand the pertinent performance issues:
1. As previously mentioned, Java software binaries are executed by a JVM that interprets each Java bytecode instruction into native microprocessor instructions - a system must run a JVM program that in turn runs Java software. In other words, to run a Java program the system concurrently executes two programs and two different instruction streams.
2. Typical commercial microprocessors, especially those used in portable devices, have register-based architectures; however, the JVM is stack-based. This is important because executing Java bytecode instructions differs significantly from the way commercial microprocessors operate. In particular, temporary data, values, and method arguments are passed through variables and a common stack.
The Java stack resides in the system's memory, which contributes to performance challenges because each stack interaction requires a memory transaction. Significant performance gains can be achieved by the careful localization of variables and stack entries within the CPU, which in effect helps to bridge the gap between the stack-based JVM and register-based microprocessors.
Boosting the Performance of Java Software
Portable devices are at a disadvantage compared to desktop and server computers because they can't take advantage of the higher processor speeds, larger memories, and hard-disk-based persistent storage available to those machines. Consequently, the techniques used to speed up Java software execution on desktop and server computers are wholly inappropriate for portable, cost-sensitive consumer devices.
Performance-boosting techniques for portable devices fall into four categories:
1. Improve the performance of the underlying system. The typical option is to increase the speed of the microprocessor. This approach does increase performance but only moderately. In most systems the microprocessor runs 2-10x faster than the system's memory, so increasing the clock frequency only causes additional waiting for memory accesses such as interactions with the Java stack. Thus there is not a linear relationship between Java software performance and the clock frequency of the system's microprocessor. An additional detriment is that it raises the system cost and consumes more power. Because this is such a poor option, particularly on cost- and energy-conscious devices, this technique won't be discussed further.
2. While not specifically a compilation technique, another common method to boost performance is to carefully optimize and tune the JVM software to take full advantage of the target hardware. This often involves coding the bytecode interpretation loop in assembly language and utilizing CPU registers for common data. Other common optimizations include streamlining method invocations and optimizing the garbage collection process. The one drawback to this approach is that the resultant JRE becomes quite dependent on and tied to a specific hardware configuration.
3. Compilation - the ultimate goal is for Java software performance to be equivalent to software compiled directly to the microprocessor's native machine code. Several techniques, described later, that compile Java software to native machine code are used to boost performance. Compilation techniques do result in performance gains, and often shine in static benchmark tests. However, compilation techniques are contrary to some aspects of the Java-platform paradigm, can't address energy conservation issues, increase memory requirements, and are limited in their capabilities.
4. As is often the case, hardware-based acceleration delivers greater performance, energy efficiency, and cost-effectiveness than can be achieved with software.
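The diminishing return described in option 1 can be modeled with a simple Amdahl-style calculation: only the CPU-bound fraction of execution time scales with clock frequency, while memory-stall time does not. The 40% memory-stall fraction used below is a hypothetical figure for illustration, not a measured value.

```java
// Why a faster clock gives sub-linear gains for Java execution.
public class ClockSpeedup {
    // Amdahl-style model: fraction f of execution time is memory-bound
    // and does not speed up; the rest scales with the clock multiplier.
    static double effectiveSpeedup(double memBoundFraction, double clockMultiplier) {
        return 1.0 / (memBoundFraction + (1.0 - memBoundFraction) / clockMultiplier);
    }

    public static void main(String[] args) {
        // Doubling the clock when 40% of time is spent stalled on memory
        // (e.g., Java stack traffic) yields well under a 2x speedup:
        System.out.println(effectiveSpeedup(0.40, 2.0)); // ~1.43x, not 2x
    }
}
```

The more stack and heap traffic a workload generates, the larger the memory-bound fraction and the smaller the payoff from a faster clock.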
To Compile or Not to Compile
That is indeed the question. The usual way to enable a device with Java technology is to acquire the J2ME development kit from Sun Microsystems and port the JVM, configuration, and appropriate profile(s) to a specific target system. Up until now, the most common way to boost performance was to use a JVM augmented with compilation technology (discussed in detail later). Using a compilation-augmented JVM is also a relatively fast and easy way to introduce performance-boosting capabilities into a Java-enabled device. While it's tempting to think it's also the most cost-effective method, be aware of the memory requirements.
Regardless of the compilation method used, a common set of drawbacks accompanies compilation, such as greatly expanded memory usage and inconsistent execution performance. Compilation can be done several ways.
Ahead-of-Time (AOT) Compilation
This is where the Java program is compiled in totality prior to being executed. Two methods are used for AOT compilation. "Way-ahead-of-time compilation" is when the program is compiled, saved, and distributed as a native executable, much in the same way that a C or C++ program is compiled, saved, and distributed. This option converts the Java program to a system-specific program and hence it's no longer portable.
The other method is when the program is distributed as a standard Java class file and compiled when it's loaded but before being executed. This latter option causes a (sometimes lengthy) delay before the program can begin running, which is typically unacceptable for transient software (such as applets and MIDlets).
Just-in-Time (JIT) Compilation
This is the most common technique used in portable consumer devices, where Java applications are likely to be transient. A compiler is integrated into the JVM to compile portions of a Java program while the JVM is running it. Two primary JIT compilation methods are commonly used. In the first, the individual methods and/or classes that make up the program are compiled when loaded but before being executed. In the second, an executing program is analyzed to determine frequently used classes, methods, and/or code fragments; these code sections are thereafter compiled and executed as native code. This latter case is referred to as adaptive compilation, statistical JIT compilation, or incremental JIT compilation.
Both options expand memory use and result in inconsistent performance because of the delays that occur during the compilation process. Furthermore, the dynamic compilation option typically uses only a relatively small amount of memory to store compiled code segments. As these segments change, the performance resulting from dynamic compilation is the most inconsistent because at different times a section of code may execute either slowly or quickly.
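The bookkeeping behind adaptive (statistical) JIT compilation can be sketched as a per-method hotness counter: run a method interpreted until it proves itself hot, then hand it to the compiler. The class names and the threshold of 10 below are illustrative inventions, not drawn from any real JVM.

```java
import java.util.HashMap;
import java.util.Map;

// A sketch of adaptive JIT bookkeeping: count interpreted invocations
// per method and flag the method for native compilation once it
// crosses a hotness threshold. All names and values are hypothetical.
public class AdaptiveJitSketch {
    static final int HOT_THRESHOLD = 10; // arbitrary illustrative value

    final Map<String, Integer> invocationCounts = new HashMap<>();
    final Map<String, Boolean> compiled = new HashMap<>();

    // Called on every method invocation by the (hypothetical) JVM.
    // Returns true if this invocation ran as compiled native code.
    boolean invoke(String methodName) {
        if (Boolean.TRUE.equals(compiled.get(methodName))) {
            return true; // already compiled; runs at native speed
        }
        int count = invocationCounts.merge(methodName, 1, Integer::sum);
        if (count >= HOT_THRESHOLD) {
            compiled.put(methodName, true); // hand off to the compiler
        }
        return false; // this invocation was interpreted
    }

    public static void main(String[] args) {
        AdaptiveJitSketch jit = new AdaptiveJitSketch();
        int interpreted = 0, nativeRuns = 0;
        for (int i = 0; i < 1000; i++) {
            if (jit.invoke("hotLoopBody")) nativeRuns++; else interpreted++;
        }
        System.out.println(interpreted + " interpreted, " + nativeRuns + " native");
        // prints "10 interpreted, 990 native"
    }
}
```

The output illustrates the inconsistency described above: the same code section runs slowly for its first few invocations and quickly thereafter, and the crossover point moves as the compiled-code cache fills and segments are evicted.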
Table 1 lists the pros and cons of the different acceleration solutions.
Hardware Is Happening
Hardware-based acceleration of typical software functions (traditional 3D rendering now being performed by 3D graphics hardware is a good example of this) typically delivers the highest possible performance due to the speed and efficiency at which dedicated hardware can execute. Hardware-based acceleration of Java bytecode execution is no exception to this rule. You should be able to intuitively appreciate that dedicated hardware can offer greater performance than software that must share CPU resources with a running JVM. Hardware Java acceleration solution vendors claim average application performance improvements from 5-15 times over that of a standard JVM, as well as varying energy efficiencies (versus a Sun Microsystems reference implementation as the baseline). Of course, performance is also impacted by other components within a system, such as the speed of the memory and the capabilities of the host application microprocessor (if applicable).
Another noteworthy item is that some hardware solutions are transparent to the operation of a system, with the clear exception of the JVM. Such solutions tend to be design-friendly because they can be readily integrated into a system without changing or affecting the bios, operating system, or other legacy software, while also leveraging existing hardware designs, development tools, and in-house expertise.
It should be pointed out that a Java accelerator complements a JVM but neither replaces nor substitutes it. A Java Virtual Machine has several components, only one of which - the Java bytecode instruction interpreter loop - is enhanced or replaced by most accelerator solutions. The JVM still takes care of loading and verifying Java classes, managing memory, scheduling tasks, and performing other housekeeping functions. The more efficiently these other functions are implemented, the greater the improvement a Java bytecode execution accelerator can contribute to overall system performance.
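One way to picture this division of labor is as a JVM with a pluggable execution stage. Everything in this sketch is hypothetical (no real JVM exposes such an interface); it exists only to show that the bytecode executor is the sole component an accelerator swaps out, while loading, verification, and garbage collection remain in software.

```java
// Illustrative only: an accelerator complements the JVM by replacing
// just the execution stage, not the JVM's housekeeping functions.
public class JvmSketch {
    interface BytecodeExecutor {
        int execute(int[] bytecodes);
    }

    // The software interpreter - the one component an accelerator replaces.
    static class InterpretingExecutor implements BytecodeExecutor {
        public int execute(int[] bytecodes) {
            int sum = 0; // stand-in for a real dispatch loop
            for (int b : bytecodes) sum += b;
            return sum;
        }
    }

    // Stand-in for dedicated silicon executing the same bytecodes faster.
    static class AcceleratedExecutor implements BytecodeExecutor {
        public int execute(int[] bytecodes) {
            int sum = 0; // identical result, delivered by hardware
            for (int b : bytecodes) sum += b;
            return sum;
        }
    }

    final BytecodeExecutor executor; // the only swapped component

    JvmSketch(BytecodeExecutor executor) { this.executor = executor; }

    int runClass(int[] bytecodes) {
        // Class loading, verification, memory management, and task
        // scheduling would all still happen here in software,
        // regardless of which executor is plugged in.
        return executor.execute(bytecodes);
    }
}
```

Because the two executors produce identical results, the rest of the JVM neither knows nor cares which one is installed - which is precisely the transparency property described above.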
For embedded device designs where cost, system resources, and power consumption are critical, hardware-based acceleration is rapidly becoming the preferred choice. Consider that in today's systems the difference between the speed at which microprocessors and memory are clocked can be as much as 10 to 1. This gap will persist, if not widen, as the microprocessors in portable devices and Internet appliances are clocked faster.
Software-based acceleration using compilation is then handicapped by the speed of system-memory devices, whereas hardware-based acceleration can include technology to compensate for slow memory. A recent study from Penn State University, published by USENIX, confirms that even though the accelerator device draws power, hardware-assisted Java software execution is the most energy-efficient choice, something that's especially important in battery-powered devices.
Further validating the value proposition of hardware-based, Java performance-boosting solutions are leading market analysts, such as Cahners In-Stat, that predict that within the next few years the majority of JVMs integrated into portable devices, Internet appliances, and other miscellaneous embedded applications will utilize hardware-assisted acceleration.
Hardware solutions can be segmented into three categories:
1. Microprocessor extensions include two implementation methods - instruction path interpreters and instruction set extensions. Both methods enable a microprocessor to interpret Java bytecode instructions as part of its normal processing capabilities. Such Java interpretation extensions are analogous to the way the MMX extensions in a Pentium microprocessor enable accelerated processing of multimedia in addition to standard x86 instructions.
2. Java accelerators are standalone chips or components within a System-on-Chip (SoC) that directly execute Java bytecode instructions. One specific type of technology or chip in this category is a hardware JIT compiler that, of course, shares the same drawbacks as software JIT compilers. Excluding hardware JIT compilers, Java acceleration chips are analogous to the way graphics accelerators complement a microprocessor by performing many complex pixel manipulations and rendering functions directly with the video memory.
With regard to energy efficiency, a dedicated Java bytecode execution engine is most likely a less complex chip with fewer transistors than a larger general-purpose microprocessor, so it draws less power while executing bytecodes faster than a microprocessor can interpret them. Such a chip can thus contribute to a system's energy efficiency. This advantage doesn't apply to either software or hardware JIT compilers. In other words, dedicated hardware is typically more efficient at performing specific tasks than a general-purpose microprocessor.
3. Native Java microprocessors (NJMs) are unique microprocessors that use Java bytecode instructions as the native instruction set. Some NJMs may have additional instructions that are outside the scope of executing Java software to support OS and driver development.
It's tempting to think of an NJM as the most natural choice for accelerating Java bytecode instruction execution, hence boosting the performance of Java software. However, such devices are contrary to typical device manufacturers' desires to leverage in-house expertise and assets, as well as use cost-effective standard components and technologies. The device manufacturer is prevented from acquiring and using readily available microprocessors, I/O devices and corresponding drivers, operating system software, and development tools. Rather the device designer and manufacturer are faced with the unpalatable requirement to acquire new and specific items, if available; if not available, those items must be developed from scratch. For these reasons NJMs have consistently failed to achieve broad acceptance, even NJMs from Sun Microsystems, the pioneer of the Java platform.
Table 2 lists the pros and cons of the different hardware acceleration solutions.
While many options exist for boosting the performance of Java software executing on Internet appliances, portable devices, and the like, the appropriate selection is neither simple nor obvious. The choice of a particular technology, option, or product should be driven by a careful assessment of market and device requirements. It should also be clear that among the solutions presented, hardware solutions offer the best balance between cost, complexity, design-friendliness, performance, and energy efficiency. It shouldn't be a surprise that hardware solutions, particularly microprocessor extensions and dedicated accelerator chips, are becoming standard components. Furthermore, system transparency and design-friendliness are critical attributes that are often overlooked in hardware solutions.
No discussion about software performance is complete without discussing benchmarks. In this regard, it should be noted that scores resulting from just-in-time compilation are typically not representative of how actual Java application software will perform. In fact, compiled code that exhibits benchmark scores 10-20 times faster than interpreted code may result in execution that is only one to two times faster for real-world Java application software. This is particularly true of dynamic or statistical JIT compilers, because most benchmark tests are designed as loops. JIT compilers compile and execute the loops very fast, skewing the scores (for example, a loop that runs for 1,000 iterations may run interpreted and slowly for five iterations, then compiled and fast for the remaining 995 iterations).
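The loop example above can be put into numbers. The per-iteration costs assumed below (10 time units interpreted versus 1 unit compiled, i.e., a tenfold difference) are hypothetical round figures for illustration.

```java
// Quantifying benchmark skew: a 1,000-iteration loop that runs
// interpreted for 5 iterations and compiled for the remaining 995.
// The per-iteration costs are hypothetical illustrative values.
public class BenchmarkSkew {
    static double apparentSpeedup(int totalIters, int interpIters,
                                  double interpCost, double compiledCost) {
        double allInterpreted = totalIters * interpCost;
        double mixed = interpIters * interpCost
                     + (totalIters - interpIters) * compiledCost;
        return allInterpreted / mixed;
    }

    public static void main(String[] args) {
        // The benchmark loop looks almost 10x faster...
        System.out.println(apparentSpeedup(1000, 5, 10.0, 1.0)); // ~9.57x
        // ...but code that executes each path only once sees no gain,
        // which is why real applications land far below benchmark scores.
        System.out.println(apparentSpeedup(1, 1, 10.0, 1.0)); // 1.0
    }
}
```

Real application code spends much of its time in paths that run only a handful of times each, so its effective speedup sits between these two extremes - usually much closer to the second.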
Hardware-based acceleration, unlike software-based acceleration, generally delivers a very consistent benchmark and real-world application performance. In other words, if a particular hardware device or technology delivers a tenfold increase in benchmark scores, it's reasonable to assume that real-world Java applications will execute 10 times faster. Be aware that Java benchmarks are heavily impacted by the system on which the JRE is running, so normalized comparisons are sometimes irrelevant if not meaningless. Other failings of benchmarks are that they don't characterize system resource utilization (such as memory) or energy efficiency.
1. Microprocessor extensions come about by integrating Java acceleration technology with a microprocessor's core. As Java-enhanced microprocessors proliferate in the market, time-to-market will be favorably impacted. Absent the broad availability of Java-enhanced microprocessors, this technology is mostly limited to vertically integrated companies and partnerships.
2. Existing or new devices can readily be enhanced or designed, respectively, to accommodate a Java accelerator chip.
3. An entire system design that includes all system software must be developed to accommodate an NJM.
Ron Stein is a senior marketing manager at Nazomi Communications, Inc. He has more than 20 years' experience with Java and embedded software. Prior to Nazomi, he managed product marketing for Insignia Solutions. Ron holds an MBA from Santa Clara University and a BSEE from the University of Pittsburgh.