While representing my company at JavaOne this year, I noticed that many Java engineers are becoming more interested in the issues surrounding JVM selection and integration. Many questions were asked concerning the trade-offs involved in the different ways of implementing the JVM. This article is aimed at helping device manufacturers, OEMs, and J2ME application engineers understand the issues - and at helping to initiate further questions when talking to JVM vendors.
What Are the 'Risks' Associated with Java?
In my previous article "Java Thick Clients with J2ME" (JDJ, Vol. 6, issue 6), I outlined a number of risks commonly associated with Java. These include:
- Memory requirements
- Power consumption
- Licensing and silicon cost
While discussing these risks and the various virtual machine implementation approaches, I'll refer to the following terms: code bloat, compile stalls, and profiling stalls.
When an application's Java byte code is compiled down to native processor instructions, the code grows - especially on RISC processors. Java byte code is abstract and does more work in less space than long-winded native instructions. Whenever a Java program is compiled, on or off the device, the resulting native code must reside in memory so it can be run. This expansion in memory is referred to as code bloat.
Compile stalls occur when, for example, a JIT compiler compiles the application's Java byte code into native machine code on the device as the application is running. Obviously, this requires processor power and can result in a noticeable stall as the compiler runs. Depending on the device, processor, clock speed, VM, and the application, this stall may often go unnoticed by the end user if it's "hidden" well enough.
To tackle the problems associated with code bloat and compile stalls, some VM vendors compile only the parts of the application that take the longest to execute. However, whichever technique is used, the profiling needed to find those parts requires some processing power of its own; the resulting profiling stalls may or may not be noticeable, depending on the device, processor, VM, and application.
It's worth remembering that any of this extra processing can be hidden from the user: simply increasing the clock speed from, say, 25MHz to 200MHz will do the trick - but the battery power of the device will then last for only a day instead of a week!
What Are the Different Types of VMs?
The alternative approaches to VM implementation can be plotted on a time line illustrating the evolution of VMs (see Figure 1):
Many JVMs employ pure software interpretation; a good example of this is Sun's KVM. Table 1 outlines its major characteristics.
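To make the interpreter idea concrete, here is a minimal sketch of the classic fetch-decode-execute loop, written in Java over a toy instruction set. The opcodes, stack layout, and class name are invented for illustration only; a real interpreter such as the KVM dispatches on the full Java byte-code set and maintains frames, locals, threads, and a heap.

```java
public class ToyInterpreter {
    // Toy opcodes (hypothetical, for illustration only)
    static final byte PUSH = 0x01;  // push the next byte onto the stack
    static final byte ADD  = 0x02;  // pop two values, push their sum
    static final byte HALT = 0x00;  // stop; top of stack is the result

    public static int run(byte[] code) {
        int[] stack = new int[16];
        int sp = 0;  // stack pointer
        int pc = 0;  // program counter
        while (true) {
            byte op = code[pc++];
            switch (op) {  // the fetch-decode-execute loop
                case PUSH: stack[sp++] = code[pc++]; break;
                case ADD:  stack[sp - 2] += stack[sp - 1]; sp--; break;
                case HALT: return stack[sp - 1];
                default:   throw new IllegalStateException("bad opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        byte[] program = { PUSH, 2, PUSH, 3, ADD, HALT };
        System.out.println(run(program)); // prints 5
    }
}
```

Every iteration of that loop is pure software overhead compared to a native instruction, which is exactly where the optimized-interpreter and JIT approaches that follow try to claw back performance.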
When performance becomes a problem, the interpreter itself can be optimized (see Table 2). Many devices that once used a pure interpreter now take this route to squeeze out extra performance, and it does have a significant effect.
The optimized software must exist for your hardware, OS, and JVM.
The Just-in-Time (JIT) approach to Java execution does as its name suggests. It compiles the Java byte code into native processor code as the application is run.
A common misconception with JIT is that the result is an application that will run as quickly as an equivalent native C application once the compilation is complete. This isn't the case as the application is still a Java application and still runs within the Java domain.
Depending on the implementation, it may compile the byte code at different stages of execution and may compile the code using different models. We should also remember that the rest of the Java subsystems still run, such as the garbage collector, and threading kernel.
While JIT does bring big improvements in speed, it unfortunately also carries two main risks: code bloat and compile stalls.
First, depending on the approach taken, at some point the executing block of Java byte code will need to be compiled into native processor instructions, which obviously requires processing power. (Compiling ahead of time can reduce this risk, but it introduces the ahead-of-time disadvantages discussed below.)
Second, once the code has been compiled, it needs to be stored - on a mobile device this may mean Flash ROM or even RAM. As Java byte code is abstract, it can represent more functionality in less space than its native processor equivalent instructions - especially in the case of a RISC processor. This code bloat can be anything up to a factor of 10. Another problem with JIT is that code is compiled into native instructions even if that block of code is seldom executed.
Again, this effect may not always be noticed by the end user, depending on the device, processor, and implementation, and on how well it's been hidden (see Table 3).
A good way of speeding up a JVM is to use ahead-of-time compilation. This has the added benefit of reducing compile stalls on the device, and can be performed for the bottlenecks to control code bloat. However, the main disadvantage of this is that separate binaries will have to be maintained for different platforms - the platform-independent advantages are lost. Alternatively, only the device's "core" applications can be compiled ahead-of-time, while any dynamically downloaded programs could run interpreted.
As with JIT technology, a common misconception with ahead-of-time is that the application will run as quickly as an equivalent native C application. This isn't the case as the application is still a Java application and, once again, still runs within the Java domain (see Table 4). The rest of the Java subsystems still run, such as the garbage collector and threading kernel.
To reduce the code bloat risk associated with regular JIT, Smart JIT can be used (see Table 5), which compiles only the parts of the program that prove to be the bottlenecks. Furthermore, it may limit the amount of compiled code stored by throwing away old native code in order to create space for newly compiled blocks. This method can still bring some limited code bloat, compile stalls, and profiling stalls, depending on the approach taken. Whatever approach is adopted, the extra effort is paid for in one way or another.
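That policy - count interpretations to find the hot spots, compile past a threshold, and evict old native code when space runs out - can be sketched in a few lines of Java. The class name, thresholds, and the idea of modeling compiled methods as strings in a cache are all hypothetical simplifications, not any vendor's implementation:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class SmartJitPolicy {
    static final int HOT_THRESHOLD = 3;   // compile after N interpreted runs
    static final int CACHE_CAPACITY = 2;  // native-code slots available

    // Per-method invocation counters: the "profiling" work
    final Map<String, Integer> counters = new HashMap<>();

    // LRU code cache: evicts the least-recently-used compiled method
    final Map<String, String> codeCache =
        new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> e) {
                return size() > CACHE_CAPACITY;  // throw away old native code
            }
        };

    /** Returns "compiled" or "interpreted" for this invocation. */
    public String invoke(String method) {
        if (codeCache.containsKey(method)) {
            codeCache.get(method);  // touch the entry to refresh LRU order
            return "compiled";
        }
        int n = counters.merge(method, 1, Integer::sum);
        if (n >= HOT_THRESHOLD) {
            // The compile stall happens here, once, for hot code only
            codeCache.put(method, "native-code-for-" + method);
            return "compiled";
        }
        return "interpreted";  // cold code stays interpreted: no bloat
    }
}
```

Here the counter updates correspond to the profiling stalls, the `put` into the cache is the compile stall, and the bounded eviction is what keeps code bloat limited - illustrating why each risk is reduced rather than eliminated.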
An accelerator is a hardware solution that typically "bolts on" to the side of an existing heavyweight processor. The accelerator can't execute anything on its own; rather, it can be thought of as a hardware Java adapter that uses the heavyweight processor to speed up Java execution. In effect, the main processor executes Java byte codes by using its own native instructions as microcode for them.
The use of a hardware accelerator means there is no code bloat as the native instructions are never stored. It also means that there are no software compile stalls, as the Java byte code to native instruction translation is done in the hardware.
The software-based JVM is still required. The accelerator vendor typically modifies this so that it uses the hardware accelerator in place of its main interpreter loop and byte code execution unit.
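One way to picture that modification is as a dispatch point inside the VM: if accelerator hardware is present, byte codes are handed to it; otherwise the software interpreter runs. The `Accelerator` interface and both code paths below are hypothetical - a real port replaces the VM's inner interpreter loop at the native level, not a Java-level method - but the sketch shows why no native code is ever stored:

```java
public class ExecutionDispatcher {
    // Hypothetical view of the accelerator hardware
    interface Accelerator {
        boolean present();
        int execute(byte[] bytecode);  // hardware translates and runs
    }

    private final Accelerator hw;  // null when no accelerator is fitted

    ExecutionDispatcher(Accelerator hw) { this.hw = hw; }

    int run(byte[] bytecode) {
        if (hw != null && hw.present()) {
            // Byte codes go straight to the accelerator: the translated
            // native instructions are never stored, so no code bloat
            // and no software compile stalls.
            return hw.execute(bytecode);
        }
        return softwareInterpret(bytecode);  // pure software fallback
    }

    private int softwareInterpret(byte[] bytecode) {
        // Placeholder standing in for the regular interpreter loop
        int sum = 0;
        for (byte b : bytecode) sum += b;
        return sum;
    }
}
```

The key design point is that both paths consume the same platform-independent byte code; only the execution engine underneath changes.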
The following describes how an accelerator can be used on a "two-sided" smartphone, where one side is used for the 3G baseband critical tasks, and the other is used for the PDA-type noncritical applications.
Single Chip Solution
In this configuration the baseband chip that manages the 3G protocol stack is also used to execute the Java subsystems and Java applications (after being translated into native instructions by the accelerator). This means the main chip is used to run the following:
- Baseband tasks + original phone software
- Java subsystems
- Java applications
Dual Chip Solution
It's important to remember that this may be contained on a single piece of silicon.
To prevent any degradation in the performance of the main baseband processor (which we shall refer to as the Master), another processor is introduced to take on the work of the PDA (which we shall refer to as the Slave). This circumvents any performance, stability, or security problems on the more "critical" side of the phone. The Master and Slave typically communicate via a system bus, such as a VCI-compatible bus.
While hardware acceleration is a much improved method of running Java byte code, there are a number of potential issues with this approach:
- Master processor work load increase
- Licensing cost of extra heavyweight native processor
- Silicon cost
Master Processor Work Load Increase
First, if the accelerator is bolted on to the side of your existing processor, then that processor will also have to carry the burden of a Java virtual machine in addition to its normal duties. Typically, the accelerator looks out only for Java byte codes. When it "sees" one, it performs a processor context switch into "Java mode," then pumps the native instructions into the regular processor for execution. The regular processor still needs to do the actual "grunt" of the processing.
In addition to this, the native processor also needs to run the Java subsystems, such as garbage collector, threading kernel, dynamic class loader, and verifier. Again this requires processing bandwidth on behalf of the main processor.
Licensing Cost of Extra Heavyweight Native Processor
If the dual-chip solution is used so that the main processor (Master) doesn't have to take on the extra workload, then an additional (Slave) processor will be required - the one to which the accelerator will be bolted (see Table 6). This will obviously incur additional licensing costs, which, in the worst case, may be double.
Silicon Cost
If the dual-chip solution is used, then the additional processor means the silicon cost will also increase. Again, at worst, this may be double.
Native Java Processors
The next logical step for Java is the native Java processor, whose native instructions are in fact Java byte codes. This has the advantage of executing byte codes in the same fashion that a conventional processor executes its own instructions: no translation or interpretation of Java byte codes into another machine instruction set is required. Typically, the processor directly executes a core set of byte codes - in effect, a core set of byte codes that are true native machine instructions (see Table 7).
Other more complex instructions are implemented by using the core set of byte code machine instructions as microcode, and the few high-level byte codes are executed by firmware, which uses both the microcoded byte codes and the core native byte codes. Any improvements in the core native byte codes have an immediate effect on the higher-level byte codes, which presents a number of interesting optimization opportunities.
Looking back at our example of the two-sided smartphone, we use a Master heavyweight processor to process the critical heavy real-time 3G communication tasks, and a lightweight Slave Java native processor for the PDA side of things.
It's probably a fair statement to say that the 3G mobile communications processing will only become more complex and time critical as capabilities of the devices and networks increase. Therefore, this processor can be left alone to do its intended job of managing mobile communications.
The Slave Java processor is used to provide the processing power for the PDA applications.
As with the other VM options, native Java processors can also be field upgraded. One drawback of a native Java processor is that any legacy native code can only be executed on the Master processor, since the Java chip understands only Java byte code.
Which Way to Go?
A JVM may provide acceptable performance on a powerful desktop system running at a high clock speed with megabytes of RAM. However, under the constraints of an embedded environment - for example, 1MB of RAM, no virtual memory, no hard drive, and a very low clock speed - it may well falter.
As we can see there are a number of alternatives when considering a JVM for an embedded environment. Still, the decision as to which way to go depends on many factors, including the device, processor, clock speed, available RAM, sophistication of user applications, and the required usability levels.
In this article, we've looked at the software techniques that have become more widely used in embedded environments with successful results, and at hardware alternatives that are designed to provide the next generation in performance while keeping memory and processing power to a minimum.
In terms of hardware-based VMs, accelerators are the next step toward this "next generation." Native Java processors are a generation further on and are regarded as the ultimate goal. This will open up a new set of exciting possibilities for mobile, wireless, and embedded applications.
Glenn Coates has been a software engineer for nine years. For the last four years he has worked with mobile devices and Java, developing products such as smartphones, microbrowsers, and digital set-top boxes. Glenn holds a degree in computer science and is a Sun-certified architect for Java technologies. He works for Vulcan Machines as a VM architect developing a Java native processor called Moon. See www.vulcanmachines.com