
It could be argued that the clock speed of a given processing platform enables you to estimate the execution time of a user application running on that platform.

However, quoting figures such as MIPS (millions of instructions per second) is somewhat futile, since executing a specific number of instructions on one processor will not necessarily accomplish the same end result as the same number of instructions running on a different processor. It's the execution speed of a given set of instructions that matters most when selecting an appropriate platform to run application code.

Clearly some platforms will be more proficient than others in this regard, though this is a difficult parameter to quantify since it depends to a large extent upon the application code in question. Benchmarking is the technique used to measure the speed at which a particular platform can execute code, and the demand for such measurements is evident in the abundance of benchmarks available. Numerous examples of Java benchmarking are listed at http://www.epcc.ed.ac.uk/javagrande/links.html.

Benchmarks vary significantly in their complexity, but invariably they comprise a number of lines of code that, when executed on the platform being tested, generate a discrete value to use during its appraisal. This facilitates a comparison of execution speed with similar platforms. Typically there are three types of benchmark, named according to their origin:

  • User
  • Manufacturer
  • Industry
User benchmarks are, as the name suggests, created by any individual with an interest in the field. Countless examples are available and they vary considerably in quality; in the past, benchmarks of this type have been very influential.

Market incentives have driven the introduction of manufacturer benchmarks; invariably these are written to benefit the platform in question, and so they can be disregarded unless used to compare the relative performance of platforms offered by that particular vendor.

Finally, the financial significance of benchmarking has resulted in the development of industry benchmarks, which are usually considered to be of high integrity. Such benchmarks are defined by an independent organization, typically composed of a panel of industry specialists.

Why Write a Paper on Java Benchmarking?
Results are published for numerous benchmarks, and the underlying issues can be clouded by hype; as a consequence the choices facing the end user are somewhat overwhelming. The crucial point is how well your code performs on the chosen system, so the question becomes: how do you identify the benchmark that best models your application? An understanding of benchmarks is vital if the user is to select an accurate measurement tool for the platform in question and not be misled by the results.

The purpose of this article is to educate device manufacturers, OEMs, and, more specifically, J2ME development engineers, while at the same time resolving any remaining anomalies in a discipline that's commonly misunderstood.

What Is a Benchmark?
Fundamentally, a benchmark should incorporate programs that, when invoked methodically, exhaustively exercise the platform being tested. Implicit in this process is the generation of a runtime figure corresponding to the execution speed of the platform.

Benchmarks can be simplistic, comprising a sequence of simple routines executed successively to check the platform's response to standard functions (e.g., method invocation). Typically, both the overall elapsed time and the time for each routine in isolation are considered; in the former case it's usual to assign a weighting coefficient to each routine, indicative of its relevance in the wider context. Each routine should run for a reasonable amount of time, to ensure that the performance statistics are not lost within start-up overheads. Benchmarks can also be more substantive; for example, processor-intensive applications can exercise multithreading by running several routines simultaneously to evaluate context switching.

Essentially there's no substitute for running the user's own application code on the platform in question. However, while this argument is laudable, it's beyond reasonable expectation that the platform manufacturer can do this. To facilitate an accurate appraisal, it's vital that any standard benchmark used by competing manufacturers mimics as closely as possible the way the platform will ultimately be used.
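
To make the first of these points concrete, here's a minimal sketch of such a harness: it times two trivial routines, keeps the iteration count large enough that start-up overheads don't dominate, and combines the results using weighting coefficients. The class name, routines, and weights are illustrative assumptions of our own and don't correspond to any published benchmark.

// A minimal sketch of a weighted micro-benchmark harness. The class, routines,
// and weighting values are illustrative, not part of any published benchmark.
public class MiniBenchmark {

    // Trivial routine exercising method invocation.
    private static int increment(int x) {
        return x + 1;
    }

    // Routine 1: repeated method calls.
    static long timeMethodCalls(int iterations) {
        long start = System.currentTimeMillis();
        int acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc = increment(acc);
        }
        long elapsed = System.currentTimeMillis() - start;
        if (acc == -1) {                 // keep the result live so the loop
            System.out.println(acc);     // cannot be discarded by an optimizer
        }
        return elapsed;
    }

    // Routine 2: a simple arithmetic loop.
    static long timeLoop(int iterations) {
        long start = System.currentTimeMillis();
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += i % 7;
        }
        long elapsed = System.currentTimeMillis() - start;
        if (sum == -1) {
            System.out.println(sum);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Large enough that start-up overheads do not dominate the measurement.
        int iterations = 10000000;
        long methodTime = timeMethodCalls(iterations);
        long loopTime = timeLoop(iterations);

        // Weighting coefficients express each routine's relevance to the
        // application being modelled (the values here are arbitrary examples).
        double weightedScore = 0.6 * methodTime + 0.4 * loopTime;

        System.out.println("Method invocation: " + methodTime + " ms");
        System.out.println("Arithmetic loop:   " + loopTime + " ms");
        System.out.println("Weighted score:    " + weightedScore);
    }
}

In practice each routine, and its weighting, would be chosen to reflect how the end application actually uses the platform.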

The Advantages and Limitations of Benchmarking
Industry benchmarks are useful for providing a general insight into the performance of a machine. Still, it's important not to rely on them exclusively, since such a preoccupation distracts from the bigger picture. While they can be used for a broad comparison of different platforms, they have shortcomings when applied to a specific application. For example, one function may be heavily used in the application code compared to another, or certain functions may run concurrently on a regular basis. There are inherent benefits in developing your own benchmark, as this allows routines to be tailored to imitate the end application or to expose specific inadequacies in peripheral support. Manufacturers' benchmarks can be written to aid the cause of specific vendors and so can easily be tailored to mislead.

When considering more restrictive embedded environments, such as those used by J2ME-compliant devices, it becomes apparent that the application developer must consider the risks inherent in the hardware implementation of a virtual machine prior to making a purchasing decision.

Speed is a primary consideration when adopting a JVM within restricted environments, and implementations of J2ME vary significantly in this respect: from JVMs that rely on software interpretation, through JIT compilers that translate bytecode into target machine code while the application is executing, to native Java processors offering much greater performance.

Other factors to consider include the response time of the user interface, the implementation of the garbage collector, and memory issues, since consumer devices don't have access to the abundant resources available to desktop machines. While this may seem a tangential point as far as benchmarking is concerned, it's one worth making since it's imperative that these areas in particular are comprehensively exercised. Subject to these caveats, benchmarking is a valuable technique that aids in the evaluation of processing platforms, and, more specifically, J2ME platforms.

Java-Specific Benchmarks
As with other platforms, numerous Java benchmarks have appeared (see Table 1).

CaffeineMark is a pertinent example since its results are among those most frequently cited by the Java community; on this basis we chose it for further discussion.

CaffeineMark encompasses a series of nine tests of similar length designed to measure disparate aspects of a Java Virtual Machine's performance. The product of these scores is then used to generate an overall CaffeineMark. The tests are:

  • Loop: Employs a sort routine and sequence generation to quantify the compiler optimization of loops
  • Sieve: Utilizes the classic sieve of Eratosthenes to extract prime numbers from a sequence (see the sketch after this list)
  • Logic: Establishes the speed at which decision-making instructions are executed
  • Method: Executes recursive function calls
  • Float: Simulates a 3D rotation of objects around a point
  • String: Executes various string-based operations
  • Graphics: Draws random rectangles and lines
  • Image: Draws a sequence of three graphics repeatedly
  • Dialog: Writes a set of values into labels and boxes on a form
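As an illustration of the kind of routine such tests exercise, the sketch below implements the classic sieve of Eratosthenes inside a simple timing loop. It's our own approximation for illustration only, not the actual CaffeineMark source; the sieve size and repeat count are arbitrary.

// A sketch of a sieve-of-Eratosthenes routine of the kind the Sieve test
// exercises; this is not the CaffeineMark source, just an illustration.
public class SieveBench {

    // Counts the primes below 'limit' using the classic sieve.
    static int countPrimes(int limit) {
        boolean[] composite = new boolean[limit];
        int count = 0;
        for (int i = 2; i < limit; i++) {
            if (!composite[i]) {
                count++;
                for (int j = i + i; j < limit; j += i) {
                    composite[j] = true;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        int primes = 0;
        for (int run = 0; run < 1000; run++) {   // repeat so the timing is meaningful
            primes = countPrimes(8192);
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(primes + " primes, " + elapsed + " ms for 1000 runs");
    }
}
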
An embedded version of CaffeineMark is available that excludes the scores of the Graphics, Image, and Dialog tests from the overall score. Furthermore, CLDC doesn't support floating-point operations, so the "Float" test is ineffective in this context. This benchmark is regularly updated to account for vendor optimizations and continues to be a reasonably accurate predictor of performance for JVMs.

Bearing this in mind, and given the high take-up of CaffeineMark in the industry, it's unfortunate that it's unsuitable for embedded environments such as J2ME. This argument rests upon its inability to benchmark the interaction of Java subsystems, and its consequent failure to imitate the typical real-world applications faced by such devices. More specifically, it doesn't take into account situations in which a platform may have to cope with a heavily used heap, the garbage collector running continuously, multiple threads, or intensive user-interface activity.
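
The sketch below illustrates one such omission: several threads allocating short-lived objects so that the heap stays busy, the garbage collector runs continuously, and threads contend for the processor. The class name, thread count, and allocation sizes are illustrative assumptions rather than part of any existing benchmark.

// A sketch of a heap- and thread-intensive workload of the kind a low-level
// benchmark tends to miss. All names and sizes here are illustrative.
public class GcStress implements Runnable {

    public void run() {
        byte[] keep = null;
        for (int i = 0; i < 50000; i++) {
            keep = new byte[1024];       // short-lived allocation pressure
            keep[0] = (byte) i;          // touch the object so it is not elided
        }
        if (keep[0] == 127) {
            System.out.println("done");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(new GcStress());
            workers[i].start();          // concurrent allocation forces context
        }                                // switching alongside collection
        for (int i = 0; i < workers.length; i++) {
            workers[i].join();
        }
        System.out.println("Elapsed: " + (System.currentTimeMillis() - start) + " ms");
    }
}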

To address some of these issues, representatives of leading companies in the field have recently formed a committee under the banner of the Embedded Microprocessor Benchmark Consortium (EEMBC) to discuss the introduction of an industry benchmark for J2ME devices.

Table 1: Examples of Java-specific benchmarks currently in existence

What Is EEMBC?
EEMBC (www.eembc.org) is an independent industry benchmarking consortium that develops and certifies real-world benchmarks for embedded microprocessors; the consortium is recognized among manufacturers as the yardstick for benchmarking in this context. A principal concern of the committee is to produce dependable metrics, enabling system designers to evaluate the performance of competing devices and consequently select the most appropriate embedded processor for their needs. The industry-wide nature of such committees intrinsically helps to combat the now wretchedly prevalent practice among some vendors of artificially improving their ratings via special compiler optimizations.

A subcommittee was recently formed under the umbrella of this organization to develop similar benchmarks for hardware-based virtual machines. Founding companies within the consortium include Vulcan Machines Ltd, ARM, Infineon, and TriMedia. Primarily the committee aims to identify the limitations of existing Java benchmarks, and to develop new ones in which "real-world" applications are afforded a higher priority than low-level functions.

An example benchmark conceived on this basis could be a Web browser. Since this is a very intensive end application in almost every respect, a figure relating to the proficiency of the device running low-level code in isolation wouldn't prove particularly representative of its functionality.

Consequently, the EEMBC consortium solution is expected to employ a series of applications reflecting typical real-world scenarios in which CDC- and CLDC-compliant devices can be employed. Further examples of such benchmarks include a generic game or organizer that exercises intensive garbage collection, scheduling, high memory usage, the user interface, and dynamic class loading. In this way, system designers are able to evaluate potential devices for inclusion in their end application by appraising a benchmark derived in an environment analogous to that application.

Other Considerations?
When applied prudently, benchmarks are an invaluable asset that aids in the selection of hardware to suit a particular application. However, they shouldn't be regarded as the sole criterion. It's imperative that J2ME-embedded system designers don't rely upon benchmarks exclusively, since the issue is clouded by many other factors.

In the context of J2ME, systems extend beyond the virtual machine to its interaction with peripheral devices such as a memory interface; clearly such peripherals, and the interfaces to them, must be considered when measuring the time it takes to execute an application. In the case of memory, a J2ME-optimized device will be subject to tight limitations; this raises numerous issues that may impact the performance of the device, garbage collection for example.

Also, hardware that's compliant with the CLDC specification is typically battery powered. Consequently, the power consumption of the virtual machine is of primary concern and, accordingly, the clock speed must be kept to a minimum. It's pertinent here that while software accelerators may post acceptable benchmark scores, they may also, as a consequence of their reliance upon a host processor, consume excessive power compared to a processor that executes Java as its native language.

Another significant factor is the device upon which the virtual machine is implemented. The FPGA or ASIC process used will clearly affect the speed at which the processor runs, and variations in benchmark scores are a natural corollary of this. Furthermore, the silicon cost of the entire solution that's required to execute Java bytecode must be considered, particularly where embedded System-on-Chip implementations of the JVM are concerned. Similarly, the designer should be aware of fundamental issues such as the "quality" of the JVM in terms of compliance with the J2ME specification, reliability, licensing costs, and the reputation of the hardware vendor for technical support. All these factors must be considered in tandem with the benchmark score of the virtual machine prior to making a purchasing decision.

Conclusion
No benchmark can replace the actual user application. At the earliest possible stage in the design process, application developers should run their own code on the proposed hardware, since similar applications may show a significant disparity in performance on the same implementation of the virtual machine. However, since designers' time is often better spent elsewhere, they frequently rely upon industry benchmarks for such data. While there's no panacea, industry benchmarks such as that proposed by EEMBC are a useful tool for evaluating performance, provided you're aware of their limitations in a J2ME environment.

Resources

  • Coates, G. "Java Thick Clients with J2ME." Java Developer's Journal. Vol. 6, issue 6.
  • Coates, G. "JVMs for Embedded Environments." Java Developer's Journal. Vol. 6, issue 9.
  • Cataldo, A. (April, 2001). "Java Accelerator Vendors Mull Improved Benchmark." Electronic Engineering Times.
Author Bio
Glenn Coates works for Vulcan Machines as a VM architect developing a Java native processor called Moon and has been a software engineer for nine years. For the last four years he has worked with mobile devices and Java, developing products in this area. He also represents his company at EEMBC meetings. Glenn holds a degree in computer science and is a Sun-certified architect for Java technologies. [email protected]

Carl Barratt works in applications support for Vulcan Machines. He has over seven years of experience in various hardware and software development roles. Carl holds a BEng (Hons) degree in electronic engineering and has undertaken PhD research at the University of Nottingham. [email protected]
