It could be argued that the clock speed of a given processing
platform enables you to estimate the execution time of a user
application running on that platform.
However, quoting figures such as MIPS (millions of
instructions per second) is somewhat futile, since the execution of
a specific number of instructions on one processor will not
necessarily accomplish the same end result as that same number of
instructions running on a different processor. It's the execution
speed of a given set of instructions that's of greater concern when
selecting an appropriate platform to run application code.
Clearly some platforms will be more proficient than others in
this regard, though this is a difficult parameter to quantify since
it's dependent to a large extent upon the application code in
question. Benchmarking is the technique used to measure the speed at
which a particular platform is able to execute code. Indeed, this is
evident in the abundance of benchmarks available. Numerous examples
of Java benchmarking are listed in Table 1.
Benchmarks vary significantly in their complexity, but
invariably they comprise a number of lines of code that, when
executed on the platform being tested, generate a discrete value to
use during its appraisal. This facilitates a comparison of the
execution speed with similar platforms. Typically there are three
types of benchmark, which take their titles from the parties that
create them: user, manufacturer, and industry benchmarks.
User benchmarks are, as the name suggests, created by any
individual with an interest in the field. Countless examples are
available and characteristically they vary in quality; in the past
benchmarks of this type have been very influential.
Market incentives have driven the introduction of
manufacturer benchmarks; invariably these are written to favor the
platform in question and so can be disregarded unless used to
compare the relative performance of platforms offered by that same
manufacturer.
Finally, the financial significance of benchmarking has
resulted in the development of industry benchmarks, which are usually
considered to be of high integrity. Such benchmarks are defined by an
independent organization, typically composed of a panel of industry
experts.
Why Write a Paper on Java Benchmarking?
Results are published for multiple benchmarks and the primary
issues can be clouded by hype; as a consequence the selections
available to the end user are somewhat overwhelming. The crucial
point is how well your code performs on the chosen system, so the
question is: How do you identify a benchmark that best models your
application? An understanding of benchmarks is vital to enable the
user to select an accurate measurement tool for the platform in
question and not be misled by the results.
The purpose of this article is to educate device
manufacturers, OEMs, and, more specifically, J2ME development
engineers, while at the same time resolving any remaining anomalies
in a discipline that's commonly misunderstood.
What Is a Benchmark?
Fundamentally, a benchmark should incorporate programs that,
when invoked methodically, exhaustively exercise the platform being
tested. Implicit in this process is the generation of a runtime
figure corresponding to the execution speed of the platform.
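The core idea described here can be sketched in a few lines of Java. Everything below, including the workload, the iteration count, and the iterations-per-millisecond scoring formula, is an invented illustration rather than an excerpt from any published benchmark:

```java
// A minimal sketch of a benchmark: time a fixed workload and derive a
// single figure of merit from the elapsed time. The workload and the
// scoring formula are illustrative choices only.
public class MiniBench {
    // The routine under test: a simple integer workload.
    static long workload(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += i * 31 + (acc >> 3);
        }
        return acc;
    }

    // Run the workload and convert elapsed time into a score
    // (iterations per millisecond); a higher score is better.
    static double score(int iterations) {
        long start = System.currentTimeMillis();
        workload(iterations);
        long elapsed = Math.max(1, System.currentTimeMillis() - start);
        return (double) iterations / elapsed;
    }

    public static void main(String[] args) {
        System.out.println("Score: " + score(5000000));
    }
}
```

The single figure this produces is meaningful only relative to the same code run on another platform.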
Benchmarks can be simplistic, comprising a sequence of simple
routines executed successively to check the platform's response to
standard functions (e.g., method invocation). Typically, both the
overall elapsed time and the time for each routine in isolation are
considered; in the former case it's usual to assign a weighting
coefficient to each routine that's indicative of its relevance in the
more expansive context. Each routine should also run for a
reasonable amount of time, to ensure that performance statistics
aren't swamped by start-up overheads.
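A minimal sketch of such a suite, assuming invented routines and arbitrary weighting coefficients, might look like this; the warm-up pass before each timed run is what keeps start-up overheads out of the statistics:

```java
// Illustrative sketch of a weighted benchmark suite. The routines and
// their weights are invented for this example; a real suite would
// choose weights reflecting each routine's real-world relevance.
public class WeightedBench {
    interface Routine { void run(); }

    // Time one routine: warm up first, then measure many iterations.
    static long timeRoutine(Routine r, int iterations) {
        for (int i = 0; i < iterations / 10; i++) r.run();  // warm-up pass
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) r.run();
        return Math.max(1, System.currentTimeMillis() - start);
    }

    // Composite score: weighted sum of per-routine execution rates.
    static double composite(Routine[] routines, double[] weights, int iterations) {
        double total = 0;
        for (int i = 0; i < routines.length; i++) {
            double rate = (double) iterations / timeRoutine(routines[i], iterations);
            total += weights[i] * rate;
        }
        return total;
    }

    static int helper(int n) { return n <= 1 ? 1 : helper(n - 1) + 1; }

    public static void main(String[] args) {
        Routine loop = new Routine() { public void run() {
            int s = 0; for (int i = 0; i < 1000; i++) s += i; } };
        Routine call = new Routine() { public void run() { helper(20); } };
        // Weight method invocation twice as heavily as raw looping.
        System.out.println("Composite: " + composite(
            new Routine[] { loop, call }, new double[] { 1.0, 2.0 }, 100000));
    }
}
```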
Benchmarks can also be more substantive; for example,
processor-intensive applications can check multithreading by running
several other routines simultaneously to evaluate context switching.
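A context-switching check of this kind can be sketched by timing the same workload across several concurrent threads; the thread count and busy-work routine below are arbitrary choices for illustration:

```java
// Hedged sketch of a concurrency test: run the same workload on several
// threads at once so the platform's scheduler and context switching are
// exercised alongside raw execution speed.
public class ThreadBench {
    // Busy-work routine executed by each thread.
    static long spin(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) acc += i ^ (acc << 1);
        return acc;
    }

    // Time the given number of threads running the workload
    // concurrently; returns elapsed milliseconds, or -1 if interrupted.
    static long timeThreads(int threads, final int work) {
        Thread[] pool = new Thread[threads];
        long start = System.currentTimeMillis();
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(new Runnable() {
                public void run() { spin(work); }
            });
            pool[i].start();
        }
        try {
            for (int i = 0; i < threads; i++) pool[i].join();
        } catch (InterruptedException e) {
            return -1;
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        System.out.println("4 threads: " + timeThreads(4, 2000000) + " ms");
    }
}
```

Comparing the single-thread and multithread timings gives a rough indication of how much scheduling overhead the platform adds.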
Essentially there's no substitute for running the user's own
application code on the platform in question. However, while this
argument is laudable, it's beyond reasonable expectation that the
platform manufacturer can implement this. To facilitate an accurate
appraisal, it's vital that any standard benchmark utilized by
competing manufacturers should mimic as much as possible the way the
platform will ultimately be used.
The Advantages and Limitations of Benchmarking
Industry benchmarks are useful for providing a general
insight into the performance of a machine. Still, it's important not
to rely on these benchmarks since such a preoccupation distracts from
the bigger picture. While they allow an efficient general comparison
of different platforms, they have shortcomings when applied to a
specific application. For example, one function may
be heavily used in the application code when compared to another, or
certain functions may run concurrently on a regular basis. There are
inherent benefits in developing your own benchmark as this
facilitates the tailoring of routines to imitate the end application
or to expose specific inadequacies in peripheral support.
Manufacturers' benchmarks can be written to aid the cause of specific
vendors and so can easily be tailored to mislead.
When considering more restrictive embedded environments, such
as those used by J2ME-compliant devices, it becomes apparent that the
application developer must consider the risks inherent in the
hardware implementation of a virtual machine prior to making a
purchasing decision.
Speed is a primary consideration when adopting a JVM within
restricted environments; implementations of J2ME vary significantly
in this respect, from JVMs that rely on software interpretation,
through JIT compilers that translate bytecode into target machine
code while the application executes, to native Java processors
offering much greater performance.
Other factors to consider include the response time of the
user interface, implementation of the garbage collector, and memory
issues since consumer devices don't have access to the abundant
resources available to desktop machines. While this may seem a
tangential point as far as benchmarking is concerned, it's one worth
making since it's imperative that these areas in particular are
comprehensively exercised. Subject to these caveats, benchmarking is
a valuable technique that aids in the evaluation of processing
platforms, and, more specifically, J2ME platforms.
As with other platforms, numerous Java benchmarks have
appeared (see Table 1).
CaffeineMark is a pertinent instance of a benchmark since its
results are among those most frequently cited by the Java community.
On this basis we chose it as an example for further discussion.
CaffeineMark encompasses a series of nine tests of similar
length designed to measure disparate aspects of a Java Virtual
Machine's performance. The product of these scores is then used to
generate an overall CaffeineMark. The tests are:
- Loop: Employs a sort routine and sequence generation to
quantify the compiler optimization of loops
- Sieve: Utilizes the classic sieve of Eratosthenes to extract
prime numbers from a sequence
- Logic: Establishes the speed at which decision-making
instructions are executed
- Method: Executes recursive function calls
- Float: Simulates a 3D rotation of objects around a point
- String: Executes various string-based operations
- Graphics: Draws random rectangles and lines
- Image: Draws a sequence of three graphics repeatedly
- Dialog: Writes a set of values into labels and boxes on a form
An embedded version of CaffeineMark is available that
excludes the scores of the Graphics, Image, and Dialog tests from the
overall score. Furthermore, CLDC doesn't support floating-point
operations, so the "Float" test is ineffective in this context. This
benchmark is regularly updated to account for vendor optimizations
and continues to be a reasonably accurate predictor of performance.
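The core of CaffeineMark's Sieve test is a well-known algorithm; a minimal standalone version is shown below (a real benchmark would time many repeated runs rather than a single pass):

```java
// The classic sieve of Eratosthenes, the algorithm at the heart of
// CaffeineMark's Sieve test.
public class SieveDemo {
    // Count the primes below the given limit.
    static int countPrimes(int limit) {
        boolean[] composite = new boolean[limit];
        int count = 0;
        for (int i = 2; i < limit; i++) {
            if (!composite[i]) {
                count++;
                // Mark every multiple of this prime as composite.
                for (int j = i + i; j < limit; j += i) composite[j] = true;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(1000) + " primes below 1000");
    }
}
```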
Bearing this in mind, alongside the high take-up of
CaffeineMark in the industry, it's unfortunate that it's unsuitable
for embedded environments such as J2ME. The cogency of this argument
is based upon its inability to benchmark the interaction of Java
subsystems, and the subsequent failure to imitate typical real-world
applications faced by such devices. More specifically, it doesn't
take into account certain situations in which a platform may have to
cope with a heavily used heap, a constantly running garbage
collector, multithreading, or intensive user-interface activity.
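A heap-pressure routine of the kind described might be sketched as follows; the allocation sizes, counts, and survivor ratio are invented for illustration:

```java
// Hedged sketch of a heap-stress test: allocate and discard many
// short-lived objects so the garbage collector runs throughout the
// measurement, which plain compute-bound tests never provoke.
public class HeapStress {
    static Object sink;  // retains a few objects so allocation isn't optimized away

    // Allocate short-lived byte arrays, keeping only one in a hundred
    // alive, and return the elapsed time in milliseconds.
    static long churn(int rounds, int objectSize) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < rounds; i++) {
            byte[] temp = new byte[objectSize];  // becomes garbage almost immediately
            temp[0] = (byte) i;
            if (i % 100 == 0) sink = temp;       // occasional survivor
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        System.out.println("Churn time: " + churn(200000, 1024) + " ms");
    }
}
```

On a memory-constrained device, this kind of loop makes collector pauses visible in the timing in a way that pure arithmetic tests do not.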
To address some of these issues, representatives of leading
companies in the field have recently formed a committee under the
banner of the EDN Embedded Microprocessor Benchmark Consortium (EEMBC) to
discuss the introduction of an industry benchmark for J2ME devices.
Table 1: Examples of Java-specific benchmarks currently in existence
What Is EEMBC?
EEMBC (www.eembc.org) is an independent industry
benchmarking consortium that develops and certifies real-world
benchmarks for embedded microprocessors; the consortium is
established among manufacturers as a yardstick for benchmarking in
this context. A principal concern of the committee is to produce
dependable metrics, enabling system designers to evaluate the
performance of competing devices and consequently select the most
appropriate embedded processor for their needs. The industry-wide
nature of such committees intrinsically helps to combat the practice
among some vendors of striving to artificially improve their ratings
via special compiler optimizations, a practice that's now
wretchedly commonplace.
A subcommittee was recently formed under the umbrella of this
organization to develop similar benchmarks for hardware-based virtual
machines. Founding companies within the consortium include Vulcan
Machines Ltd, ARM, Infineon, and TriMedia. Primarily the committee
aims to identify the limitations of existing Java benchmarks, and to
develop new ones in which "real-world" applications are afforded a
higher priority than low-level functions.
An example benchmark conceived on this basis could be a Web
browser. Since this is a very intensive end application in almost
every respect, a figure relating to the proficiency of the device
running low-level code in isolation wouldn't prove particularly
representative of its functionality.
Consequently, the EEMBC consortium solution is expected to
employ a series of applications reflecting typical real-world
scenarios in which CDC- and CLDC-compliant devices can be employed.
Further examples of such benchmarks include a generic game or
organizer that exercises intensive garbage collection, scheduling,
high memory usage, user interface, and dynamic class loading. This
way system designers are able to evaluate potential devices for
inclusion in their end application by the appraisal of a benchmark
derived in an environment that's analogous to that application.
When applied prudently, benchmarks are an invaluable asset
that aid in the selection of hardware to suit a particular
application. However, they shouldn't be regarded as the sole
criterion. It's imperative that J2ME-embedded system designers don't
rely upon the use of benchmarks exclusively, since the issue is
clouded by many other factors.
In the context of J2ME, systems extend beyond the virtual
machine to its interaction with peripheral devices such as a memory
interface; clearly such peripherals and the interfaces to them must
be considered when measuring the time it takes to execute an
application. In the case of memory, limitations will be imposed on a
J2ME-optimized device; this raises numerous issues that may impact
the performance of the device, for example, garbage collection.
Also, implicitly, batteries are employed to power hardware
that's compliant with the CLDC specification. Consequently, power
consumption of the virtual machine is of primary concern and,
accordingly, the clock speed must be kept to a minimum. For example,
it's pertinent here that while software accelerators may post
acceptable benchmark scores, they may also, as a consequence of their
reliance upon a host processor, consume excessive power compared to a
processor that executes Java as its native language.
Another significant factor is the device upon which the
virtual machine is implemented. The FPGA or ASIC process used will
clearly affect the speed at which the processor runs, and variations
in benchmark scores are a natural corollary of this. Furthermore, the
silicon cost of the entire solution that's required to execute Java
bytecode must be considered, particularly where embedded
System-on-Chip implementations of the JVM are concerned. Similarly,
the designer should be aware of fundamental issues such as the
"quality" of the JVM in terms of compliance with the J2ME
specification, reliability, licensing costs, and the reputation of
the hardware vendor for technical support. All these factors must be
considered in tandem with the benchmark score of the virtual machine
prior to making a purchasing decision.
No benchmark can replace the actual user application. At the
earliest possible stage in the design process, application developers
must run their own code on the proposed hardware, since similar
applications may post a significant disparity in terms of performance
on the same implementation of the virtual machine. However, since
designers are often focused on using their time more productively,
they frequently rely upon industry benchmarks for such data. While
there's no panacea, an industry benchmark such as that proposed by
EEMBC is a useful tool to aid in the evaluation of performance,
provided you're aware of its limitations in a J2ME environment.
Glenn Coates works for Vulcan Machines as a VM architect developing a
processor called Moon and has been a software engineer for nine
years. For the last four years he has worked with mobile devices and
Java developing products. He also represents his company at the EEMBC
meetings. Glenn holds a degree in computer science and is also a
Sun-certified architect for Java technologies.
Carl Barratt works in applications support for Vulcan Machines. He
has over seven years of experience in various hardware and software
development roles. Carl holds a BEng (Hons) degree in electronic
engineering and has undertaken PhD research at the University of