Java, in its J2ME guise, has all the attributes of a first-rate
platform for embedded system design. More specifically, its platform
independence, code portability, and robust operation render it
particularly suited to such applications. The future of embedded
Java-based devices looks secure, thanks to the proliferation of
standards based on the platform and the endorsement of major OEMs
committed to using it in their designs.
It's become clear that the potential marketplace for embedded
Java devices is vast, but that some of these markets are not yet
mature. Successful manufacturers in the immediate market for embedded
devices, such as wireless handsets and set-top boxes, possess a huge
investment in legacy code that they, not unreasonably, wish to
retain. Along with the problem of generating acceptable performance
in resource-constrained environments, the migration to Java-enabled
devices in markets that are already established and based on other
technologies is the most significant barrier to the widespread
adoption of Java as the de facto standard in the embedded space.
Of the emerging solutions, both hardware- and software-based,
none can claim to be a panacea. This article discusses the
introduction of Java into multilanguage heritage designs, focusing on
the advantages and disadvantages of deploying each solution.
Java Bytecode Execution in an Embedded Environment
Obviously, some platforms will be more proficient than others
at executing Java code. The issue is clouded by hype, but,
fundamentally, Java bytecode can be executed in one of three ways:
software translation, hardware translation, or direct execution.
Translation in Software: The Java Virtual Machine
Bytecode can be executed using a software Java Virtual
Machine (JVM) or, more specifically, a KVM designed particularly for
embedded devices. Java code can be executed on any such virtual
machine. A JVM takes bytecode (compiled Java source code) and
translates it, instruction by instruction, into the native machine
code of the processing platform in question before executing it.
Indeed, this process of interpretation is central to the Java concept
of platform independence.
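As a rough illustration of software interpretation, the sketch below
implements a fetch-decode-dispatch loop for a handful of bytecodes.
The opcode values are those defined in the JVM specification; the
class name TinyInterpreter and its structure are assumptions made for
this example and do not reflect how any particular JVM or KVM is
built.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy interpreter for a few JVM opcodes (illustrative sketch only). */
class TinyInterpreter {
    // Opcode values as defined in the JVM specification.
    static final int ICONST_1 = 0x04;
    static final int ICONST_2 = 0x05;
    static final int BIPUSH   = 0x10;
    static final int IADD     = 0x60;
    static final int IMUL     = 0x68;
    static final int IRETURN  = 0xAC;

    /** Interprets the bytecode and returns the int handed to IRETURN. */
    static int run(byte[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc++] & 0xFF;          // fetch
            switch (opcode) {                        // decode and dispatch
                case ICONST_1: stack.push(1); break;
                case ICONST_2: stack.push(2); break;
                case BIPUSH:   stack.push((int) code[pc++]); break; // signed operand
                case IADD:     stack.push(stack.pop() + stack.pop()); break;
                case IMUL:     stack.push(stack.pop() * stack.pop()); break;
                case IRETURN:  return stack.pop();
                default:
                    throw new IllegalArgumentException(
                        "Unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
        throw new IllegalStateException("Code ended without IRETURN");
    }

    public static void main(String[] args) {
        // Computes (1 + 2) * 10.
        byte[] code = { ICONST_1, ICONST_2, IADD, BIPUSH, 10, IMUL, (byte) IRETURN };
        System.out.println(run(code)); // prints 30
    }
}
```

Even in this toy form, each bytecode costs a fetch, a dispatch, and
several stack operations on the host, which is why a single bytecode
typically expands into several native instructions.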
Translation in Hardware: Bytecode Accelerators
A bytecode accelerator is a hardware solution that uses the
resources of an existing host processor. Accelerator solutions don't
execute Java bytecode directly; instead, they convert the bytecode
(in hardware) into the native instructions of the host processor
prior to execution. Invariably, such solutions also utilize a
software-based JVM, modified by the replacement of the main
interpreter loop and execution unit with the bytecode accelerator.
Native Java Processors
Native Java processors are microprocessors designed to
execute bytecode directly as their native instruction set. They can
be deployed as a coprocessor to a host processor in a multilingual,
multiprocessor system, or as a standalone solution in a dedicated
embedded Java design.
Embedded Multiprocessor Java Solutions
While there's a clear desire for Java capabilities to be
introduced into many embedded applications, it's a prerequisite that
Java bytecode is executable in parallel with existing heritage code,
rather than in place of it. Primary examples of such applications
would be a mobile phone running a C-coded communications stack, or a
set-top box currently evolving to support interactive or
Internet-based content. Understanding the fundamental design issues
is vital when designing high-quality embedded devices for Java-based
applications. Characteristics that influence the selection of
components for any embedded system include:
- Resources
- Performance
- Ease of integration
- Cost
First, JVMs are inherently resource hungry. This is a
corollary of the software interpretation layer, which abstracts the
code from the processor upon which that code is executed. A JVM will
typically map a single Java bytecode into several native processor
instructions prior to execution; therefore, to sustain acceptable
Java performance, a very fast processor is required. For an embedded
device, the rise in silicon cost and power consumption inherent in
the use of such powerful processors is considerable. The additional
memory occupied by the JVM itself presents a further burden for
embedded applications.
Bytecode accelerators also use a JVM and so require the same
additional memory resources as software-only JVM solutions. Typical
bytecode accelerators are efficient in terms of silicon cost when
added to an existing host processor; however, if a second dedicated
processor is used, the gate count of this additional processor must
also be taken into account. Native Java processors vary drastically
in size. Those that are stack-based, and thus accurately match the
Java execution model, have a very low silicon cost, whereas those
based on a standard RISC processor are less than optimal.
Since there are still no dependable metrics available to
evaluate the performance of embedded Java solutions, code execution
speed remains an emotive issue. When applied prudently, benchmarks
are an invaluable asset. However, they're not the sole criterion for
evaluation and must be regarded with caution, since ultimately the
crucial point is how fast the platform can execute the end
application code. CaffeineMark figures are widely quoted but are not
representative of real applications. It's hoped that the imminent
arrival of EEMBC industry-standard benchmarks will clarify the issue,
as discussed in my previous article "J2ME Benchmarking: A Review"
(JDJ, Vol. 7, issue 1).
Generally speaking, solutions that rely on the translation of
the Java bytecode into one or more native instructions, by either a
hardware or software interpretation process, will execute code much
more slowly than solutions that are able to execute the bytecode
directly. Native Java processors can execute the vast majority of
bytecodes directly. More complex instruction types can be
microcoded (i.e., they follow a number of internally coded steps), or
else, when this is not practical, a jump to a predefined software
routine is invoked (see Figure 1).
Register-rich hardware solutions (e.g., bytecode accelerators
or, similarly, those native processors based on RISC cores) will
suffer a further performance impact resulting from the need to
preserve the state of the registers during the frequent context
switches that are a feature of a threaded language like Java.
JVMs are available for most processors and are the most
expedient way to enable Java capability on an existing platform.
However, this approach is wholly inefficient and not in any way
aligned with the J2ME paradigm, as the performance versus resources
trade-off in this case is difficult to justify for embedded devices.
Bytecode accelerators are specifically designed to operate alongside
a host processor and are relatively easy to integrate.
Furthermore, they're able to execute Java bytecode more rapidly than
the pure software JVM solutions they replace. However, this is still
at the expense of a reduction in the available bandwidth of the host
processor for other functions (e.g., communications for an
interactive application) as a result of the extra processing burden
placed on it.
Native Java processors can execute Java bytecode at optimal
speeds and do not place any extra burden on the host processor if
deployed as a coprocessor, since they can operate concurrently.
Taking everything into consideration, there's a clear migration path
(probably time-line dependent) from "easy-to-integrate" JVM solutions
through bytecode accelerators to the ultimate performance offered by
native Java processors.
How simple is it to integrate a native Java processor with an
existing host core? The answer, of course, depends on the design of
the processor. The final part of this article explains such a design
in more detail.
Finally, though licensing costs are somewhat tangential to
this discussion, they're worth a mention since they're an important
concern for devices that are produced in high volume. While cost is
very much a vendor-specific issue, it's worth pointing out that
solutions that utilize both a JVM and hardware intellectual property
will incur license fees for both resources.
Integrating a Native Java Processor into a Multiprocessor System
The integration of a Java processor as a loosely coupled
coprocessor can be simplified by the addition of a few extra
features, including:
- An industry-standard bus interface
- Relocation support for the core memory map
- Host processor communication support
Externally, the Java processor must present an
industry-standard bus interface (e.g., AMBA, AHB, MLB) to simplify
integration of the processor with the host CPU (see Figure 2). In a
coprocessor scenario, both processors are declared bus masters. Since
they're able to process data concurrently and are completely
independent of each other, conflicts may occur when both processors
request bus access simultaneously. Ultimately, in such circumstances,
the decision of which processor takes priority lies with the bus
arbiter and is defined by the systems integrator at design-time. Code
caches are an important feature of any coprocessor implementation.
Their importance lies in the fact that not only do they reduce code
access times, but they also limit system bus access and so reduce bus
contention.
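The arbitration decision can be modeled in a few lines. The sketch
below assumes a simple fixed-priority scheme chosen at design time;
the class BusArbiter and the Master enum are hypothetical names, and
real arbiters (round-robin schemes, for instance) may behave
differently.

```java
/** Fixed-priority arbiter for two bus masters (illustrative model only). */
class BusArbiter {
    enum Master { HOST, JAVA_COPROCESSOR, NONE }

    // Chosen by the systems integrator at design time (assumption).
    private final Master priorityMaster;

    BusArbiter(Master priorityMaster) {
        if (priorityMaster == Master.NONE) {
            throw new IllegalArgumentException("A real master must hold priority");
        }
        this.priorityMaster = priorityMaster;
    }

    /** Returns the master granted the bus for this cycle. */
    Master grant(boolean hostRequests, boolean javaRequests) {
        if (hostRequests && javaRequests) {
            return priorityMaster;            // conflict: design-time priority wins
        }
        if (hostRequests) {
            return Master.HOST;
        }
        if (javaRequests) {
            return Master.JAVA_COPROCESSOR;
        }
        return Master.NONE;                   // bus idle
    }
}
```

A code cache helps precisely because it lowers how often the
coprocessor raises a request, so the conflict branch is taken less
frequently.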
By default, and upon reset, a standalone processor would
sensibly execute code from the first location in memory. However, in
a multiprocessor system, it must be possible to relocate the program
counter to allow the host to redirect the vectors for external
instructions to an appropriate location in the physical address map.
This could be achieved, for example, by reconfiguration of an index
register.
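A minimal model of this relocation, assuming the index register
simply holds a base address that is added to every core-relative
offset, might look as follows. The class RelocatedCore and its method
names are hypothetical; an actual processor may implement relocation
differently.

```java
/** Models a core whose memory map is relocated by a base register. */
class RelocatedCore {
    // Hypothetical index register; zero after reset, so a standalone
    // core starts executing from the first location in memory.
    private int vectorBase = 0;

    /** Called by the host before releasing the coprocessor from reset. */
    void setVectorBase(int base) {
        vectorBase = base;
    }

    /** Translates a core-relative offset into a physical address. */
    int physicalAddress(int offset) {
        return vectorBase + offset;
    }
}
```

With the register left at zero the core behaves as a standalone part;
the host only needs to write one value to move the whole map.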
Low-level support must also be provided for interprocessor
communications. In the example described here, this is achieved using
two mailbox registers: one for communication from the Java processor
to the host, the other for communication in the reverse direction.
Writing a command packet to a mailbox generates an interrupt to
inform the recipient processor that a new value has been written. The
recipient then inspects the mailbox to determine the nature of the
request, which could be a method call, a data transfer, or a
reference to a multimedia object. Once the value has been read, a
further interrupt is generated to inform the sending processor that
the recipient has read it. Java coprocessor solutions that are
currently market-ready
use one of two approaches to implement data transfer. This depends on
whether the processor requires dedicated memory resources or is able
to support shared access to system memory. Ideally, system memory can
double as a communications area: data is transferred through a
dedicated memory region that's accessible to both processors.
Otherwise, where the processor does not support shared memory, a FIFO
buffer can be used to provide a data transfer path, though this
increases the complexity of the design.
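The mailbox handshake can be modeled in software as follows. This is
a sketch only: the Mailbox class, its callbacks, and the log messages
are assumptions made for illustration, with Runnable callbacks
standing in for the two hardware interrupts.

```java
import java.util.ArrayList;
import java.util.List;

/** Software model of one mailbox register with two interrupts. */
class Mailbox {
    private int value;
    private boolean full;
    private final Runnable onWritten; // interrupt raised at the recipient
    private final Runnable onRead;    // interrupt raised back at the sender

    Mailbox(Runnable onWritten, Runnable onRead) {
        this.onWritten = onWritten;
        this.onRead = onRead;
    }

    /** Sender side: returns false if the previous value is still unread. */
    boolean write(int packet) {
        if (full) {
            return false;
        }
        value = packet;
        full = true;
        onWritten.run();
        return true;
    }

    /** Recipient side: fetches the value and acknowledges the sender. */
    int read() {
        if (!full) {
            throw new IllegalStateException("Mailbox is empty");
        }
        int packet = value;
        full = false;
        onRead.run();
        return packet;
    }
}

class MailboxDemo {
    /** Runs one Java-to-host exchange and returns the event log. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        Mailbox javaToHost = new Mailbox(
            () -> log.add("host: interrupt, new value written"),
            () -> log.add("java: interrupt, value was read"));
        javaToHost.write(0x1234);
        int packet = javaToHost.read();
        log.add("host: received 0x" + Integer.toHexString(packet));
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

A second Mailbox instance with the roles reversed would carry traffic
from the host back to the Java processor.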
Ultimately, a J2ME application programmer shouldn't need to
care about the hardware resources and, indeed, from an abstract point
of view, there will be little or no difference between Java code
developed for single or multiprocessor solutions. As an example,
let's assume that the Java code wishes to make use of a set-top box
resource supported by the host processor, such as the tuner. This
resource would be accessible only via a Tuning API, such as the one
specified in the DVB Multimedia Home Platform standards. In this
scenario, a standard Java method could trigger a request (passed via
mailbox registers) to the host, passing arguments to indicate which
channel is required. Once the operation had been carried out, the
host would signal to the Java processor, again via a mailbox
register, that the request had been successfully completed (or
otherwise), and that the selected channel was available.
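To make the tuner example concrete, the sketch below packs a request
into a single 32-bit mailbox word. The layout (an 8-bit type field
followed by a 24-bit payload), the type codes, and the class name
CommandPacket are all assumptions for illustration; the article does
not define a packet format, and this is not part of the DVB MHP
Tuning API.

```java
/** Hypothetical 32-bit command packet: 8-bit type, 24-bit payload. */
class CommandPacket {
    // Illustrative type codes (not taken from any standard).
    static final int TYPE_METHOD_CALL     = 1;
    static final int TYPE_DATA_TRANSFER   = 2;
    static final int TYPE_MEDIA_REFERENCE = 3;

    /** Packs a type and payload into one mailbox word. */
    static int encode(int type, int payload) {
        if (type < 0 || type > 0xFF) {
            throw new IllegalArgumentException("type must fit in 8 bits");
        }
        if (payload < 0 || payload > 0xFFFFFF) {
            throw new IllegalArgumentException("payload must fit in 24 bits");
        }
        return (type << 24) | payload;
    }

    /** Extracts the type field from a mailbox word. */
    static int typeOf(int packet) {
        return packet >>> 24;
    }

    /** Extracts the payload field from a mailbox word. */
    static int payloadOf(int packet) {
        return packet & 0xFFFFFF;
    }

    public static void main(String[] args) {
        // The Java side asks the host to tune to channel 42.
        int request = encode(TYPE_METHOD_CALL, 42);
        System.out.println(typeOf(request) + " " + payloadOf(request)); // prints 1 42
    }
}
```

On the host, the interrupt handler would decode the word, drive the
tuner, and write a reply packet into the mailbox for the reverse
direction.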
Similarly, the process of debugging Java application code is
as simple on a multiprocessor platform as it is on a single
processor. This can be accomplished using standard protocols, such as
the KVM Debug Wire Protocol (KDWP) to interface the Java processor
directly to a development and debug environment such as Forte. In
this instance, a JTAG port would be used to enable arbitrary
locations in memory to be written to (i.e., to send command packets)
or read from (i.e., to receive reply packets). Alternatively, debug
can be accomplished via the host processor, using mailbox registers
to enable communication between the two processors, as described
earlier.
Summary
This article discussed issues that pertain to the selection
of a Java solution for devices with a significant investment in
heritage code. Moreover, following a clear migration path from
virtual machines to embedded hardware solutions, the article also
discussed the practical implementation of a dedicated hardware
coprocessor solution. It's probable that all the solutions described,
from the easiest to integrate to those offering the ultimate
performance, will be deployed in multilingual, multiprocessor systems
long before single-language devices are upon us.
Author Bio
Dr. Carl Barratt works in the applications department of Vulcan
Machines Ltd. He has over eight years of experience in various
hardware and software design roles. Carl holds a degree in electronic
engineering and a doctorate from the University of Nottingham, UK.
carl@vulcanmachines.com