The Austin Java User Group recently sponsored a contest to create the
smallest Java Hello World! program. The rules were simple: create the
smallest Java class that when executed will display the text "Hello
World!" (and only that text) to the console.
The restrictions were that the class must execute under Sun's
1.3 JRE. It may make use of any class or file distributed with the
JRE, but any additional files (excluding arguments on the command
line) count against the byte total of the Java class file.
In this article, I explain how I arrived at my 70-byte
solution. I hope that in the process you learn a bit about Java class
files and the Java Virtual Machine. I also urge you to take the
challenge before viewing my solution.
Getting Started
Let's first look at the canonical Java Hello World! program.
public class Hello
{
    public static void main(String[] args)
    {
        System.out.println("Hello World!");
    }
}
Compiling this class with javac produces a 416-byte Java
class file. That's quite a few bytes just to print "Hello World!".
Javac generates debugging information by default. Debugging can be
disabled with the "-g:none" option to reduce the byte count to 336
bytes, but to go much further we have to look at exactly where those
bytes are going.
Figure 1 shows the basic components of a Java class file. In
the initial Hello World program, as with most Java classes, the bulk
of the bytes come from the constant pool. The method declarations are
the next largest chunk, but most of the information used in a method
declaration is stored in the constant pool. Table 1 shows the
constant pool from the first class.
Note that the constant pool contains most of the details of
our code. The class name and method names are all there as are the
names and types of all the external classes, methods, and variables
we touch. Constant values (such as our "Hello World!" text) are
included too. Also, note that constant pool entries link to other
constant pool entries. For example, a method reference entry links to
a class reference entry (which in turn references a UTF8 text entry
holding the name) and a name and type entry (which links to the UTF8
text entries holding the name and the method signature).
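To see these links for yourself, a class file's constant pool can be walked with a few lines of Java. The following is only a minimal sketch, not the tool used for the tables in this article: the class name ConstantPoolDump and its output format are made up for illustration, and it assumes the class file uses only the constant pool tags that existed in the 1.3 era (tags added by later JVMs are rejected).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists constant pool entries and the entries they link to.
final class ConstantPoolDump {

    static List<String> describe(byte[] classBytes) {
        List<String> lines = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort();              // minor version
            in.readUnsignedShort();              // major version
            int count = in.readUnsignedShort();  // entries are numbered 1 .. count-1
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                String text;
                switch (tag) {
                    case 1:  text = "Utf8 \"" + in.readUTF() + "\""; break;
                    case 3:  text = "Integer " + in.readInt(); break;
                    case 4:  text = "Float " + in.readFloat(); break;
                    case 5:  text = "Long " + in.readLong(); break;
                    case 6:  text = "Double " + in.readDouble(); break;
                    case 7:  text = "Class -> #" + in.readUnsignedShort(); break;
                    case 8:  text = "String -> #" + in.readUnsignedShort(); break;
                    case 9:  text = "Fieldref -> " + pair(in); break;
                    case 10: text = "Methodref -> " + pair(in); break;
                    case 11: text = "InterfaceMethodref -> " + pair(in); break;
                    case 12: text = "NameAndType -> " + pair(in); break;
                    default: throw new IllegalArgumentException("unsupported tag " + tag);
                }
                lines.add("#" + i + " " + text);
                if (tag == 5 || tag == 6) {
                    i++;                         // long and double occupy two slots
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Reads the two 2-byte indexes shared by the reference-style entries.
    private static String pair(DataInputStream in) throws IOException {
        return "#" + in.readUnsignedShort() + ", #" + in.readUnsignedShort();
    }

    public static void main(String[] args) throws IOException {
        for (String line : describe(Files.readAllBytes(Paths.get(args[0])))) {
            System.out.println(line);
        }
    }
}
```

Running it as "java ConstantPoolDump Hello.class" prints one line per entry, which makes the chain from a method reference down to its UTF8 text entries easy to follow.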
To see how this is used, consider the Hello class
expressed as bytecode with constant pool references indicated by the
number sign ("#") and the constant pool index number. Don't be scared
off by the bytecode. It's not as complicated as it looks. If you've
ever programmed an assembler of any sort, this should look quite
natural. If not, don't worry. The important thing to note is that in
terms of size, the actual bytecode is very small (the constructor is
5 bytes and the main method is 9) because almost everything we do
references a field or class or constant defined in the constant pool,
which is quite large (see Listing 1).
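The 5-byte and 9-byte figures can be checked by writing the two method bodies out as raw bytes. This is an illustrative sketch only: the opcode values are the standard ones from the JVM specification, but the constant pool indexes (#1 through #4) are placeholders and are not the indexes used in Listing 1.

```java
// Hypothetical holder for the raw method bodies of the canonical Hello class.
final class BytecodeSizes {

    // Default constructor: call the superclass constructor and return.
    static final byte[] CONSTRUCTOR = {
        0x2A,                    // aload_0
        (byte) 0xB7, 0, 1,       // invokespecial #1 (placeholder: java.lang.Object.<init>)
        (byte) 0xB1              // return
    };

    // main: System.out.println("Hello World!")
    static final byte[] MAIN = {
        (byte) 0xB2, 0, 2,       // getstatic #2 (placeholder: System.out)
        0x12, 3,                 // ldc #3 (placeholder: the String constant)
        (byte) 0xB6, 0, 4,       // invokevirtual #4 (placeholder: PrintStream.println)
        (byte) 0xB1              // return
    };

    public static void main(String[] args) {
        System.out.println(CONSTRUCTOR.length + " " + MAIN.length); // prints "5 9"
    }
}
```

Every operand is just a one- or two-byte index into the constant pool, which is why the names, types, and text live there rather than in the bytecode.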
First Steps
To have maximum control over what goes in the class file I
decided to generate the class file directly. Normally a tool such as
the Jakarta Byte Code Engineering Library (BCEL) would be the best choice
to generate the class, but in this case I wanted maximum control (and
understanding) of each byte that goes into the class file.
My first attempt was to simply generate a basic Hello World
program, minimizing constant pool references and removing any
unnecessary portions of the class. There are three main steps.
First, I wanted to make sure I generated only the main
method. When compiling a class, the Java compiler will insert a
default constructor if you don't specify one. However, this is only
necessary if you need to create an instance of the class. If you just
need to use the main method, you don't need to be able to create an
instance and can remove the constructor.
Next, I wanted to inherit from an already referenced class.
Every class referenced in the class file requires entries in the
constant pool. I can remove the java.lang.Object constant pool
reference I would normally get (remember, even if you don't specify
it in your Java source code, your class extends java.lang.Object)
by specifying some other class already referenced in the
class file. The choices were java.io.PrintStream and
java.lang.System. (java.lang.String is used as a parameter only so we
don't have class information for it in the constant pool.) Since
java.lang.System is final and cannot be extended, the choice was
java.io.PrintStream.
Finally, since the name of the class is also stored inside
the class file, instead of naming the class
"ReallySmallHelloWorldClass", I wanted to choose a name whose text is
already in use in the constant pool. Some choices were "print", "out",
and "main", the names of the methods and fields referenced. I chose
"Code", the name of the Code attribute in the method, which is stored
as a UTF8 entry in the constant pool. The Code attribute is where the
bytecode associated with the method is stored.
Code to generate this first class is in GenClass1.java. (The
source code and Listing 1 are available for download at the end of
this article.)
The resulting class file is 248 bytes. The following
code snippet shows the bytecode, and Table 2 shows the constant pool.
public class Code extends java.io.PrintStream
{
    public static void main(String[] args)
    {
        ;; System.out.print("Hello World!")
        ;; get System.out
        0: getstatic #13
        ;; get the String "Hello World!"
        3: ldc #18
        ;; invoke print method
        5: invokevirtual #16
        8: return
    }
}
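For readers who want to try generating a class file by hand, here is a rough sketch of one way to write out a class like the one above. It is not GenClass1.java: the class name FirstClassSketch and its helper methods are invented for illustration, the constant pool entries are emitted in a different order than in Table 2 (so the index numbers differ from the listing), and the class file version (45.3) and access flags are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical generator; builds the constant pool first, then the rest of the file.
final class FirstClassSketch {
    // Constant pool tags defined by the JVM specification.
    static final int CLASS = 7, STRING = 8, FIELDREF = 9, METHODREF = 10, NAME_AND_TYPE = 12;

    private final ByteArrayOutputStream poolBytes = new ByteArrayOutputStream();
    private final DataOutputStream pool = new DataOutputStream(poolBytes);
    private int nextIndex = 1;                 // slot 0 is reserved by the format

    int utf8(String text) throws IOException {
        pool.writeByte(1);                     // CONSTANT_Utf8 tag
        pool.writeUTF(text);                   // 2-byte length followed by the bytes
        return nextIndex++;
    }

    int entry(int tag, int... indexes) throws IOException {
        pool.writeByte(tag);
        for (int index : indexes) pool.writeShort(index);
        return nextIndex++;
    }

    static byte[] build() {
        try {
            FirstClassSketch cp = new FirstClassSketch();
            int code = cp.utf8("Code");        // doubles as class name and attribute name
            int thisClass = cp.entry(CLASS, code);
            int printStream = cp.entry(CLASS, cp.utf8("java/io/PrintStream"));
            int system = cp.entry(CLASS, cp.utf8("java/lang/System"));
            int outField = cp.entry(FIELDREF, system,
                    cp.entry(NAME_AND_TYPE, cp.utf8("out"), cp.utf8("Ljava/io/PrintStream;")));
            int print = cp.entry(METHODREF, printStream,
                    cp.entry(NAME_AND_TYPE, cp.utf8("print"), cp.utf8("(Ljava/lang/String;)V")));
            int hello = cp.entry(STRING, cp.utf8("Hello World!"));
            int main = cp.utf8("main");
            int mainType = cp.utf8("([Ljava/lang/String;)V");

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(0xCAFEBABE);          // magic number
            out.writeShort(3);                 // minor version (assumed)
            out.writeShort(45);                // major version (assumed)
            out.writeShort(cp.nextIndex);      // constant_pool_count = entries + 1
            out.write(cp.poolBytes.toByteArray());
            out.writeShort(0x0021);            // ACC_PUBLIC | ACC_SUPER (assumed)
            out.writeShort(thisClass);
            out.writeShort(printStream);       // superclass
            out.writeShort(0);                 // no interfaces
            out.writeShort(0);                 // no fields
            out.writeShort(1);                 // one method, and no constructor
            out.writeShort(0x0009);            // ACC_PUBLIC | ACC_STATIC
            out.writeShort(main);
            out.writeShort(mainType);
            out.writeShort(1);                 // one method attribute: Code
            out.writeShort(code);
            out.writeInt(12 + 9);              // attribute length: fixed part + bytecode
            out.writeShort(2);                 // max_stack
            out.writeShort(1);                 // max_locals
            out.writeInt(9);                   // bytecode length
            out.writeByte(0xB2); out.writeShort(outField);  // getstatic
            out.writeByte(0x12); out.writeByte(hello);      // ldc
            out.writeByte(0xB6); out.writeShort(print);     // invokevirtual
            out.writeByte(0xB1);                            // return
            out.writeShort(0);                 // no exception table entries
            out.writeShort(0);                 // no attributes on the Code attribute
            out.writeShort(0);                 // no class attributes
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build().length);    // prints 248
    }
}
```

Under these assumptions the output is the same 248 bytes in total: 189 bytes of constant pool, 27 bytes for the Code attribute, and 32 bytes of fixed structure around them.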
Hello Command Line
Next, remove the "Hello World!" from the constant pool and
pass it in as an argument on the command line. It takes 3 bytes
(aload_0, iconst_0, aaload) to reference args[0] as opposed to 2
bytes to load a constant (ldc) string from the constant pool;
however, not needing to store the text in the constant pool frees up
two constant pool slots and brings the total size down to 231 bytes.
Code to generate the class is in GenClass2.java.
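The 17-byte saving can be reproduced with a little arithmetic. The sketch below assumes the standard constant pool layouts (a String entry is a tag plus a 2-byte index; a UTF8 entry is a tag, a 2-byte length, and the text itself) and that nothing else in the class file changes. The class and method names are invented for illustration.

```java
// Hypothetical helper that tallies the effect of moving the text to the command line.
final class CommandLineSavings {
    static final int STRING_ENTRY = 3;             // tag + 2-byte index

    static int utf8Size(String text) {
        return 1 + 2 + text.length();              // tag + length + bytes (ASCII text only)
    }

    static int sizeAfterMovingTextToArgs(int before, String text) {
        int removed = STRING_ENTRY + utf8Size(text);   // the two freed constant pool slots
        int extraBytecode = 3 - 2;                     // aload_0, iconst_0, aaload vs. ldc #n
        return before - removed + extraBytecode;
    }

    public static void main(String[] args) {
        // prints 231
        System.out.println(sizeAfterMovingTextToArgs(248, "Hello World!"));
    }
}
```

The two freed slots account for 18 bytes, and the one extra byte of bytecode gives some of that back.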
The constant pool is similar enough to the first example that
we can skip it, but keep in mind that some of the positions in the
constant pool have changed. The following is the new bytecode.
public class Code extends java.io.PrintStream
{
    public static void main(String[] args)
    {
        ;; System.out.print(args[0])
        ;; get System.out
        0: getstatic #12
        ;; args variable
        3: aload_0
        ;; constant int value 0
        4: iconst_0
        ;; get args[0]
        5: aaload
        ;; invoke print
        6: invokevirtual #15
        9: return
    }
}
sun.misc.MessageUtils
Even at 231 bytes the class file is still quite large. Most
of the bloat is associated with retrieving the static field out on
java.lang.System and invoking the print method on
java.io.PrintStream. With that in mind, I scoured the JRE-provided
classes for code that would either get System.out for me or print
some given text to stdout. Fortunately, there is such a class,
sun.misc.MessageUtils, that provides a static method "toStdout" that
will print a string to System.out. Using this, I can replace the
static field reference (System.out) and the method invocation
(java.io.PrintStream.print) with one single static method invocation
(sun.misc.MessageUtils.toStdout). Of course, since
java.io.PrintStream is no longer in the constant pool, a new
superclass is needed. Fortunately, the MessageUtils class is now
available to take on this job. Code to generate this class is in
GenClass3.java. The resulting class file is 171 bytes. The following
code snippet shows the bytecode, and Table 3 shows the constant pool.
public class Code extends sun.misc.MessageUtils
{
    public static void main(String[] args)
    {
        ;; toStdout(args[0])
        ;; args
        0: aload_0
        ;; constant 0
        1: iconst_0
        ;; get args[0]
        2: aaload
        ;; invoke toStdout
        3: invokestatic #9
        6: return
    }
}
Goodbye Main
At this point, I began to lament the size of the signature of
the main method - "([Ljava/lang/String;)V". I decided to try removing
the main method entirely and echoing the text in a static initializer
block. "<clinit>", the internal name for the static initializer, is a
few bytes longer than "main", but the size of the static
initializer's method signature "()V" is much shorter than main's. For
this to work, I needed a superclass whose own main method doesn't
echo any text to the console, so that the class still has an
accessible (inherited) main method.
The shortest named one I found among the various JRE classes
is sun.applet.Main, the main program for the Java applet viewer
application. Applet viewer requires a command input argument, but we
can pass in the class file name "Code.class" as a command-line
argument. Applet viewer will silently ignore the input since it
contains no applet tags. The only drawback to this is that "Hello
World!" had to go back into the constant pool, bringing the solution
up to 194 bytes (see Table 4). Code to generate this class is in
GenClass4.java.
public class Code extends sun.applet.Main
{
    static {
        ;; sun.misc.MessageUtils.toStdout("Hello World!")
        ;; get String "Hello World!"
        0: ldc #9
        ;; invoke toStdout on sun.misc.MessageUtils
        2: invokestatic #10
        5: return
    }
}
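The jump from 171 to 194 bytes can be broken down entry by entry. The tally below is a sketch under stated assumptions: sun/misc/MessageUtils stays in the constant pool because toStdout is still invoked on it, so the new superclass needs its own Class and UTF8 entries, and everything not listed is unchanged. The class name is invented for illustration.

```java
// Hypothetical tally of what the static initializer version adds and removes.
final class StaticInitializerTradeoff {
    static final int CLASS_ENTRY = 3;    // tag + 2-byte index
    static final int STRING_ENTRY = 3;   // tag + 2-byte index

    static int utf8(String text) {
        return 3 + text.length();        // tag + 2-byte length + ASCII text
    }

    static int delta() {
        int removed = utf8("main") + utf8("([Ljava/lang/String;)V")
                + 1;                                         // bytecode shrinks from 7 to 6 bytes
        int added = utf8("<clinit>") + utf8("()V")           // shorter name and signature pair
                + CLASS_ENTRY + utf8("sun/applet/Main")      // the new superclass
                + STRING_ENTRY + utf8("Hello World!");       // the text returns to the pool
        return added - removed;
    }

    public static void main(String[] args) {
        System.out.println(171 + delta());                   // prints 194
    }
}
```

Swapping main for the static initializer saves 15 bytes of names and signatures on its own; the growth comes from the extra superclass entries and the returning string.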
OPC - Other People's Code
Despite making the class file larger, this was still an
important step. What I needed was to find a way to further leverage
hidden classes in the JRE. I'd already found a class with a main
method to use that did nothing, allowing the static initializer to do
its magic, but what I really needed was a class with a main method
that would just print out "Hello World!"
Apparently, that's not such a far-fetched idea. Nestled deep
within Sun's 1.3 JRE is sun.security.util.PropertyExpander with the following method:
public static void main(String args[]) throws Exception
{
    System.out.println(expand(args[0]));
}
The expand() method doesn't alter the text "Hello World!", so
we're effectively just printing args[0] by itself. As long as we pass
in the text "Hello World!" as the first argument to the Java program,
as we were doing earlier, we're all set. Since there are no longer any
methods with bytecode in the class, the text "Code" is
no longer available as a class name. Unfortunately, there are no
other text fields to use in the constant pool. Since every class must
have a superclass, and a class cannot be its own superclass, we have
to add an entry to the constant pool for the name. However, it turns
out that the Sun JVM allows for a class to have a zero length name,
allowing us to keep the new constant pool entry as small as possible.
If this weren't the case, we would have to choose a name like "a".
The code to generate this class file is in GenClass5.java. The
resulting class file is 70 bytes, consisting of four constant pool
entries (two for the class spec and two for the superclass spec) and
no fields, methods, or attributes (see Table 5).
extends sun.security.util.PropertyExpander
{
}
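With so little left, the whole file can be written out field by field. The following sketch is not GenClass5.java; it is a guess at an equivalent layout, with an invented class name (TinyClassSketch) and an assumed class file version (45.3) and access flags. It only builds the bytes and makes no claim about which JVMs other than Sun's 1.3 JRE would accept them.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical generator for a class with a zero-length name and no members.
final class TinyClassSketch {

    static byte[] build() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(0xCAFEBABE);                 // magic number
            out.writeShort(3);                        // minor version (assumed)
            out.writeShort(45);                       // major version (assumed)
            out.writeShort(5);                        // constant_pool_count = 4 entries + 1
            out.writeByte(7); out.writeShort(2);      // #1 Class -> #2 (this class)
            out.writeByte(1); out.writeUTF("");       // #2 Utf8, zero-length class name
            out.writeByte(7); out.writeShort(4);      // #3 Class -> #4 (superclass)
            out.writeByte(1);                         // #4 Utf8, superclass name
            out.writeUTF("sun/security/util/PropertyExpander");
            out.writeShort(0x0021);                   // ACC_PUBLIC | ACC_SUPER (assumed)
            out.writeShort(1);                        // this_class
            out.writeShort(3);                        // super_class
            out.writeShort(0);                        // no interfaces
            out.writeShort(0);                        // no fields
            out.writeShort(0);                        // no methods
            out.writeShort(0);                        // no attributes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(build().length);           // prints 70
    }
}
```

The 70 bytes split into 10 bytes of header, 46 bytes of constant pool (37 of them for the superclass name alone), and 14 bytes of flags, indexes, and empty counts.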
Of course, we could throw out the generated class file
completely and simply invoke Java with the class directly.
java sun.security.util.PropertyExpander 'Hello World!'
However, this wouldn't be a legal submission, so the 70-byte
solution was the best one I could come up with.
Conclusion
Although hacking class files doesn't have much practical
relevance, I found the challenge to be quite a lot of fun. And, even
if you have never touched a class file or Java bytecode, the Hello
World problem is small enough that you should be able to gain a
better understanding of how the internals of Java work.
Acknowledgments
I'd like to thank Jeff Schneider and Momentum Software for
devising the Hello World problem and the Austin Java Users Group for
sponsoring the contest.
Resources
Java Virtual Machine Specification:
http://java.sun.com/docs/books/vmspec/
Jakarta Byte Code Engineering Library (BCEL):
http://jakarta.apache.org/bcel/
Engel, J. (1999). Programming for the Java Virtual Machine.
Addison-Wesley.
Author Bio
Norman Richards works at Commerce One Labs, the research division of
Commerce One. norman.richards@commerceone.com
Author's Source Code for this article (~4.58 KB, Zip file format)
Author's Additional Source Code for this article (~4.98 KB, Zip file format)