JNI Programming In C/C++, by Blair Wyman

If you're familiar with the Java Native Interface (JNI), as this article presumes, you know that it's tailored primarily for C and C++ programmers. Compile-time support for JNI in these languages comes straight from the Sun specification, and is frankly a work of art.

The architects of the JNI had a terrifying three-part task: to tame the hydra of platform-specific issues inherent in so-called "native" code, provide a way to use native code in Java, and to do so in as "portable" a fashion as possible. The ubiquity and standardization of C and C++ made them the natural choices for preferred native languages, and their affinity to Java is apparent to anyone who has programmed to the JNI.

If you're familiar with IBM's eServer iSeries machine, you know it supports a wide range of programming languages, including:

  • C (in several incarnations)
  • C++ (quite recently)
  • RPG (Report Program Generator)
  • COBOL
  • CL (Command Language, the iSeries' workhorse)
  • qsh (POSIX-based scripting language)
  • MI (Machine Interface assembler language)
  • Perl
In contrast to most UNIX-based machines, however, the C language has never been at the foundation of the iSeries machine's operating system; in fact, C is a relative newcomer to the platform at the applications level. The internal development language has varied over the years, but if any language deserves the foundation role on iSeries, from an applications-development standpoint, it's Report Program Generator (RPG).

The heritage of today's iSeries machine dates back quite far, by modern standards. You probably recognize the most recent progenitor of the iSeries platform, the IBM AS/400 System announced in 1988, but it's actually the IBM System/38 that deserves the credit for being the seminal implementation of this truly unique computer architecture.

The unique features of the iSeries architecture are too numerous to recount here, but a couple of salient points will motivate the discussion that follows.

The original architects of the iSeries chose not to differentiate between "working storage" and "disk storage"; it's logically just one big pool of something colloquially known as Single Level Store (SLS). One advantage of this scheme in a database-centric system is clear: process-specific data movement is all but eliminated. Data can be shared among processes, without the overhead of ensuring that process-specific virtual address images of the data are seen in a consistent form.

Obviously, to accommodate future growth, this SLS scheme requires a very large address space since everything on the machine is in one big "addressable universe." To satisfy this requirement, the designers of the System/38 specified a 16-byte (that's byte, not bit) pointer. Well, that was over 20 years ago, and the 16-byte pointer has so far stood the test of time.

Sharing a single address space across the entire machine? How can there be any hope of security between processes and users? The answer is in the notion of the pointer being "tagged." If a pointer operation is performed by the machine, then the pointer is tagged (i.e., made valid). If any sort of operation, other than a pointer operation, changes the storage, the tag is cleared and the pointer is invalidated.

As you can imagine, pointers of this size are...um..."unusual," even in today's diverse mix of systems. It's the very size of our 16-byte pointer that's probably the most cantankerous issue when porting Java native method code to the iSeries machine, as we'll see.

So, how could this unusual memory architecture even hope to support something as modern as Java? Especially, how could it support Java well enough to allow the iSeries (then AS/400) JVM to capture the top spot in four separate industry-standard Java benchmarks last summer? One part of the answer is simple, yet profound: Java doesn't know about "pointers" and "addresses." Hallelujah. With the introduction and increasing popularity of Java, the iSeries is on level ground from a programming standpoint.

However, as soon as we start talking about JNI and native code, we're talking mainly about C and C++; pointers and addresses are back in the spotlight. And, of course, there are other wrinkles. The iSeries is basically an EBCDIC machine, but Java and the JNI prefer UCS-2 or UTF-8 character encodings. Most machines run the JVM at a sort of "application-level," while the iSeries effectively embeds the JVM into the operating system.

There are five basic issues to consider when porting native method code to the iSeries machine:

  1. JNI reference types (e.g., jobject, jclass, jstring, etc.) are implemented as 32-bit signed integers, instead of pointers.
  2. In C and C++ source code, literal strings and character values are encoded in EBCDIC, by default.
  3. The JNI jlong data type is implemented on the iSeries using a C struct of two integers, instead of a true 8-byte integer.
  4. The simple pointer type in C and C++ is 128 bits, as discussed earlier.
  5. Direct addressability into the garbage-collected heap is never permitted.
Let's examine each of these in turn:

1. Object references are integers
This is a nice place to begin, since it's only a minor factor in the porting effort, but brings up some fundamental iSeries issues right away.

The iSeries pointer in C and C++ is a different sort of animal, as we saw earlier, and this functionality comes with a price: manipulating pointers is inherently expensive on the iSeries.

When the JNI specification came out, all the object references were implemented by Sun as pointer types. However, Sun's object reference pointers are "opaque" - no "layout" was defined for whatever sits at the other end. Since these pointers are truly just handles, we asked Sun for permission to implement our object references as integers instead of pointers.

The main impact of this change occurs only when compiling C++ native method code. The JNI specification for C++ uses a class hierarchy to allow some type-conformance checking to occur at compile time. For instance, in C++ you would never be allowed to assign a <code>jstring</code> object into a <code>jbooleanArray</code> variable without an error message. When the references are integers, this type-conformance checking is lost. A minor impact is that the null reference becomes the integer zero, instead of a null pointer value.

2. Codepage considerations
The iSeries is an EBCDIC machine, and Java's Unicode heritage is ASCII-based, so there's a sort of conflict between Java's native side and the iSeries. This conflict is mostly avoided when you stick to Java, but once you start writing native method code, Java's wonderful platform-neutrality is necessarily reduced.

The JNI expects all character data to be provided either in Unicode (the <code>jchar</code> type), or in the modified UTF-8 representation of Unicode (where the NUL character is represented using the two-byte encoding of 0xc080).

UTF-8 has the interesting property that its one-byte encodings are exactly the low-order 128 characters of the ISO 8859-1 Latin character codepage, which is to say 7-bit ASCII. That is, all 7-bit ASCII character arrays are already UTF-8 - a quality that simplifies coding native methods on ASCII platforms.

This issue is really twofold, depending on whether the character data is literal or dynamic.

Literal Strings
Using the ILE C and C++ compilers for the iSeries system, the default encoding for literal character data is EBCDIC. However, this compiler default can be overridden using a #pragma statement in the source code.

#include <stdio.h>
#include <jni.h>
/* literal strings here are still EBCDIC */
char *fmt = "%d:%s\n"; /* EBCDIC */
char *message = "Return code non-zero!\n"; /* EBCDIC */
#pragma convert(819)
/* starting now, all literal strings are ASCII (819) */
/* NOTE: ASCII is NOT exactly the same as UTF-8! */
int main(int argc, char **argv) {
int i, rc;
for (i=0;i<argc;++i) { printf(fmt,i,argv[i]); }
rc = asciiFunction("Great googly-moogly!"); /* ASCII */
if (rc) printf("%s", message); /* OOPS!!! Error! Formats MUST be in EBCDIC ONLY */
return 0;
}
#pragma convert(0)
char *backInEbcdic = "I am EBCDIC, hear me roar";

Since JNI requires UTF-8 input, it would naturally be wonderful if we could just specify some value nnn in our #pragma convert statements that would magically transmogrify the literals into a UTF-8 encoding. Unfortunately, although the iSeries supports UTF-8 for dynamic string conversions (as we'll see in a moment), it does not yet support the use of UTF-8 in the #pragma convert statement.

Instead, a value of nnn that most closely fits the bill is codepage 819, which is ISO 8859-1 Latin. The low-order 128 characters of this codepage are identically UTF-8.

We have to be careful, though, to make sure that all literal strings contain only invariant ASCII - any use of an 8-bit character whose most-significant bit is set requires that the character be represented in its two-byte UTF-8 form. Fortunately, most uses of ASCII for JNI parameters such as class and method names are already likely to be UTF-8. Literals containing character values greater than decimal 127 can be represented using escapes in the source string, as this example shows:

/* \u00EB is 'ë' (e-diaeresis); 0xEB is also its ISO 8859-1 Latin 1 code.
* The UTF-8 value for this character is the two-byte sequence 0xC3 0xAB
*/
p = "Zo\xc3\xab"; /* UTF-8 for Zoë */
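To see where those two bytes come from, here is a minimal sketch of the "modified UTF-8" encoding rule for a single 16-bit Unicode value. The function name encode_modified_utf8 is our own invention for illustration and is not part of the JNI; the only assumption is a C compiler that provides <stdint.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (NOT a JNI function): encode one 16-bit Unicode
 * value as "modified UTF-8". Writes 1 to 3 bytes into out, which must
 * have room for 3, and returns the number of bytes written. */
size_t encode_modified_utf8(uint16_t c, unsigned char *out)
{
    assert(out != NULL);
    if (c >= 0x0001 && c <= 0x007F) {   /* 7-bit ASCII passes through */
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c <= 0x07FF) {                  /* two bytes; NUL lands here too */
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    /* everything else in the 16-bit range takes three bytes */
    out[0] = (unsigned char)(0xE0 | (c >> 12));
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}
```

Feeding this sketch the value 0x00EB yields the bytes 0xC3 0xAB used in the literal above, and feeding it zero yields the two-byte NUL form 0xC0 0x80.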
Dynamic Strings
In contrast to literal strings, the value of dynamic strings is not known until runtime. These strings may become parameters to the JNI, and therefore might have to be converted to "modified UTF-8." The standard API for the purpose of codepage conversion on the iSeries is <code>iconv()</code>. The iconv() API accepts two codepage numbers - one for the "from" codepage, and one for the "to" - and supports both UCS-2 and UTF-8 codepages. However, be forewarned. In the (admittedly unusual) case where your C string might contain an embedded NUL character, the UTF-8 conversion of iconv() preserves it. Remember, JNI takes "modified UTF-8," where the NUL is a two-byte character.
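One way to cope with the embedded NUL case is a small fix-up pass over the buffer after the standard UTF-8 conversion has run. What follows is only a sketch under that assumption: utf8_to_modified_utf8 is a hypothetical helper of our own, not an iSeries or JNI API, and it handles nothing but the NUL difference (characters outside the 16-bit range are also encoded differently in "modified UTF-8" and are not addressed here).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fix-up pass (NOT an iSeries or JNI API): copy len bytes of
 * standard UTF-8 from src to dst, replacing every 0x00 byte with the
 * two-byte form 0xC0 0x80. Returns the number of bytes written, or 0 if
 * dst (capacity cap) is too small. Note that an empty input also returns
 * 0. The worst case needs 2 * len bytes of output. */
size_t utf8_to_modified_utf8(const unsigned char *src, size_t len,
                             unsigned char *dst, size_t cap)
{
    size_t i, n = 0;
    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; ++i) {
        if (src[i] == 0x00) {
            if (cap - n < 2) return 0;   /* no room for the 2-byte NUL */
            dst[n++] = 0xC0;
            dst[n++] = 0x80;
        } else {
            if (cap - n < 1) return 0;   /* no room for this byte */
            dst[n++] = src[i];
        }
    }
    return n;
}
```

The caller still has to append a terminating zero byte before handing the result to a JNI interface that expects a C string.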

3. The jlong data type
The <code>jlong</code> data type is the JNI equivalent of the Java <code>long</code>, an 8-byte integer value. At least it's supposed to be.

When the JNI was first being implemented on the AS/400 system, the C language did not have an 8-byte integer value (although COBOL did, ironically). So, the only alternative for our implementation was to define the <code>jlong</code> as a structure, with two 4-byte integers.

Now, when the 8-byte integer came to the C compiler on the AS/400, we had already shipped a release of JNI using the structure form of <code>jlong</code>, and we realized that if we changed our support to use the 8-byte integer form, we would be sacrificing forward-compatibility in existing native method libraries. The iSeries team is loath to introduce incompatibilities that break existing code. Therefore, despite the fact that the ILE C and C++ compilers have supported 8-byte integers for a couple of releases, the JNI is stuck with the struct form of jlong.

The impact of this fact on the porting effort can be widespread but generally straightforward. For example, consider this C native method code:

JNIEXPORT jlong JNICALL Java_TestJava_getlong
(JNIEnv *env, jobject obj)
{
long y;
somefunc(&y); /* pass y by reference to some routine */
return y; /* Uh-oh! y is not compatible with jlong */
}
The "fixed" version of this code looks like:
JNIEXPORT jlong JNICALL Java_TestJava_getlong
(JNIEnv *env, jobject obj)
{
jlong x; /* here is our compatible return value */
long y;
somefunc(&y); /* pass y by reference to some routine */
JavaI2L(y,x); /* convenience macro from jni_md.h; sign-extends if necessary */
return x; /* OK */
}
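To make the sign-extension step concrete, here is a rough sketch of what a two-word long and its widening conversion might look like. Everything in it is hypothetical: the type sketch_jlong, its field names and field order, and both functions are invented for illustration, and the real definitions in the platform's jni_md.h may well differ.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL layout, for illustration only. The real jlong structure
 * is defined in the platform's jni_md.h and may differ. */
typedef struct {
    int32_t  hi;   /* most-significant 32 bits, carries the sign */
    uint32_t lo;   /* least-significant 32 bits */
} sketch_jlong;

/* Widen a 32-bit signed value into the two-word form, sign-extending. */
sketch_jlong sketch_i2l(int32_t v)
{
    sketch_jlong r;
    r.lo = (uint32_t)v;
    r.hi = (v < 0) ? -1 : 0;   /* all ones for negatives, else zero */
    return r;
}

/* Reassemble a true 64-bit integer, for compilers that have one. */
int64_t sketch_l2ll(sketch_jlong v)
{
    assert(sizeof(sketch_jlong) == 8);   /* two 4-byte words, no padding */
    return (int64_t)(((uint64_t)(uint32_t)v.hi << 32) | v.lo);
}
```

The point of the sketch is simply that a negative 32-bit value must fill the high word with one bits, which is the work a sign-extending conversion macro does on your behalf.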

4. "Pointer hiding" in Java objects
One of the most difficult issues with porting native methods to the iSeries is the issue of "pointer hiding." An all-too-common idiom in native methods is to take the address of some native storage and to store that address in an <code>int</code> or <code>long</code> field of a Java object using JNI.

someObj *somePointer;
fid = (*env)->GetStaticFieldID(env,clazz,"intField","I");
(*env)->SetStaticIntField(env,clazz,fid,(int)somePointer); /* the problem idiom */
As we discovered earlier, the iSeries pointer is 16 bytes in length, so it simply doesn't fit. And even if it did, the modification of the pointer as an integer would clear the tag, rendering the pointer invalid. So we see there's a problem.

There are two types of solutions - one that's robust and expensive, and one that's lighter in weight yet more complicated to get right.

The robust, heavyweight solution relies on two key facts:

  1. The iSeries defines a special class, com.ibm.as400.system.MiPtr, which can be used to hold 16-byte pointers.
  2. Object references on the iSeries are integers.
The scheme is simple. Each time the native code stores a pointer into a Java int or long field, change the code to create a com.ibm.as400.system.MiPtr object, store the pointer into the object, take a global reference to the object, and store that reference into the int or long field.

In order to use this scheme, you have to bind your native method library code with the QJVAJNI service program, which exports two functions called "GetPointer" and "SetPointer". Here are the before and after pictures:

  • Before:
    void *ptr; /* pointer we want to store in the int field */
    jclass otherClz; /* class containing static int field */
    jfieldID otherFid; /* fieldID of the static int field in otherClz */
    /* DO NOT try the following at home */
    (*env)->SetStaticIntField(env,otherClz,otherFid,(int)ptr); /* UGH */
  • After:
    void *ptr; /* pointer we want to store in the int field */
    jclass otherClz; /* class containing static int field */
    jfieldID otherFid; /* fieldID of the static int field in otherClz */
    jclass pClz; /* class ref to com.ibm.as400.system.MiPtr */
    jobject pObj; /* object ref to instance of MiPtr we create */
    jmethodID pCtor; /* method ID of the MiPtr constructor <init> */
    jobject gRef; /* global reference to new instance (why?) */

    /* create instance of com.ibm.as400.system.MiPtr */
    pClz =(*env)->FindClass(env,"com/ibm/as400/system/MiPtr");
    pCtor=(*env)->GetMethodID(env,pClz,"<init>","()V");
    pObj =(*env)->NewObject(env,pClz,pCtor);
    SetPointer(pObj,&ptr);   /* NOTE: pass ADDRESS of ptr */
    gRef =(*env)->NewGlobalRef(env,pObj);
    (*env)->SetStaticIntField(env,otherClz,otherFid,(int)gRef); /* OK */
The lightweight solution is more complicated to get right. Basically, you just allocate an array of pointers in C static storage, store the pointer into the array, and then store the integer array index into the Java int or long field.

However, now we have the issue of possibly synchronizing many threads, since the allocation of a "slot" in this table of pointers must be performed atomically. Also, how do we know when the pointer should be cleared from the table? A deep understanding of the application will be required to get this right.
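As a starting point, the table scheme might be sketched as follows. This is a minimal sketch and not a finished design: the names handle_alloc, handle_get and handle_free and the fixed capacity SLOT_COUNT are all hypothetical, and it assumes a compiler with C11 <stdatomic.h>, which the ILE C compiler may not provide (a mutex around the slot search would play the same role).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SLOT_COUNT 64                     /* hypothetical fixed capacity */

static void      *slot_ptr[SLOT_COUNT];   /* pointers, in C static storage */
static atomic_int slot_used[SLOT_COUNT];  /* 0 = free, 1 = claimed */

/* Atomically claim a free slot and store p in it. Returns a handle in
 * 1..SLOT_COUNT that fits in a Java int field, or 0 if the table is full. */
int handle_alloc(void *p)
{
    int i;
    for (i = 0; i < SLOT_COUNT; ++i) {
        int expected = 0;
        /* only one thread can flip a given slot from 0 to 1 */
        if (atomic_compare_exchange_strong(&slot_used[i], &expected, 1)) {
            slot_ptr[i] = p;
            return i + 1;
        }
    }
    return 0;
}

/* Look up the pointer behind a handle; NULL if the handle is not in use.
 * Callers are assumed to have obtained h from handle_alloc's return. */
void *handle_get(int h)
{
    if (h < 1 || h > SLOT_COUNT || !atomic_load(&slot_used[h - 1]))
        return NULL;
    return slot_ptr[h - 1];
}

/* Release a slot. Deciding WHEN this is safe is the application's job. */
void handle_free(int h)
{
    assert(h >= 1 && h <= SLOT_COUNT);
    slot_ptr[h - 1] = NULL;
    atomic_store(&slot_used[h - 1], 0);
}
```

Handles start at one so that zero can keep meaning "no pointer stored," matching the default value of a Java int field. The sketch deliberately leaves the harder question open: nothing here tells you when it is safe to call handle_free.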

5. Direct addressability to objects in the garbage-collected heap
One aspect of the iSeries JVM implementation is its integration with the underlying platform. This integration is tight enough to allow Java-to-Java calls to be made very quickly using a "machine addressing" scheme that avoids the overhead of 16-byte pointers. However, as soon as native method code comes into the picture, so do the 16-byte pointers.

Selected JNI interfaces are designed to provide addressability to the elements of a Java array, or to the characters of a Java String object. For instance, the <code>Get<PrimitiveType>ArrayElements</code> or <code>GetStringChars</code> interfaces return an address to the caller: an address that may or may not be directly addressing the actual object.

On most platforms, these addresses actually point directly into the Java object in the heap. On the iSeries system, however, the JNI programmer is never provided with such direct addressability - these JNI interfaces always return the address of a copy of the data. Even the so-called "critical" interfaces added to the JNI in Java 2 - GetPrimitiveArrayCritical and GetStringCritical - return the address of a copy of the data.

This decision not to expose the garbage-collected heap to any user-level addressing in the iSeries JVM was not made lightly, but it is incontrovertible. If a JNI programmer could get the address of an object in the heap, it would be possible for that code to accidentally walk over storage and corrupt the state of the JVM. This restriction is one clue to the robust nature of the iSeries JVM.

Summary
JNI programming on the IBM iSeries system poses some extra challenges to the programmer, but none that are insurmountable. If the programmer is lucky enough to be dealing with a well-designed body of native method code, code that avoids some of the pitfalls we've discussed, the port to the iSeries is a relative breeze.

Mr. Wyman can be contacted at: [email protected]

All Rights Reserved
Copyright ©  2004 SYS-CON Media, Inc.

Java and Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. SYS-CON Publications, Inc. is independent of Sun Microsystems, Inc.