This article details the implementation of a tool called the Command Processor. This tool takes a Java object and creates a command-line interface to its public methods.
These public methods are essentially your Application Programming Interface (API). During the course of this article we'll get a good look at the java.lang.reflect package and a chance to kick the tires on the Regular Expression package included in the 1.4 JDK.
I often find myself with fresh code and no convenient way to try it out. The GUI is not ready or there is no requirement for one. Even writing the argument processing for a main function is often far more work than it's worth. I want to be able to work with my code without modifying the API or writing a throwaway UI. In the long run, all the solutions I've tried were either too much work or required significant modifications to the class. Simply put, I want the Command Processor to create a command-line UI for any given Java class. Here are my requirements, in order of importance.
1. Allow execution of API functions from a command line with parameters
2. Allow the reuse of created objects (create and use variables)
3. Be completely decoupled from the API that it uses (no code modifications to the processed object)
4. Require minimal effort to use
5. Provide certain built-in functions (list, exit, etc.)
6. Restrict the API methods loaded

Requirements 1 and 2 allude to the basic UI that's created for any given class. I want to be able to type in a method call, then examine and reuse the returned object. For example:
Cmd:> myvar = createUser "John", "\"fingers\" Doe", 'C', 34
User: John "fingers" Doe, Age 34, Clearance 'C'
Cmd:> othervar.addUser myvar
The entire syntax is described in the javadoc comments for the CommandProcessor class. It's very Java-like, but notice that parentheses are not required (they're actually not allowed) around the argument list. This makes it easier to handle argument casting later on. If you've ever tried writing a lexer/parser, you'll probably agree that parsing these lines by hand would be fairly difficult; embedded quotes alone can cause the programmer a ton of grief. Similar tools (DJava, for example) use a full-blown parser generator like ANTLR or JavaCC to parse Java strings. I didn't need that level of sophistication, so I used the tools available to me in the 1.4 JDK: I chose to write a simple pseudo-lookahead parser and found that the StreamTokenizer class gave me a good head start.
One of the keys to this tool (and requirement number 4) is that it must not be a burden to use. I want to create an instance of the Command Processor, hand it an object, and start working. As much as possible, I want to avoid configuration files and coded dependencies. Introspection comes to the rescue here, because it lets us examine the declared and inherited members (fields, constructors, and methods) of a Java class. It also lets us fulfill requirement number 3, complete decoupling. As you'll see in API.java, the Command Processor and the class it processes (the processee, if you will) are completely unrelated. (The source code and Listings 1-5 are available for download.)
Since the first thing I'll have to do is parse command lines, we might as well start there. First, a couple of rough definitions: Lexers (sometimes called Tokenizers) take a stream of characters and turn it into a stream of tokens. Parsers take this token stream and make sense of it. Tokens are collections of characters that have some semantic value in the language you're using; they are essentially the words of our language. The token "_aFunction" could well be a Java identifier, while the token "23.3e7" is quite likely a number.
Starting with a given command-line string, I first break it into tokens. Even for our simple language, this quickly becomes a difficult task to do by hand. You end up with a giant decision tree and an awful lot of if/else and switch statements. However, lexical analysis is a well-understood field, and Java provides a class that you can use. The java.io.StreamTokenizer class takes a Reader as its input and provides methods for retrieving tokens and setting various parameters. Essentially, all the if/else and switch statements are already there; most of them are in the 230-line nextToken() method! It's not without its quirks, but since James Gosling is listed as the original author, I'll just assume I didn't fully understand it.
Running the code in StreamTokTest.java will give you a good idea of how useful the StreamTokenizer is right off the bat (the output is shown in Listing 1). It handles comments, quoted strings with nested quotes, and chars in single quotes without setting a single parameter. It's not a perfect Java lexer out of the box, but it wasn't meant to be. After each call to nextToken(), the tokenizer exposes three fields: ttype, sval, and nval. ttype is an int holding one of the predefined types (TT_WORD, TT_NUMBER, TT_EOL, or TT_EOF), the quote character for a quoted string, or, for any other ordinary character, the character itself. sval holds the text of a word or quoted string, and nval holds the double value of a TT_NUMBER token. Notice that the valid Java double 2.01e3 was broken into two tokens, the number 2.01 and the word e3.
There are three other things worth noting: a word and a quoted string have distinct types, the single quote is treated the same as the double quote, and the numeric value is not erased between tokens. However, the only thing that really matters to me is that Java numbers don't all parse correctly.
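To reproduce the default behavior described above, here is a small sketch in the spirit of StreamTokTest.java (it is not that file). It labels each token by kind using only the tokenizer's default syntax table; the class name DefaultTokenDemo is illustrative.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Illustrative demo: label every token StreamTokenizer returns with its default syntax table.
class DefaultTokenDemo {

    static List<String> describe(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        out.add("WORD:" + st.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.add("NUMBER:" + st.nval);
                        break;
                    case '"':
                    case '\'':
                        // For a quoted string, ttype is the quote character and sval holds the body.
                        out.add("QUOTED:" + st.sval);
                        break;
                    default:
                        // Any other ordinary character comes back as its own one-character token.
                        out.add("CHAR:" + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected when reading from a String
        }
        return out;
    }

    public static void main(String[] args) {
        // The Java double 2.01e3 comes back as NUMBER:2.01 followed by WORD:e3.
        for (String token : describe("x = 2.01e3")) {
            System.out.println(token);
        }
    }
}
```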
How do I tell the Tokenizer that 2.01e3 is a number, or 2L for that matter? I can't. The Tokenizer just keeps adding characters to the token until it finds a character that doesn't belong to the current token type. It doesn't know anything about context, so we'll put off numeric identification until we get to the parser. To do that, I'll need the Tokenizer to treat numbers the same way it treats letters. Strangely, the Tokenizer won't let me unset the numeric attributes it has set. I have to clear everything with a call to resetSyntax() and then add everything back into the tokenizer, except that I add digits as word characters. Apart from quoted strings and single-character punctuation such as = and commas, the Tokenizer now returns only TT_WORDs, which is exactly what I want. Now that I have a sequence of tokens, I can do my parsing.
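Here is one way the resetSyntax() approach might look. The exact set of word characters is my assumption, not the article's configuration: adding '.' keeps 2.01e3 and java.lang.String in one token, and adding '-' allows a leading minus sign. WordOnlyTokenizer and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a StreamTokenizer whose syntax table treats digits like letters.
class WordOnlyTokenizer {

    static StreamTokenizer create(String line) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(line));
        st.resetSyntax();               // every character is now "ordinary"
        st.whitespaceChars(0, ' ');     // control characters and space separate tokens
        st.wordChars('a', 'z');
        st.wordChars('A', 'Z');
        st.wordChars('0', '9');         // digits are plain word characters: no number parsing
        st.wordChars('_', '_');
        st.wordChars('.', '.');         // keeps 2.01e3 and java.lang.String in one token
        st.wordChars('-', '-');         // assumption: allows a leading minus sign such as -42
        st.quoteChar('"');              // quoted strings, with backslash escapes
        st.quoteChar('\'');
        return st;
    }

    // Returns the text of each token; punctuation such as '=' or ',' comes back as itself.
    static List<String> tokens(String line) {
        StreamTokenizer st = create(line);
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                boolean hasText = st.ttype == StreamTokenizer.TT_WORD
                        || st.ttype == '"' || st.ttype == '\'';
                out.add(hasText ? st.sval : String.valueOf((char) st.ttype));
            }
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected when reading from a String
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("myvar = createUser \"John\", 'C', 2.01e3, 2L"));
    }
}
```

With this table, 2.01e3 and 2L each arrive as a single word, leaving it to the parser to decide what kind of literal they are.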
The goal of my parser is to fill some fields in the CommandLine object. These fields are used by the Command Processor to find methods, invoke them, and handle the returned object. These fields hold the method name, the target object name (the object that declares the method), the variable name for the returned object, and a collection of arguments.
variable = object.method argument, argument, ...
The variable, object, and arguments are all optional. When the StreamTokenizer gives me the first token, I don't know whether I've been given a variable, an object, or a method. Before I can decide, I'll need to know something about the next token. There isn't any way to look ahead with the StreamTokenizer, so I'll substitute a caching mechanism.
In my parser, the first call to nextToken() must return a word. I can't assign this word to the sMethodName field because I might have a variable assignment. I need to know if the word is followed by "=", a word, or nothing. I have to cache the current token's string value and then call nextToken() again, examining the value. If I find an equals sign, the cached value belonged to a variable. If not, the first word was either "object.method" or "method" and I'll have to put back the token I just took so it will be processed correctly.
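The following sketch shows the cache-and-put-back idea. It uses StreamTokenizer's own pushBack() method, which makes the next call to nextToken() return the current token again; that is a one-token put-back rather than a true peek. ParsedLine and its fields are hypothetical stand-ins for the CommandLine class, and argument casts are ignored to keep the sketch short.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the article's CommandLine class.
class ParsedLine {
    String variable;                              // null when there is no assignment
    String target;                                // null when no object is named
    String method;
    final List<String> args = new ArrayList<>();

    static ParsedLine parse(String line) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(line));
        st.resetSyntax();
        st.whitespaceChars(0, ' ');
        st.wordChars('a', 'z');
        st.wordChars('A', 'Z');
        st.wordChars('0', '9');
        st.wordChars('_', '_');
        st.wordChars('.', '.');
        st.quoteChar('"');
        st.quoteChar('\'');
        try {
            ParsedLine p = new ParsedLine();
            if (st.nextToken() != StreamTokenizer.TT_WORD) {
                throw new IllegalArgumentException("expected a word: " + line);
            }
            String first = st.sval;                   // cache the first word
            if (st.nextToken() == '=') {              // the cached word was a variable name
                p.variable = first;
                if (st.nextToken() != StreamTokenizer.TT_WORD) {
                    throw new IllegalArgumentException("expected a method after '='");
                }
                first = st.sval;
            } else {
                st.pushBack();                        // not an assignment: put the token back
            }
            int dot = first.lastIndexOf('.');         // "object.method" or just "method"
            if (dot >= 0) {
                p.target = first.substring(0, dot);
                p.method = first.substring(dot + 1);
            } else {
                p.method = first;
            }
            // Everything else is a comma-separated argument list (casts ignored here).
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD || st.ttype == '"' || st.ttype == '\'') {
                    p.args.add(st.sval);
                }
            }
            return p;
        } catch (IOException e) {
            throw new IllegalStateException(e);       // not expected from a StringReader
        }
    }
}
```

For example, parse("othervar.addUser myvar") yields target "othervar", method "addUser", and the single argument "myvar", with no variable.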
Arguments and Types
Immediately following the method is a list of zero or more arguments. Since casting is allowed, they can be quite complex. Arguments can look like 2d, (float)2.33, 3e3, (java.lang.String) "I call it a \"Laser Beam\"", or "This is a test".
For reasons that will be clear later, I need to turn an argument into a Class type and an object representation. When I ran the arguments through the Tokenizer, I got back a string of data, a string type, or both. In the case of the argument string "(float)12", I received "float" and "12". I'll pass these both to my Argument class and let the class handle all the conversions. If I pass a null type to the Argument class, it will try to match the data to a primitive type.
To do that, I match the data given to me from the Tokenizer against the patterns that define the different types of primitive literals in the Java language. For example, "true" and "false" are the only allowed Boolean literals in Java. If I'm asked to construct an argument with a null type and data = "true", I should be able to easily detect that this is a Boolean argument. To examine the data, I'll use regular expressions as provided in JDK 1.4.
There has been a fair amount of grumbling about the inclusion of the java.util.regex package in J2SE 1.4. Some people claim that since third-party regex packages for Java have been widely available for some time, Sun is just adding unnecessary code bloat to the JDK. Personally, I wouldn't be as likely to use regular expressions if they weren't so readily available to me. No matter how you feel, regular expressions are extraordinarily useful, and every programmer should have more than a passing acquaintance with them.
There are three basic methods you use to apply regexes to your strings: Matcher.find() looks for a match anywhere in a string, Matcher.matches() determines whether the entire string matches the regex, and Pattern.split() splits a string wherever it finds a match (similar to StringTokenizer). Unless you're receiving your regex strings dynamically, you'll want to precompile your regexes and reuse them, as this will greatly speed up your code. Create a Pattern object by calling the static Pattern.compile() method. Because compile() is static, you can use it to initialize a Pattern field directly:
public static final Pattern p = Pattern.compile("(true|false)");
My regex in this example is quite simple. It must find either "true" or "false". To apply it to a string, I must first create a Matcher. Then I call the matches() method, which only succeeds if it matches the entire string. So both "tru" and "truly" would fail.
Matcher m = p.matcher( stringData );
if ( m.matches() ) {
    // it's a boolean literal.
}
I've created regular expressions for most of the primitive literals. I don't need them for string literals or char literals because those are wrapped in "" or ''. If no type has been found for the data, it's checked to see if it's a valid identifier. If so, it will be presumed that the argument passed is a variable that has previously been created in the CommandProcessor. Its type will be discovered just prior to finding the desired method.
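A minimal sketch of that classification follows, assuming a simplified subset of Java literal syntax (decimal only; no hex, octal, or underscores). These are not the article's actual patterns. They are tried in order, so "true" is classified as a boolean before it can be mistaken for an identifier, and a plain integer such as 34 is claimed by int before the floating-point patterns are tried. LiteralClassifier and the "variable" label are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative classifier: precompiled patterns, tested in insertion order with matches().
class LiteralClassifier {
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("boolean", Pattern.compile("true|false"));
        PATTERNS.put("int", Pattern.compile("[+-]?\\d+"));
        PATTERNS.put("long", Pattern.compile("[+-]?\\d+[lL]"));
        PATTERNS.put("float", Pattern.compile("[+-]?(\\d+\\.?\\d*|\\.\\d+)([eE][+-]?\\d+)?[fF]"));
        // A double needs a decimal point, an exponent, or a d/D suffix.
        PATTERNS.put("double", Pattern.compile(
                "[+-]?((\\d+\\.\\d*|\\.\\d+)([eE][+-]?\\d+)?[dD]?|\\d+[eE][+-]?\\d+[dD]?|\\d+[dD])"));
        // Anything shaped like an identifier is presumed to name a stored variable.
        PATTERNS.put("variable", Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*"));
    }

    // Returns the first matching label, or null if the data fits none of them.
    static String classify(String data) {
        for (Map.Entry<String, Pattern> e : PATTERNS.entrySet()) {
            if (e.getValue().matcher(data).matches()) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (String s : new String[] { "true", "34", "2L", "2.5f", "2d", "3e3", "myvar" }) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```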
To recap, we have now parsed a command line and retrieved a variable name if one was requested, a method name with an optional target object, and a list of arguments. The arguments have one or two pieces of string information. The type can be given by an explicit cast, can be implicit in a literal, or can be implicit from a stored variable. The data is just the string data typed in at the command prompt. The Argument class will have to create an object of the type specified by the argument and "set" it with the given data. It also creates a Class object of the type specified. Why and how Argument does this will be discussed later. First, a little discussion about reflection is in order.
Intro to Reflection
In Java, every object has an associated java.lang.Class object. Class objects contain the lists of methods, constructors, fields, etc., that belong to objects of that class, whether they were declared in the .java file or inherited. When introspecting an object, you're essentially rummaging through the java.lang.Class information associated with it. One of the methods built into the Command Processor is "dump". This method takes an object, introspects it, and lists its constructors, methods, and fields. It even tries to get the values of the fields but, as you'll see, few classes are not fully encapsulated, so you'll rarely see field values, unless you use the trick I show in the Util.toString(Object o) method, described a bit later.
Unsurprisingly, methods, constructors, and fields are all represented by objects in the java.lang.reflect package and stored as arrays of these objects in the Class object. To get an array of methods for your object, get your object's Class object and call the appropriate get method:
Method[] mymethods = myobj.getClass().getMethods();
The same formula works for the fields and constructors. In fact, as I mentioned earlier, I've created a generic toString method utilizing this feature. It's static and I always use it. I simply override my standard toString( ) method with this line:
return Util.toString( this );
My generic toString method takes the given object and introspects all its fields. Listing 2 shows my toString method being called on the Command Processor. Notice that the RegexMethodFilter also gets introspected. This is because it uses the new toString method. The method will output the field name and the value, if it can get it. Since the field is likely protected or private, toString() shouldn't be able to get the value, but that's where AccessibleObject comes in. Methods, Fields, and Constructors all inherit from AccessibleObject; simply call setAccessible(true) on the Field before reading it. This is basically there for things like serialization, but it's worth noting that your private variables are only private if you provide a security manager with your application.
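The following is a sketch of a reflective toString helper in the spirit of Util.toString(Object o); it is not the author's code, and the output format is an assumption. On newer JDKs, setAccessible() can also be refused by the module system with a runtime exception, which is why the sketch catches RuntimeException as well. Account is a hypothetical sample class.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Sketch of a reflective toString helper in the spirit of Util.toString(Object o).
class ReflectiveToString {

    static String toString(Object o) {
        StringBuilder sb = new StringBuilder(o.getClass().getSimpleName()).append(" {");
        // Walk up the hierarchy so inherited fields are reported too.
        for (Class<?> c = o.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) {
                    continue; // report instance state only
                }
                sb.append(' ').append(f.getName()).append('=');
                try {
                    f.setAccessible(true); // bypass private/protected access checks
                    sb.append(f.get(o));
                } catch (RuntimeException | IllegalAccessException e) {
                    sb.append("<inaccessible>"); // refused by a security manager or the module system
                }
            }
        }
        return sb.append(" }").toString();
    }

    // Hypothetical sample class whose fields are all private.
    static class Account {
        private String owner = "John";
        private int age = 34;

        @Override
        public String toString() {
            return ReflectiveToString.toString(this); // the one-line override from the text
        }
    }

    public static void main(String[] args) {
        System.out.println(new Account());
    }
}
```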
The one thing I should make clear here is the difference between getDeclaredXXXs() and getXXXs(). getDeclaredXXXs() returns all the XXXs declared in the class itself, regardless of access modifier (public, private, or protected), but none that are inherited. getXXXs() returns only public items; for methods and fields, that includes public members inherited from superclasses and superinterfaces (constructors are never inherited). We'll go into more detail about making calls on these objects later.
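A quick way to see the difference, using a hypothetical Sample class with one public and one private method:

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

class DeclaredVsPublic {
    // Hypothetical sample class with one public and one private method.
    static class Sample {
        public void visible() {}
        private void hidden() {}
    }

    // Collect method names into a sorted set for easy reading.
    static Set<String> names(Method[] methods) {
        Set<String> result = new TreeSet<>();
        for (Method m : methods) {
            result.add(m.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        // Declared: visible and hidden, but nothing inherited from Object.
        System.out.println("declared: " + names(Sample.class.getDeclaredMethods()));
        // Public: visible plus Object's public methods such as toString and hashCode, but not hidden.
        System.out.println("public:   " + names(Sample.class.getMethods()));
    }
}
```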
Methods contain a plethora of information. java.lang.reflect.Modifier defines 12 modifiers (such as public, private, static, and final). Methods always have a name and can also be discriminated by parameter type(s), exception type(s), return type, and declaring class. Filtering on any combination of these properties could be a daunting task but, once again, regular expressions come to the rescue.
To get a list of things to filter, I make a call to one or more of the four "getMethod" members of java.lang.Class. As mentioned before, I have a choice of getting declared or inherited public methods. In addition, I can attempt to find a single method instead of an array of methods. For the Command Processor, this is the most convenient. Any given command line will make clear the name of the method, any arguments, and possibly even the object on which to execute the desired method. The get methods of Class are just the thing for finding a specific method:
public Method getMethod(String name, Class[] parameterTypes)
As noted, this searches only public members of the class. Notice that the second argument is an array of class types. This method will return only methods with the exact signature specified by the name and the parameter types. Calls to getMethod are exactly why we needed the Argument class to provide us with a Class object for its type. getMethod must have the exact argument types or it will fail to find a method.
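This sketch illustrates the exact-match rule with a hypothetical Api class: a method declared to take a String is found with String.class but not with Object.class, even though every String is an Object. Note that since Java 5 the parameter is declared as varargs (Class<?>... parameterTypes), so an explicit array still works but is no longer required.

```java
import java.lang.reflect.Method;

class ExactLookup {
    // Hypothetical API class with a single method taking a String.
    static class Api {
        public String echo(String s) {
            return s;
        }
    }

    // Returns the method only if the name and parameter types match exactly; null otherwise.
    static Method find(String name, Class<?>... parameterTypes) {
        try {
            return Api.class.getMethod(name, parameterTypes);
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(find("echo", String.class)); // found
        System.out.println(find("echo", Object.class)); // null: lookup does not widen types
        System.out.println(find("echo", new Class<?>[] { String.class })); // explicit array, as in JDK 1.4
    }
}
```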
If a method is returned, I still have to check whether its use is allowed. It's not a good idea to call methods like wait() and run() from the Command Processor, so they should probably be filtered out. The MethodFilter interface abstracts this functionality. The Command Processor instantiates its internal method filter, called RegexMethodFilter, and all objects will use this filter unless another one is provided. The RegexMethodFilter class adds one essential method to the implementation of MethodFilter: addExpression(). This method adds the given expression to an internal list of regular expressions, each of which will be tested against the given method. If a match is found, the method is rejected. This time, we use the find() method of the Matcher class, because we want to match any substring in the method signature.
The Command Processor, for example, does not want to expose the main(), run(), or wait() methods, so the internal filter will need to exclude them. The patterns "main\(.*\)", "run\(.*\)", and "wait\(.*\)" will reject main(), run(), and wait(), but not maintain(), runtimeTarget(), or waitlist().
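A minimal sketch of such a filter, assuming the expressions are tested against Method.toString(), which yields signatures such as "public final void java.lang.Object.wait() throws java.lang.InterruptedException". RegexFilterSketch, allowedNames(), and Sample are hypothetical names; this is not the RegexMethodFilter source. One caveat: because find() matches substrings, "main\(.*\)" would also reject a method named domain(); prefixing the name with an escaped dot, as in "\.main\(", avoids that.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of a regex-based method filter (not the article's RegexMethodFilter source).
class RegexFilterSketch {
    private final List<Pattern> rejects = new ArrayList<>();

    // Compile once, test many times.
    void addExpression(String regex) {
        rejects.add(Pattern.compile(regex));
    }

    // Reject the method if any expression is found anywhere in its signature.
    boolean accept(Method m) {
        String signature = m.toString();
        for (Pattern p : rejects) {
            if (p.matcher(signature).find()) {
                return false;
            }
        }
        return true;
    }

    // Names of the public methods of c that survive the filter.
    List<String> allowedNames(Class<?> c) {
        List<String> names = new ArrayList<>();
        for (Method m : c.getMethods()) {
            if (accept(m)) {
                names.add(m.getName());
            }
        }
        return names;
    }

    // Hypothetical target whose method names resemble the filtered ones.
    static class Sample {
        public void maintain() {}
        public void waitlist() {}
        public void run() {}
    }

    public static void main(String[] args) {
        RegexFilterSketch filter = new RegexFilterSketch();
        filter.addExpression("main\\(.*\\)");
        filter.addExpression("run\\(.*\\)");
        filter.addExpression("wait\\(.*\\)");
        // maintain and waitlist survive; run and Object's wait overloads are rejected.
        System.out.println(filter.allowedNames(Sample.class));
    }
}
```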
Running the Command Processor
Requirement 5 calls for internal commands to provide the most basic functionality, like exit and list. Since I know that the Command Processor will introspect an object looking for methods to call, I'm going to get slick and have the Command Processor introspect itself for the internal methods. This way, if I choose to add a new method to the list of internal commands, making it available is as simple as ensuring that it doesn't get filtered out.
Once the Command Processor has loaded the internal and external filters and objects, the run method enters a loop, grabbing a line of input from stdin, parsing it into a command and parameters, and finding and invoking the underlying method. As discussed earlier, the CommandLine class takes the input from the user and breaks it into a method name and a collection of arguments. It also provides access to the arguments as Objects and as Classes. As we'll see, this is not exactly trivial. It's important, however, because you invoke methods with an object array but find them with a class array.
There is quite a bit of chicanery involved in handling parsed parameters, and it's all because of primitive types. The reason, as I mentioned earlier, is that the "getMethod" members of Class all expect an array of Class objects describing the arguments of the method you want, while the "invoke" member of Method requires an array of Objects with the correct types and data. Making classes and objects for primitive types is a bit tricky.
Let's examine the process for parsing "(float)21". We see that we have a parameter with the type "float" whose data is "21". I quote them here to remind you that they're still strings. float is a primitive type, and Class.forName() can't turn the name of a primitive type into a Class object. In a perfect world, you'd be able to dynamically create your primitive the same way you do any other class: Class myClass = Class.forName("java.lang.StringBuffer");. Once you have a class, it's trivial to create an object if it has a no-argument constructor: Object myObj = myClass.newInstance();. Unfortunately, this is not allowed for primitives. Instead, there are static Class objects for the primitive types, and they must be used here. They are members of the classes that wrap primitives: for float, it's the TYPE field of java.lang.Float; for int, it's the TYPE field of java.lang.Integer, as shown in Listing 3.
Now I'm faced with a nice long list of if/else statements. I actually used a static hashtable instead, which may be a bit faster and is much more flexible. At this point, I should have enough information in CommandLine to find the named method from a class. In the Command Processor's run method, I try to get the method from the Command Processor, from the named object (if specified), or from the "target" object defined when the Command Processor was constructed.
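Here is a sketch of that lookup table, assuming a HashMap keyed by type name; TypeNames and classFor() are hypothetical names. Anything that isn't a primitive falls through to Class.forName(), which cannot handle primitive names such as "float".

```java
import java.util.HashMap;
import java.util.Map;

class TypeNames {
    // Static lookup table from primitive type name to its Class, replacing an if/else chain.
    private static final Map<String, Class<?>> PRIMITIVES = new HashMap<>();
    static {
        PRIMITIVES.put("boolean", Boolean.TYPE);
        PRIMITIVES.put("byte", Byte.TYPE);
        PRIMITIVES.put("char", Character.TYPE);
        PRIMITIVES.put("short", Short.TYPE);
        PRIMITIVES.put("int", Integer.TYPE);
        PRIMITIVES.put("long", Long.TYPE);
        PRIMITIVES.put("float", Float.TYPE);
        PRIMITIVES.put("double", Double.TYPE);
    }

    // Primitive names come from the table; everything else goes through Class.forName().
    static Class<?> classFor(String typeName) {
        Class<?> c = PRIMITIVES.get(typeName);
        if (c != null) {
            return c;
        }
        try {
            return Class.forName(typeName);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("unknown type: " + typeName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(classFor("float"));                  // float
        System.out.println(classFor("java.lang.StringBuffer")); // class java.lang.StringBuffer
    }
}
```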
If no target object was named (that is, the command wasn't of the form myobj.myMethod), the Command Processor is searched first. This way, the processee can't accidentally override the exit command and get you stuck in a loop (experience teaches me yet another hard lesson). Once I have a method, it's a simple matter of invoking it and providing feedback to the user.
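The lookup order might be sketched like this. The Found holder and the Processor and Api sample classes are hypothetical; the point is only that candidates are tried in order, so when the processor is passed first, its exit() wins over any exit() the processee declares.

```java
import java.lang.reflect.Method;

class MethodLookup {
    // Holder for a successful lookup: the object to invoke on and the method itself.
    static final class Found {
        final Object owner;
        final Method method;

        Found(Object owner, Method method) {
            this.owner = owner;
            this.method = method;
        }
    }

    // Try each candidate in order and return the first exact match, or null if none has one.
    static Found find(String name, Class<?>[] types, Object... candidates) {
        for (Object candidate : candidates) {
            if (candidate == null) {
                continue; // e.g. no named object on this command line
            }
            try {
                return new Found(candidate, candidate.getClass().getMethod(name, types));
            } catch (NoSuchMethodException e) {
                // not declared here; try the next candidate
            }
        }
        return null;
    }

    // Hypothetical stand-ins for the Command Processor and the processee.
    public static class Processor {
        public void exit() {}
    }

    public static class Api {
        public void exit() {}
        public void addUser(String name) {}
    }
}
```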
"Ay, there's the rub" - Wm. Shakespeare
We've already had a quick look at Method's "invoke" method; its complete signature is shown in Listing 4. As you can see, invoke takes two parameters. The first is the object on which to call the method: an instance of the method's declaring class or of one of its subclasses. This means you don't need the same object that gave you the method; any object of the same class or a subclass will do. In our case, we're finding the method from the object we'll invoke it on, so we'll have no problem meeting that requirement. Some test code is provided with the full source that shows a simple walkthrough of what you can and can't do with invocation. More complex scenarios might have you invoking the same method on a variety of different objects that implement the same interface. Not having to find the method from each individual object would surely be convenient and faster.
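A small sketch of the receiver rule, using hypothetical Base and Child classes: the Method is looked up once on Base and then invoked on instances of both classes. Passing an unrelated object as the receiver makes invoke() throw IllegalArgumentException.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class InvokeDemo {
    // Hypothetical classes: Child inherits greet(String) from Base.
    public static class Base {
        public String greet(String who) {
            return "Hello, " + who + " from " + getClass().getSimpleName();
        }
    }

    public static class Child extends Base {}

    // Look the method up once, on Base.
    static Method greetMethod() {
        try {
            return Base.class.getMethod("greet", String.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // Invoke m on any receiver; invoke() throws IllegalArgumentException for an unrelated type.
    static String callOn(Method m, Object receiver, String arg) {
        try {
            return (String) m.invoke(receiver, new Object[] { arg });
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Method greet = greetMethod();
        System.out.println(callOn(greet, new Base(), "John"));  // Hello, John from Base
        System.out.println(callOn(greet, new Child(), "John")); // Hello, John from Child
    }
}
```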
The second parameter to the invoke method is an array of objects, one for each parameter and each representing an argument. The catch is you can't magically create an object for a primitive type. I really don't even have a primitive type. I have a string representing a primitive type and a Class object. As we've already noticed, I can't create an object of a primitive type via the Class object. I have to determine the type and create an instance of the appropriate wrapper class. For the int type this is java.lang.Integer. Therefore, it looks like I'll have another battery of if/else statements.
Object o = new Integer(sData);
Once again, rather than a long if/else block, I use a static hashtable. This time it's a bit more complicated, but I can take advantage of the fact that all primitive wrapper classes have string constructors (except char, which is handled as a special case). Notice that I dynamically search for the string constructor to invoke. The advantage of that strategy is that any object with a string constructor can be instantiated in the same step. For example, one of the API class's methods, test(java.lang.StringBuffer), works automatically with this setup.
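Here is a sketch of that approach, assuming a HashMap from each primitive Class to its wrapper Class; ArgumentFactory and create() are hypothetical names, not the article's Argument class. It looks up a (String) constructor with getConstructor(String.class) and calls newInstance(), so StringBuffer goes through the same path as the wrappers. Note that the wrapper (String) constructors have been deprecated since Java 9, so this mirrors the technique described rather than modern style.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

class ArgumentFactory {
    // Maps each primitive type to the wrapper class used to hold its value for invoke().
    private static final Map<Class<?>, Class<?>> WRAPPERS = new HashMap<>();
    static {
        WRAPPERS.put(Boolean.TYPE, Boolean.class);
        WRAPPERS.put(Byte.TYPE, Byte.class);
        WRAPPERS.put(Short.TYPE, Short.class);
        WRAPPERS.put(Integer.TYPE, Integer.class);
        WRAPPERS.put(Long.TYPE, Long.class);
        WRAPPERS.put(Float.TYPE, Float.class);
        WRAPPERS.put(Double.TYPE, Double.class);
    }

    static Object create(Class<?> type, String data) {
        if (type == Character.TYPE || type == Character.class) {
            // Character has no String constructor, so char is special-cased.
            if (data.length() != 1) {
                throw new IllegalArgumentException("not a char: " + data);
            }
            return Character.valueOf(data.charAt(0));
        }
        Class<?> target = WRAPPERS.containsKey(type) ? WRAPPERS.get(type) : type;
        try {
            // Find a (String) constructor dynamically; works for wrappers, StringBuffer, String, etc.
            Constructor<?> ctor = target.getConstructor(String.class);
            return ctor.newInstance(data);
        } catch (ReflectiveOperationException e) {
            // A NumberFormatException from the constructor arrives wrapped in InvocationTargetException.
            throw new IllegalArgumentException(
                    "cannot build " + target.getName() + " from \"" + data + "\"", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(create(Float.TYPE, "21"));                 // 21.0
        System.out.println(create(StringBuffer.class, "Laser Beam")); // Laser Beam
    }
}
```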
As you can see, the method I found was located with the Class object for the primitive int (from the static Integer.TYPE), not with the Class object for Integer. They are different, and they have to be, so that reflection can discriminate between foo(int i) and foo(Integer i). However, when I invoke the method, I use objects of the wrapper types, and invoke() unwraps them into primitives for me. It's important not to confuse the two methods: in essence, you get around strong typing here, so be sure you're calling the right method. Again, the full source provides sample code that shows argument conversion at work.
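The sketch below mirrors the two Test methods described later for the API class, using a hypothetical Api class: int.class and Integer.class select different overloads, yet both are invoked with an Integer object. It also shows that the declared primitive return type can be recovered from Method.getReturnType(), even though invoke() always hands back a wrapper.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class OverloadDemo {
    // Hypothetical API mirroring the article's two Test methods.
    public static class Api {
        public int Test(int i) { return i + 3; }
        public int Test(Integer i) { return i + 4; }
    }

    // paramType selects the overload; arg is always a wrapper object, unwrapped by invoke() if needed.
    static Object call(Class<?> paramType, Object arg) {
        try {
            Method m = Api.class.getMethod("Test", paramType);
            return m.invoke(new Api(), arg);
        } catch (NoSuchMethodException | IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
    }

    // invoke() returns an Integer either way; the Method still knows the declared return type.
    static Class<?> returnTypeOf(Class<?> paramType) {
        try {
            return Api.class.getMethod("Test", paramType).getReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(int.class, 5));      // 8: Test(int) was selected
        System.out.println(call(Integer.class, 5));  // 9: Test(Integer) was selected
        System.out.println(returnTypeOf(int.class)); // int
    }
}
```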
One final note. The invoke method declares three exceptions: IllegalAccessException, IllegalArgumentException, and InvocationTargetException. Because you don't know which method you'll be calling until runtime, invoke can't declare the exceptions that method might throw. Instead, InvocationTargetException wraps any exception thrown inside the method. If you want to know which exception the method threw, call getCause(), a new method in Throwable. It's part of the "Chained Exception Facility" in JDK 1.4, which standardizes exception chaining in the Throwable class.
Kicking My Own Tires
Listing 5 provides sample output from the Command Processor with the API class provided. List is an internal command that lists the available internal commands and those exposed by the target object. It also calls listvars, which lists any variables assigned. In the API class, test : java.lang.StringBuffer merely takes the input and calls reverse on it, printing them both. There are two Test methods (notice the capital "T"). One takes an int and the other takes an Integer. The int version adds three to the given number and the Integer version adds four. Both return ints. Notice that the second call, which uses the variable assigned from the first call, ends up calling the Integer version. This is because invoke always returns an object; there is no way to determine the primitive type of the returned value without checking the Method object. I haven't bothered to catch that yet, but it should be simple enough. The mischievous among you might want to try casting the variable directly to an int, like this: Test (int)myvar. That's going to throw a NumberFormatException, because the Integer string constructor can't turn "myvar" into a number.
Since I don't want the Argument class to know anything about the variables or even the Command Processor, I can't resolve this in the Argument class. The smart thing to do would be to turn the returned object into an Argument and store arguments in the variable map rather than objects. This is left as an exercise for the reader (I've always wanted to say that).
As you browse through the source of this project, you'll likely note that most of my classes have main functions with a variety of test scenarios in them. Most of the tests are commented out, with only the latest test left standing. This is because I'm a firm believer in adding a feature and then testing it right away. Perhaps, now that I have completed this tool, I'll find myself writing fewer main functions. What I'm really looking forward to is writing a whole lot fewer System.out.println() statements. Now, if I want to know the state of an object, I'll ask it directly.
This code is easily adaptable to an internal console similar to those found in most PC games these days. Anyone who has played a PC game in the last five years has likely seen the drop-down command prompts that are becoming ubiquitous. The console, while not new by any stretch of the imagination, provides a very useful tool for developers and power users. Normally, a console would allow only the getting and setting of parameters and the reloading of configuration files. However, even viewing and setting properties in the running system can be extraordinarily useful.
I would like to thank the following people for their contributions to this article: Brett Andrews and David Pidcock for the original concept and reflection ideas, respectively, and Mark Eames, David Colon, and Jerome Liang for the technical review and numerous English corrections.
Richard Ross is an engineering manager for Raining Data, Inc. He has been an engineer for 16 years, with experience ranging from custom ASICs to enterprise Java development.