
A major roadblock to using any of the server-side scripting architectures for developing commercial software is the fact that (traditionally) the source code must be delivered to customers when deploying applications.

Java source code is compiled into an intermediate code called bytecode, and the Java Virtual Machine (JVM) interprets this bytecode directly. It's the bytecode that makes Java class files completely platform-independent. Not only is the bytecode easy to decompile, but the descriptive variable names are included in it (and thus in the decompiled source code), making it much easier to understand the decompiled source code. This presents another formidable roadblock to deploying commercial Java-based software.

This article outlines a technique to protect JSP-based applications in such a way that they can be deployed to customers without giving away source code or class files that are easy to decompile. This technique employs features of the Java 2 Platform, Enterprise Edition (J2EE) Web application specification and a bytecode protection technology called obfuscation. A detailed example is provided that enables you to better understand the issues and the solution.

JavaServer Pages (JSP) provide a rapid development and deployment analog to Active Server Pages (ASP) with a few significant advantages. Servlet source code is generated from the .jsp files in the form of .java files. These are then compiled into standard servlet .class files.

These servlet classes are loaded into a server (referred to as a container in Java nomenclature). The container routes JSP requests to the corresponding class. With ASP, the source code is actively interpreted at the server and the response is sent back to the client. With JSP, the Java bytecode is preloaded into the container, making responses to requests highly efficient.

Web Application Architecture
The Web application specification (http://java.sun.com/products/servlet/2.2/index.html, section 9) allows JSP applications to run on any platform and in any vendor's J2EE-compliant container. It specifies a standard directory structure to hold static content (e.g., HTML pages and images), JSPs, servlets, and supporting Java classes. In addition, it defines a deployment descriptor - an XML file that conforms to a document type definition (DTD) found at http://java.sun.com/j2ee/dtds/web-app_2_2.dtd. The deployment descriptor defines metainformation about the application to the container. This can include global variables called context parameters, servlet definitions and their initialization parameters, and URL mappings.

Since generated classes from JSP files are servlets, we'll see that using servlet definitions and URL mappings enables us to deliver a JSP application without the source code found in the .jsp files.

The following is an example of a Web application directory structure. The WEB-INF directory and its web.xml deployment descriptor are the required elements:

MyApp/
    index.html
    processFunctions.jsp
    images/
        logo.gif
    WEB-INF/
        web.xml
        classes/
            HTMLUtils/
        taglib/

Note: The taglib and classes directories are required only if the application uses tag libraries (see http://java.sun.com/j2ee/tutorial/1_3-fcs/doc/JSPTags.html for more information) or supporting classes, respectively.

JSP Life Cycle
Application containers that support JSPs go through the following general steps during development:

  1. Java source generation
  2. Java source compilation
  3. Resultant servlet class installation and content delivery

Changes made to a JSP file during development automatically trigger the steps listed above again, which is what makes rapid JSP development possible. However, there's a certain amount of overhead, including a file-system check on every request to determine whether the file has changed. Therefore, when a JSP (or Web application) is ready to be deployed, it's better to turn off the dynamic generation/recompilation features.
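The per-request overhead described above comes down to a timestamp comparison between the .jsp source and its compiled class. The following is a minimal sketch of such a check, not the logic of Tomcat or any other container; the class JspStalenessCheck and its method names are hypothetical.

```java
import java.io.File;

// Hypothetical helper for illustration; not taken from any container's source.
final class JspStalenessCheck {

    // A servlet class is stale when its .jsp source was modified after it was built.
    static boolean isStale(long jspModifiedMillis, long classModifiedMillis) {
        return jspModifiedMillis > classModifiedMillis;
    }

    // File.lastModified() returns 0 for a missing file, so a page that has
    // never been compiled is reported as stale as long as the .jsp exists.
    static boolean needsRegeneration(File jspFile, File classFile) {
        return isStale(jspFile.lastModified(), classFile.lastModified());
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java JspStalenessCheck <page.jsp> <page.class>");
            return;
        }
        System.out.println(needsRegeneration(new File(args[0]), new File(args[1])));
    }
}
```

A container that runs a check like this on every request pays for two file-system lookups each time, which is the cost that disappears once dynamic recompilation is switched off.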

It's important to note that the code running in the container that's servicing requests is the compiled .class file(s), not the .jsp file. This is the first step in eliminating the need to distribute the .jsp files. The other steps involve configuration issues outlined in more detail in the example Web application discussed later. For more information on the life cycle of a JSP, go to http://java.sun.com/j2ee/tutorial/1_3-fcs/doc/JSPIntro4.html.

Obfuscation
Obfuscation tools have gone through at least two generations to date. The first generation analyzed a group of class files and replaced class, method, and variable names with meaningless identifiers. As the obfuscator parsed the files, it would internally keep track of mappings from the original identifiers to the new (meaningless) identifiers. This made the source code obtained from decompilation much more difficult to analyze and understand.
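The bookkeeping behind first-generation renaming can be sketched in a few lines. The class NameMapper below is a hypothetical illustration of the identifier mapping only; it doesn't read or rewrite class files, and a real tool would also have to skip names that collide with Java reserved words (such as "do" or "if").

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the original-to-meaningless identifier mapping.
final class NameMapper {

    private final Map<String, String> mapping = new LinkedHashMap<>();

    // Returns the replacement for an identifier, creating one on first use so
    // that every reference to the same original name is renamed consistently.
    String rename(String original) {
        String existing = mapping.get(original);
        if (existing != null) {
            return existing;
        }
        String replacement = shortName(mapping.size());
        mapping.put(original, replacement);
        return replacement;
    }

    // Produces a, b, ..., z, aa, ab, ... for index 0, 1, ..., 25, 26, 27, ...
    static String shortName(int index) {
        StringBuilder name = new StringBuilder();
        int n = index;
        do {
            name.insert(0, (char) ('a' + n % 26));
            n = n / 26 - 1;
        } while (n >= 0);
        return name.toString();
    }

    public static void main(String[] args) {
        NameMapper mapper = new NameMapper();
        for (String id : new String[] {"BasicVehicle", "showInfo", "BasicVehicle"}) {
            System.out.println(id + " -> " + mapper.rename(id));
        }
    }
}
```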

The code in Listing 1 is a small Java application that uses typically descriptive class and method names; the code in Listing 2 is an obfuscated version. (Listings 1-9 and additional source code can be downloaded from the link below.) It's much more difficult to glean functional information from the decompiled obfuscated code. An end user would run both versions of the application the same way (java VehicleApp). Note that the method BasicVehicle.showInfo() uses Java's reflection classes to avoid a very important shortcoming in first-generation obfuscation tools: string literals are not obfuscated in any way. String literals often give away information about what is happening in the code.

Figures 1 and 2 show the screen output from the run of the regular and obfuscated versions, respectively.

Figure 1

Figure 2
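The reflection approach mentioned for BasicVehicle.showInfo() can be illustrated with a small sketch. The classes below are hypothetical and are not the article's Listing 1; they only show how a type name can be obtained at runtime instead of being written as a string literal.

```java
// Hypothetical classes for illustration; they are not the article's Listing 1.
class Vehicle {
    // The type name comes from runtime class metadata rather than from a
    // literal such as "Truck", so a renamed class reports its renamed identifier.
    String typeName() {
        return getClass().getSimpleName();
    }
}

class Truck extends Vehicle {
}

final class ReflectionDemo {
    public static void main(String[] args) {
        System.out.println(new Truck().typeName());
    }
}
```

Because no literal carries the original name, a renaming obfuscator leaves nothing readable behind in this method; the trade-off is that the program's visible output changes along with the class names.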

Second-generation obfuscation tools often provide a facility to encrypt string literals and will also rearrange certain code blocks, such as loops, to be more confusing.
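The idea behind string-literal encryption can be sketched as follows. This is a deliberately simple XOR-plus-Base64 scheme chosen for illustration; it is trivial to reverse and is an assumption of this sketch, not the scheme used by KlassMaster or any other product. The class LiteralScrambler is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch: the literal is stored in scrambled form and restored
// only at runtime, so it doesn't appear as readable text in the class file.
final class LiteralScrambler {

    // What an obfuscator would do at build time.
    static String encode(String literal, byte key) {
        byte[] bytes = literal.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] ^= key;
        }
        return Base64.getEncoder().encodeToString(bytes);
    }

    // What the injected decoding routine would do at runtime.
    static String decode(String stored, byte key) {
        byte[] bytes = Base64.getDecoder().decode(stored);
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] ^= key;
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte key = 0x5A;
        String stored = encode("Number of wheels: ", key);
        System.out.println("stored form:  " + stored);
        System.out.println("runtime form: " + decode(stored, key));
    }
}
```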

An interesting side effect of obfuscation is that the resultant class files are usually smaller due to the shortening of class, method, and field names. Some second-generation obfuscators even claim a performance improvement because of the way the code is rearranged.

Many obfuscation tools offer the option of using "illegal" identifiers. These names aren't legal identifiers under the Java Language Specification. Most (current) JVMs will still work with these identifiers while many decompilers won't. The perceived advantage in foiling decompilers is not worth the risk that the code you deliver won't work when a client updates his or her JVM at some point. This feature should always be disabled. It's important to note that obfuscation, even the kind that encrypts string literals, normally results in perfectly legal (if confusing to humans) bytecode.

Sample Web Application
Application Overview

The example Web application is simple, but it has a number of features that highlight the JSP life cycle and show how to break the dependence on the original .jsp source code files and the resultant servlet .class files. These features, listed below for reference, are described in more detail as the example is analyzed:

  • Self-referential links
  • Included JSP files
  • Supporting class files
  • Context parameters (global to the Web application)
  • Error page support

Listing 3 provides all the source files that make up PopQuiz, the sample application. A user can select from a list of multiple-choice quizzes. The questions are displayed in a random order, as are the answers to each question. The application keeps track of the order of the questions and answers that a particular user receives, along with the user's answers and the correct answers. A review of the quiz, highlighting right and wrong (or unanswered) questions, and a score are given when the user submits the quiz for grading. The quizzes are organized within the Web application's directory tree as a set of XML documents. Listing 4 provides the DTD and Listing 5 the sample quizzes.
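The random ordering of questions and answers could be done along the following lines. This is a minimal sketch under stated assumptions, not code from the PopQuiz listings; the class QuizShuffler is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; not taken from the PopQuiz source.
final class QuizShuffler {

    // Returns a shuffled copy and leaves the caller's list untouched, so the
    // original order remains available when the quiz is graded.
    static <T> List<T> shuffledCopy(List<T> items, Random random) {
        List<T> copy = new ArrayList<>(items);
        Collections.shuffle(copy, random);
        return copy;
    }

    public static void main(String[] args) {
        List<String> questions = Arrays.asList("Q1", "Q2", "Q3", "Q4");
        System.out.println(shuffledCopy(questions, new Random()));
    }
}
```

Storing the shuffled copy per user (for example, in the session) is one way to remember the order each user received.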

The Tomcat application server, a subproject of the Apache Jakarta project (http://jakarta.apache.org), was used to test the Web application. Note, however, that the Web application should run in any J2EE-compliant container as is. The Xerces XML library, part of the Apache XML project (http://xml.apache.org), was used for the XML parsing. The flow through the application is as follows:

  • index.jsp: Shows available quizzes
  • TakeQuiz.jsp: Displays quiz
  • GradeQuiz.jsp: Displays quiz results and score

If any of a number of error conditions occur (bad quiz file name, exceptions on XML parsing, etc.), the JSP error page Error.jsp is displayed. Each of the above-mentioned files has an include reference to common/GlobalHeaderVars.jsp. A number of global variables are defined in this file.
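Loading a quiz document and turning a parsing failure into a single exception might look like the sketch below. It uses the DOM parser available through the JDK's JAXP API rather than a separately installed xerces.jar, and the element name question is an assumption; the actual structure is defined by the DTD in Listing 4. The class QuizXmlSketch is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch; the "question" element name is assumed, not taken
// from the article's DTD.
final class QuizXmlSketch {

    // Parses the quiz document and counts its question elements. Any parsing
    // problem is rethrown as one unchecked exception with a readable message,
    // which is the kind of failure an error page would report.
    static int countQuestions(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("question").getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse quiz XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<quiz><question>2 + 2?</question><question>3 * 3?</question></quiz>";
        System.out.println(countQuestions(xml));
    }
}
```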

Self-referential links, as well as links to the other pages, illustrate an important feature in eliminating the need for the .jsp source files: URL references won't change throughout the application. This preserves an important feature, one that makes JSP technology attractive - the ability to rapidly develop and deploy.

The included common/GlobalHeaderVars.jsp file illustrates an aspect of the JSP servlet generation life cycle. Included files are integrated into the JSP and then a .java file is generated. Since the included file is never referenced directly (through links or form submissions), the original source doesn't need to be referenced in the deployment descriptor (WEB-INF/web.xml, see later).

Supporting class files contained in the WEB-INF/classes directory are automatically included in the container's classpath. No special referencing or classpath manipulation is required when delivering a Web application in general, or in the case of the special structure we're examining here.

Context parameters found in the deployment descriptor (WEB-INF/web.xml) are utilized in the same way whether the page is a JSP or a servlet. Error pages are a convenient mechanism for forwarding to a JSP in the event of an exception. Ordinarily, the container would be responsible for displaying information when an exception occurs. The content of this information varies from container to container and usually shows a stack trace, which is not very useful to the average user. The error-page mechanism allows for a customizable error page that can have the same look and feel as the rest of the application. In the sample application, the error page displays a single meaningful exception string along with links to get back into the application.

Listing 6 provides the original deployment descriptor; Listing 7 shows the modified deployment descriptor. Figures 3 and 4 show screenshots of the same quiz in a different order. The "Refresh Quiz" button can be clicked to show how the application automatically scrambles questions and answers.

Figure 3

Figure 4

Next Steps
We now need to generate and compile the .java files. The resultant class files will be referenced from a modified deployment descriptor. At that point, the original .jsp files will no longer be needed. If the class files were obfuscated as outlined above, it would provide a reasonably safe way to deliver a Web application commercially. Safe in this context refers to protecting the original source code from reverse-engineering. It should be noted that without obfuscation, the class files aren't much more protected than the original source files.

JSP to Java Source Generation
Ordinarily, the container is responsible for generating .java files from the .jsp source files, compiling the .java source files, loading the resultant class files into the container, and delivering the content to the client. By visiting each page in the application, we could rely on the container to generate the class files and then reference these class files in the deployment descriptor. This has two drawbacks: the generated class files typically have cryptic or long names (here's a Tomcat example: _0002fTakeQuiz_0002ejspTakeQuiz_jsp_0.java), and development and deployment become increasingly difficult to maintain this way.

Most containers provide command-line or GUI utilities for manually generating the .java files from the corresponding .jsp files. Usually this is simply a command-line interface to the same classes that generate the .java files on the fly. These utilities provide a more manageable way to generate the entire application at once. Convenient switches are also provided to allow the entire application to be generated in a particular Java package. Assuming the following directory structure:

PopQuiz/
    Error.jsp
    GradeQuiz.jsp
    index.jsp
    TakeQuiz.jsp
    common/
        GlobalHeaderVars.jsp
    quizzes/
        Geography.xml
        PopQuiz.dtd
        SimpleMath.xml
    WEB-INF/
        web.xml
        classes/
            com/
                MPowerIT/
                    io/
                        XMLFileFilter.class
                    quiz/
                        Question.class
                        Quiz.class

The following Tomcat command (run from TOMCAT_HOME):

jspc -p com.MPowerIT.servlet -d webapps\PopQuiz\WEB-INF\classes -webapp webapps\PopQuiz

will generate the .java files in the com.MPowerIT.servlet package and place them within the Web application's classes directory. Remember that anything under WEB-INF/classes is automatically included in the classpath. Now, we can compile the .java files (and obfuscate the class files), remove the .jsp files, and edit the web.xml deployment descriptor file to properly reference the class files. Note that the file GlobalHeaderVars.java will also be generated but isn't needed, as its content is incorporated into the other files. It can be safely removed. Here's the resultant directory structure (with the .jsp files removed):

PopQuiz/
    quizzes/
        Geography.xml
        PopQuiz.dtd
        SimpleMath.xml
    WEB-INF/
        web.xml
        classes/
            com/
                MPowerIT/
                    io/
                        XMLFileFilter.class
                    quiz/
                        Question.class
                        Quiz.class
                    servlet/
                        Error.class
                        GradeQuiz.class
                        TakeQuiz.class
                        index.class

The Deployment Descriptor
The final and crucial step is editing the web.xml deployment descriptor. Listing 7 contains the complete contents of the file. The <servlet> and <servlet-mapping> sections allow us to control how the application is accessed:

...
<servlet>
    <servlet-name>index</servlet-name>
    <servlet-class>com.MPowerIT.servlet.index</servlet-class>
</servlet>
...
<servlet-mapping>
    <servlet-name>index</servlet-name>
    <url-pattern>index.jsp</url-pattern>
</servlet-mapping>
...

The <servlet> tag associates a name with the servlet and references a class. The <servlet-name> tag has nothing to do with how the servlet is accessed and is strictly for use within the deployment descriptor. The <servlet-mapping> tag defines how the servlet will be accessed. The <url-pattern> tag is the key to keeping all our self-referential and internal links consistent. It's this tag that allows a URL like www.MPowerIT.com/PopQuiz/index.jsp to be valid even though there's no longer an index.jsp file.
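Writing these entries by hand for every page gets tedious, so a small generator can help. The following is a minimal sketch, assuming the page names and package used in this article; the class DescriptorEntries is hypothetical and isn't part of the downloadable source. It emits all <servlet> elements before the <servlet-mapping> elements because the web-app DTD expects them in that order.

```java
// Hypothetical helper that prints descriptor entries for precompiled pages.
final class DescriptorEntries {

    // Builds the <servlet> element that ties a servlet name to its class.
    static String servletElement(String servletPackage, String pageName) {
        return "<servlet>\n"
             + "    <servlet-name>" + pageName + "</servlet-name>\n"
             + "    <servlet-class>" + servletPackage + "." + pageName + "</servlet-class>\n"
             + "</servlet>\n";
    }

    // Builds the <servlet-mapping> element that keeps the old .jsp URL valid.
    static String mappingElement(String pageName, String urlPattern) {
        return "<servlet-mapping>\n"
             + "    <servlet-name>" + pageName + "</servlet-name>\n"
             + "    <url-pattern>" + urlPattern + "</url-pattern>\n"
             + "</servlet-mapping>\n";
    }

    public static void main(String[] args) {
        String[] pages = {"index", "TakeQuiz", "GradeQuiz", "Error"};
        StringBuilder out = new StringBuilder();
        for (String page : pages) {
            out.append(servletElement("com.MPowerIT.servlet", page));
        }
        for (String page : pages) {
            // Pattern form mirrors the descriptor excerpt shown above.
            out.append(mappingElement(page, page + ".jsp"));
        }
        System.out.print(out);
    }
}
```

Newer containers, including recent Tomcat releases, reject patterns that don't begin with / or *., so on such a container the pattern would likely need to be written as /index.jsp.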

Testing the Application
The original application is contained in the file PopQuiz.zip, which includes the .jsp source files. Another version of the application is available in the file CommercialPopQuiz.zip, which includes the .java source of the generated JSP. The obfuscation of the class files is left as an exercise for the reader. To be as effective as possible, however, remember to use an obfuscator that not only rewrites all class, method, and field names, but also encrypts string literals. See Listing 8 for an example of a decompiled GradeQuiz.class file. Listing 9 shows the decompilation of this same file after it's been obfuscated. The obfuscator KlassMaster (www.zelix.com) was used to obfuscate the class file; it encrypts string literals.

The application uses the Xerces DOM XML parser found in the Apache Foundation's XML project, http://xml.apache.org. The xerces.jar file needs to be in the container's classpath. This Java archive (JAR) comes with Tomcat in its <TOMCAT_HOME>/lib directory. All .jar files in the lib directory are automatically included in Tomcat's classpath, so no additional configuration is required when using Tomcat.

Assuming that your classpath is set properly as described above, you should be able to expand the .zip archives in the appropriate place for your container and run the sample application. For Tomcat, you would expand the archives in <TOMCAT_HOME>/webapps. The URLs to access the Web applications are http://<tomcathost>:<tomcatport>/PopQuiz/index.jsp and http://<tomcathost>:<tomcatport>/CommercialPopQuiz/index.jsp. The output you get in your browser should be exactly the same in either case.

Summary
JSP technology provides for rapid development and deployment as well as efficient delivery of content. Ordinarily, the cost of this is inclusion of the source code, in the form of .jsp files, with the application. By using the deployment descriptor's features, obfuscation, and the fact that compiled JSPs are servlets, developers can take advantage of the benefits of JSP without the commercial downside.

A number of advanced JSP technologies such as the use of beans and custom tags were not covered in this article. The outlined techniques for protecting a JSP-based application apply to these other facets of the technology as well.

Resources

  • Servlet Specification: http://java.sun.com/products/servlet/2.2/index.html
  • web.xml DTD: http://java.sun.com/j2ee/dtds/web-app_2_2.dtd
  • Custom Tags in JSP: http://java.sun.com/j2ee/tutorial/1_3-fcs/doc/JSPTags.html
  • JSP Life Cycle: http://java.sun.com/j2ee/tutorial/1_3-fcs/doc/JSPIntro4.html
  • Apache Jakarta Project: http://jakarta.apache.org
  • KlassMaster Obfuscator: www.zelix.com
  • Apache XML Project: http://xml.apache.org

    Author Bio
    Micah Silverman has been working in software development and computer security since the 1980s. He's been developing Java applications since the language was released in 1995. Micah is a Sun Certified Java Programmer and an ISC2 CISSP (Certified Information Systems Security Professional).

    Download Source Files (~ 54.9 KB ~Zip File Format)

    All Rights Reserved
    Copyright ©  2004 SYS-CON Media, Inc.

    Java and Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. SYS-CON Publications, Inc. is independent of Sun Microsystems, Inc.