But an equally compelling story can be told about the virtues of building the server side of a web application in Java as well. This article will outline the major design issues of creating a server-side Java solution, and though I promise not to make this an infomercial, a number of examples will be drawn from one particular implementation, a commercial product called Dynamo, which I played a key role in creating.
Why Server-Side Java?
One of the amazing things about the web is that all of the excitement (and money) centers around the very thing that a web site was designed not to do, namely, act as the server side of a client/server application. From web-malls to search engines to postcard shops, the demand for real functionality on the server has been skyrocketing, yet the programming model for creating these client/server applications is appallingly impoverished. It's a testament to the tenaciousness of web site programmers that they can stitch very sophisticated apps together from a series of independent CGI scripts.
A bundle of perl scripts is fine if you're building a site to impress your friends, but when a Fortune 500 company waves a million or two dollars in your face to "web enable" their business systems, you probably need a somewhat more serious environment. I won't preach to the converted about why Java is so great as a general language, but I'll concede that it's not necessarily the obvious choice for server-side programming where the issues are a bit different from the desktop.
As it turns out, Java's advantages on the server are essentially the same as on the client, though in somewhat different proportions. Portability, the cornerstone of Java, becomes less of a concern as the size and expense of the system grow. By the time you are integrating ten web servers, a 40,000-product inventory, and a back-end purchase and fulfillment system, you're lucky if you're allowed to swap manufacturers for a network card, much less operating systems! Nonetheless, MIS managers will always choose a highly portable solution over a lesser one, and Java clearly wins that contest every time.
For smaller systems, portability can be totally key. If your dating service application can run without modification on UNIX and Windows NT, you've doubled your market without changing a line of code. In fact, we develop all of our code cross-platform as a matter of course. In Java, you have to work pretty hard to make your applications not portable.
More compelling reasons to use Java on the server come from Java's memory management, type-safety, exception handling and multi-threading. Since server applications run for days or even months without interruption, memory leaks, no matter how minor, are a very serious problem. In Java, the whole issue simply disappears. Similarly, Java's type-safety and run-time bounds checking eliminate those particularly annoying memory corruption errors.
The remaining scourge of server applications would be intermittent bugs, especially those that are only encountered every few days or weeks. Java's exception handling is especially useful, since it can catch and log the exception with a stack trace. The assurance that you can regain control no matter what went wrong is a great comfort.
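To make the point about regaining control concrete, here is a minimal sketch of a wrapper that catches whatever a request handler throws, logs the stack trace, and still returns a page. The names RequestGuard, Handler and serve are invented for illustration and are not part of Dynamo or of any standard library; the log is a plain StringBuilder only to keep the example self-contained.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class RequestGuard {
    /** Hypothetical stand-in for whatever work one page request performs. */
    interface Handler {
        String handle(String request) throws Exception;
    }

    /** Runs the handler; on any failure, logs the stack trace and returns an error page. */
    static String serve(Handler handler, String request, StringBuilder log) {
        try {
            return handler.handle(request);
        } catch (Throwable t) {
            // Capture the full stack trace as text so it can go into the log.
            StringWriter trace = new StringWriter();
            t.printStackTrace(new PrintWriter(trace, true));
            log.append("request failed: ").append(request).append('\n').append(trace);
            return "500 Internal Error";
        }
    }
}
```

Catching Throwable rather than Exception is a deliberate choice for this sketch: it also traps errors such as a stack overflow, which may or may not be what you want in a real server.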
Performance-wise, Java obviously has a lot of ground to cover, but the Just-In-Time compilers will bring Java to within striking distance of C++. It's important to note that most large C++ implementations end up doing a lot of ad-hoc memory management, generally a lot less efficiently than what the wizards at JavaSoft, Symantec, and Borland will be devising. On the positive side, Java's clean multi-threading model makes it ideal for highly concurrent systems, the very essence of a server-side application. Of course thread-bugs can be even more devious than memory bugs, but Java's certainly no worse than any other multi-threaded platform.
OK, Java Rules. Now What?
Unfortunately, actually developing a server-side application in Java isn't all that easy. Because of Java's considerable startup costs, using CGI to interface to the web server is very inefficient: your Java program is started completely from scratch for every request. A much better solution is to create a single Java runtime which stays around the entire time and services each web page request in a new thread. This takes advantage of multi-threaded efficiency and allows you to maintain live objects and share data between clients. A common example of this is to create a single object which nails up a connection to an external SQL database, so that different page requests can use it without having to reopen the connection each time. Another example might be a chess game which lets different clients play against each other by sharing a single chessboard object.
A number of web server companies have already announced that they will integrate a Java VM into their server, so the basic connection issue will presumably go away. We chose to solve this problem in Dynamo by running Java as a separate process, connecting to the web server with a localhost socket connection. The java.net package makes it extremely easy to accept multiple simultaneous requests and spin a separate thread to handle each one. This requires a small connection module to be written in C to handle the web server side of the connection, and we currently have modules for CGI, NSAPI and ISAPI. In addition to web-server independence, the co-server model allows the programmer to have full control over the process, runtime characteristics, and Java implementations.
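The co-server idea can be sketched in a few lines of java.net code. This is an illustration only, not Dynamo's implementation: the class name CoServer, the one-line request protocol, and the shared hit counter are all assumptions made up for the example. The counter stands in for any live object (a database connection, a chessboard) that outlasts a single request.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicInteger;

class CoServer implements Runnable {
    private final ServerSocket listener;
    // Shared, long-lived state: it survives across requests because the runtime stays up.
    private final AtomicInteger hits = new AtomicInteger();

    CoServer(int port) throws IOException {
        // Listen on the loopback address only; port 0 asks the OS for any free port.
        listener = new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
    }

    int port() { return listener.getLocalPort(); }

    /** Accept loop: each incoming request gets its own thread. */
    public void run() {
        try {
            while (true) {
                Socket client = listener.accept();
                new Thread(() -> handle(client)).start();
            }
        } catch (IOException closed) {
            // accept() fails once close() has been called; treat that as shutdown.
        }
    }

    private void handle(Socket client) {
        try (Socket c = client;
             BufferedReader in = new BufferedReader(new InputStreamReader(c.getInputStream()));
             PrintWriter out = new PrintWriter(c.getOutputStream(), true)) {
            String path = in.readLine();   // invented protocol: one request line per connection
            out.println("hit " + hits.incrementAndGet() + " for " + path);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    void close() throws IOException { listener.close(); }
}
```

To try it, start `new Thread(new CoServer(0))`, connect a socket to `port()` on localhost, and write one line; the web-server-side connection module would play that client role.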
From the earliest days of the NCSA HTTP server, those clever folks hit upon the idea of embedding calls to dynamic functions (in those days, UNIX shell scripts), inside the HTML files. The nice thing about this approach is that it provides a clean interface between the programmers' efforts and those of the design staff. Even in the common case where these two groups are actually the same person, the value of conceptual abstraction can't be overstated. On large projects, this kind of separation is crucial. You shouldn't need to find a programmer to change the spelling of a word or add a picture or advertisement to a page.
In Dynamo's implementation of embedded Java, we followed the ever important principle of "Keep it simple, stupid", and created a syntactic preprocessor which converts all of the HTML in a page into a Java class. The embedded Java statements, designated by special tags masquerading as comment delineators, are simply passed through verbatim. (See below)
Since Sun conveniently provides the Java compiler as part of the standard class distribution, Dynamo can perform all of the above actions automatically whenever the page has been changed. This also requires creating a special class loader. Another interesting topic!
Because all Java code must be contained in a class, the code is actually inserted into a method of a class which is created uniquely for the page. One very important feature of embedding is that the programmer be allowed to specify which Java class their page class will extend. This allows several pages to share methods and inherit functionality, a crucial Java programming technique.
So far, a system as described above should be implementable in a day or two by a dedicated Java programmer, and you'd have a system that would be far superior to CGI and, of course, programmable in Java. But you'd still be stuck with one of the most annoying aspects of the web, namely that there is no inherent mechanism for identifying a set of page requests as belonging to the same user session. There are a variety of reasons why you would want to do this, but they all boil down to the basic need to maintain some kind of session state. For example, if you ask a user to type in his or her name on the first page, you should be able to retrieve that name twenty pages later. As the web stands today, this simple task requires a great many contortions.
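A rough sketch of such a preprocessor follows. It is not Dynamo's code, and several things in it are assumptions made for the example: the `<!--java ... -->` delimiters, the render method name, and the PagePreprocessor class itself. It turns literal HTML into append calls, passes embedded Java through verbatim, and lets the caller name the base class the page class extends.

```java
class PagePreprocessor {
    // Hypothetical delimiters; the article only says the tags masquerade as comments.
    static final String OPEN = "<!--java";
    static final String CLOSE = "-->";

    /** Translates page text into Java source for a class with a single render method. */
    static String translate(String className, String baseClass, String page) {
        StringBuilder src = new StringBuilder();
        src.append("class ").append(className)
           .append(" extends ").append(baseClass).append(" {\n")
           .append("  void render(StringBuilder out) {\n");
        int pos = 0;
        while (pos < page.length()) {
            int open = page.indexOf(OPEN, pos);
            if (open < 0) {                         // no more embedded code: the rest is HTML
                emitHtml(src, page.substring(pos));
                break;
            }
            emitHtml(src, page.substring(pos, open));
            int close = page.indexOf(CLOSE, open);
            if (close < 0) throw new IllegalArgumentException("unterminated " + OPEN);
            // Embedded Java is passed through verbatim (only surrounding whitespace is trimmed).
            src.append("    ")
               .append(page.substring(open + OPEN.length(), close).trim())
               .append('\n');
            pos = close + CLOSE.length();
        }
        src.append("  }\n}\n");
        return src.toString();
    }

    /** Wraps literal HTML in an append call, escaping it as a Java string literal. */
    private static void emitHtml(StringBuilder src, String html) {
        if (html.isEmpty()) return;
        String escaped = html.replace("\\", "\\\\").replace("\"", "\\\"")
                             .replace("\r", "\\r").replace("\n", "\\n");
        src.append("    out.append(\"").append(escaped).append("\");\n");
    }
}
```

One known limitation of this sketch: embedded code that itself contains the closing delimiter would be cut short, so a real preprocessor needs a more careful scanner.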
The proposed HTTP Cookie standard, adopted by both Netscape and Microsoft in their browsers, addresses a fundamental aspect of this problem by providing a general mechanism for the browser to include a small piece of data, the cookie, with every page request. A general session tracking scheme can be built on top of this mechanism by using the cookie as a session key which indexes session storage. Thus the web application can assign a unique key to each new session so that each page request will bear the key in its header. To store and retrieve a piece of data, the application can hash off the session key and the name of the data.
The only remaining issue is expiring the sessions and reclaiming the memory. From the server's perspective, a web application client never "quits"; instead, the user just stops requesting further pages. Thus the application must somehow reap session storage space once that session key hasn't shown up for some period of time, usually between several minutes and half an hour.
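The simple in-memory scheme described so far can be sketched as follows. Everything here is illustrative: SessionStore and its method names are invented, the key is a random UUID rather than any particular cookie format, and nested maps stand in for hashing on the session key plus the data name. The current time is passed in as a parameter so the reaping logic is easy to exercise; a real server would read the clock and call reap from a background thread.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

class SessionStore {
    private static final class Session {
        final Map<String, Object> values = new ConcurrentHashMap<>();
        volatile long lastSeenMillis;
    }

    private final Map<String, Session> sessions = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    SessionStore(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    /** Creates a session and returns the key to hand to the browser as its cookie. */
    String create(long nowMillis) {
        String key = UUID.randomUUID().toString();
        Session s = new Session();
        s.lastSeenMillis = nowMillis;
        sessions.put(key, s);
        return key;
    }

    /** Stores a named value; returns false if the session is unknown or already reaped. */
    boolean put(String key, String name, Object value, long nowMillis) {
        Session s = sessions.get(key);
        if (s == null) return false;
        s.lastSeenMillis = nowMillis;     // any request bearing the key keeps the session alive
        s.values.put(name, value);
        return true;
    }

    /** Retrieves a named value, or null if the session or the name is unknown. */
    Object get(String key, String name, long nowMillis) {
        Session s = sessions.get(key);
        if (s == null) return null;
        s.lastSeenMillis = nowMillis;
        return s.values.get(name);
    }

    /** Drops every session idle longer than the timeout; returns how many were reaped. */
    int reap(long nowMillis) {
        int reaped = 0;
        for (Iterator<Session> it = sessions.values().iterator(); it.hasNext(); ) {
            if (nowMillis - it.next().lastSeenMillis > timeoutMillis) {
                it.remove();
                reaped++;
            }
        }
        return reaped;
    }
}
```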
Dynamo's session tracking is actually far more complicated, as we wanted to be able to handle more sessions than could necessarily fit in RAM. Instead, Dynamo uses an integrated object-oriented database (written all in Java, of course), and it uses the database's indexing features to implement expiration. The design of the database is a pretty interesting topic, but clearly beyond the scope of this article!
Handling non-Cookie browsers is also possible, but not a pretty sight. The only recourse is to embed the session key in every URL, stripping it on the way in and attaching it on the way out. You may notice this happening on some sites which attach a lot of gobbledygook to each URL. We implemented this in Dynamo by building a Java class which understands standard lex parsing tables, and parsing the HTML for URLs.
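For illustration, the attach-and-strip round trip can be approximated with regular expressions. To be clear about what is swapped in: this sketch uses java.util.regex rather than lex parsing tables, handles only double-quoted href attributes, and invents a `;sid=KEY` marker for carrying the key. The class and method names are hypothetical as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlSessionRewriter {
    // Matches href="..." with a double-quoted value; a real HTML scanner handles far more.
    private static final Pattern LINK = Pattern.compile("(?i)(\\bhref=\")([^\"]*)\"");
    // Hypothetical marker: the key travels as a ";sid=KEY" suffix on the path.
    private static final Pattern SID = Pattern.compile(";sid=([A-Za-z0-9-]+)");

    /** On the way out: attach the key to every link that stays on this site. */
    static String attach(String html, String sessionKey) {
        Matcher m = LINK.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group(2);
            boolean leaveAlone = url.contains("://") || url.startsWith("//")
                    || url.startsWith("mailto:") || url.startsWith("#");
            String rewritten = leaveAlone ? url : insertKey(url, sessionKey);
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + rewritten + "\""));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Places the marker at the end of the path, before any query string or fragment. */
    private static String insertKey(String url, String key) {
        int cut = url.length();
        int query = url.indexOf('?');
        if (query >= 0) cut = Math.min(cut, query);
        int fragment = url.indexOf('#');
        if (fragment >= 0) cut = Math.min(cut, fragment);
        return url.substring(0, cut) + ";sid=" + key + url.substring(cut);
    }

    /** On the way in: returns the key carried by a request URL, or null if there is none. */
    static String extractKey(String requestUrl) {
        Matcher m = SID.matcher(requestUrl);
        return m.find() ? m.group(1) : null;
    }

    /** On the way in: strips the key so the rest of the server sees the plain URL. */
    static String strip(String requestUrl) {
        return SID.matcher(requestUrl).replaceFirst("");
    }
}
```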
The last and in many ways most important component of a server-side Java solution is the integration of the application with the widest variety of external systems. Most business information systems can be described as "fascinatingly organic" if you're in a good mood and "completely disastrous" if not. These systems are generally the conglomeration of dozens of generations of hardware and software strategies, under constantly shifting business initiatives. They are core to the business process, so downtime is often measured in thousands of dollars per minute. If that's not bad enough, the big clients always want their system on the web yesterday.
Fortunately, Java's integration capabilities are very strong, even in today's market where a number of the big modules such as JDBC are not yet shipping. There are basically five main integration strategies you can pursue with a Java server side application:
- SQL database (direct or ODBC/JDBC)
- Object Oriented database
- File based integration
- TCP sockets
- Native methods
I list the databases first, because they tend to be the smoothest path for integrating with legacy systems, as a lot of "middleware" solutions are implemented by creating a SQL database that is accessed through inserts, queries and stored procedures by several other systems. Java's forthcoming JDBC standard should make this path very easy, but there are already a number of existing solutions which talk to specific databases or to any major SQL database through the ODBC standard. For Dynamo, we chose to implement an ODBC gateway (using native methods) with a clean Java API.
For object-oriented databases, the picture is a little less clear, particularly since OODBs can differ widely in programming paradigm as compared to SQL. Most of the major OODB companies have announced Java interfaces, and the Object Database Management Group is promoting a Java standard, but there is currently no high-power solution available. Aside from standardization, one of the big issues will be how easily Java will be able to access existing middleware or legacy databases. A related problem is how exposed the persistence mechanism will be. Ideally, persistence has very little programmer impact, but in a lot of cases, programmers need access to the underlying schema, for example, if they are trying to map Java objects onto objects used by C++.
It's a bit surprising how many systems still rely on file-based communication, dropping files and picking them up from a shared file system. Java, of course, has file I/O facilities, but it currently lacks locking mechanisms, since these tend to be very OS specific. Java's network library, on the other hand, is extremely well designed and easy to use. For Dynamo, we implemented a simple e-mail "gear" using Java's socket class talking Simple Mail Transfer Protocol. Java also provides a higher-level HTTP connection scheme with its URL and URLConnection classes. These facilities allow easy integration into any TCP-based system, but TCP is still not very prevalent in the business information world.
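As a sense of how little code an SMTP client needs, here is a minimal sketch of the conversation. It is not the Dynamo e-mail gear; SmtpSketch and its methods are invented for the example. It is written against a reader and a writer rather than a socket so that it can be tried without a mail server, and it assumes single-line server replies, which is a simplification of the protocol.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;

class SmtpSketch {
    /**
     * Sends one message over an already-open SMTP conversation.
     * The message argument is the full text: headers, a blank line, then the body.
     */
    static void send(BufferedReader in, Writer out, String from, String to, String message)
            throws IOException {
        expect(in, "220");                                  // server greeting
        command(in, out, "HELO localhost", "250");
        command(in, out, "MAIL FROM:<" + from + ">", "250");
        command(in, out, "RCPT TO:<" + to + ">", "250");
        command(in, out, "DATA", "354");
        for (String line : message.split("\r?\n", -1)) {
            // A leading dot is doubled so the server does not mistake the line for end-of-data.
            out.write((line.startsWith(".") ? "." + line : line) + "\r\n");
        }
        command(in, out, ".", "250");                       // a lone dot ends the message
        command(in, out, "QUIT", "221");
    }

    /** Writes one command line and checks the reply code that comes back. */
    private static void command(BufferedReader in, Writer out, String line, String code)
            throws IOException {
        out.write(line + "\r\n");
        out.flush();
        expect(in, code);
    }

    /** Reads one reply line and fails unless it starts with the expected code. */
    private static void expect(BufferedReader in, String code) throws IOException {
        String reply = in.readLine();
        if (reply == null || !reply.startsWith(code)) {
            throw new IOException("expected " + code + " but got: " + reply);
        }
    }
}
```

In real use the reader and writer would wrap the input and output streams of a java.net.Socket opened to port 25 of the mail host.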
Finally, Java's native method facility is there to pick up the remainder, whether you need to directly link in legacy code or talk some other protocol to another system. It's important to note that the native methods have to be loadable as a shared library (DLL on Windows NT), and not all the code you may have can readily be converted. Writing native methods for Java is surprisingly easy, and depending on your development environment, debugging can be a breeze. In Microsoft's Visual C++, for example, you can debug your native methods in source code and automatically jump to the offending line in the debugger when your code crashes.
And What Else?
The other day I was amused to see a job ad requesting a "Webmaster with five+ years of experience". Clearly there's not much track record for server-side Java out there yet, but with as much experience in this business as space-time allows, I'm completely convinced that Java is the way to go. There are a lot of design considerations that go into building any solid application framework, and hopefully this article has touched on the principal features and pitfalls of building and using one based on Java.
About the Author
Joseph Chung is co-founder and Chief Technology Officer of Art Technology Group and has led ATG's development efforts across a broad range of information design projects for a roster of clients that includes Apple Computer, Chiat/Day Advertising, Gemini Consulting, Harvard Business School, International Online, MCI, NTT Data, and Stream International. Most recently, ATG has begun to ship its Dynamo line of server software products, which enable enterprise-level Internet applications. Chung holds bachelor's and master's degrees in Computer Science from MIT.