
We've all heard about the great benefits of distributed computing, especially in the areas of scalability and performance. With Java, implementing a distributed solution has never been easier or more practical. Three distributed object technologies work quite naturally with Java: Java RMI, CORBA and EJB. The question many Java developers face when considering a distributed architecture is whether a distributed solution is appropriate for the problem at hand, especially when that problem has traditionally been solved in a nondistributed fashion. One way to find opportunities for distribution is to look for basic tasks in your software that can be broken out. Once you identify these tasks, all you need is a framework for distributing their execution.

This article offers an approach for building distributed solutions. The fundamental idea is to separate the problem into two pieces: the critical task and the noncritical tasks. The critical task decides what work needs to be done, and the noncritical tasks do that work. This separation of responsibility is what improves scalability and performance: the noncritical tasks can run in parallel on other processors while the critical task stays free to keep making decisions.

Distributed Architecture
A distributed solution usually includes both a distributed software design and a distributed network design. Ideally, the software should scale transparently as new processors are added to the network. A three-tier approach facilitates this kind of scalability by placing the critical task on one tier and the noncritical task "handlers" on another (see Figure 1).

Figure 1

The noncritical tasks are simple and atomic, and the handlers simply take a noncritical task and execute it. The middle tier hosts a centralized task scheduler that accepts new tasks and makes them available to the handlers. Because the number of task handlers can be adjusted independently of the other tiers, you can increase performance and scalability simply by adding handlers.
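To make the three tiers concrete, here is a minimal in-process sketch. It assumes a `java.util.concurrent.LinkedBlockingQueue` stands in for the middle-tier scheduler and plain threads stand in for handler machines; `TierDemo` and `runTasks` are hypothetical names. In a real deployment each tier would run in its own JVM and talk through RMI, CORBA or EJB.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-process stand-in for the three tiers in Figure 1.
class TierDemo {
    // Sentinel that tells a handler thread to stop.
    private static final Runnable STOP = () -> { };

    // The critical task enqueues work; 'handlerCount' handler threads drain it.
    static int runTasks(int taskCount, int handlerCount) {
        BlockingQueue<Runnable> middleTier = new LinkedBlockingQueue<>();
        AtomicInteger executed = new AtomicInteger();

        // Handler tier: each thread takes tasks until it sees STOP.
        Thread[] handlers = new Thread[handlerCount];
        for (int h = 0; h < handlerCount; h++) {
            handlers[h] = new Thread(() -> {
                try {
                    while (true) {
                        Runnable task = middleTier.take();
                        if (task == STOP) return;
                        task.run();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            handlers[h].start();
        }

        // Critical tier: decides what work to do and hands it off.
        for (int i = 0; i < taskCount; i++) {
            middleTier.add(executed::incrementAndGet);
        }
        // One STOP per handler, queued after all real tasks (FIFO order).
        for (int h = 0; h < handlerCount; h++) {
            middleTier.add(STOP);
        }

        for (Thread t : handlers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return executed.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(100, 4)); // prints 100
    }
}
```

Changing `handlerCount` without touching the producing loop is the in-process analogue of adding handler machines to the network.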

Task Framework
The defining relationship in the Task Framework is the division of responsibility between the Task class and the Handler class. Task is an abstract class whose subclasses contain the information necessary for remote execution. Handler is an abstract class whose subclasses are customized to handle different types of Task objects. The Task object is what gets accessed or transported across the network, depending on which distributed object mechanism you use. To keep network traffic to a minimum, Task objects must be as lightweight as possible: they should contain only data attributes and accessor methods. The Handler class is responsible for accessing the Task objects and performing the appropriate execution, which means it's the Handler that maintains connectivity to external resources such as databases, mail servers and network printers. In other words, the Task class is responsible for the information and the Handler class is responsible for the action. Because the Task objects are atomic and simple, the Handler objects are equally basic, so you can instantiate any number of tasks and handlers; only resources and practicality limit you.
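A minimal sketch of this division of responsibility, using hypothetical `EmailTask`, `EmailHandler` and `MailSender` types: the task holds only data and accessors, while the handler owns the connection to the external resource. `MailSender` is an assumed interface standing in for a real mail API, and `RecordingSender` is a test double.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction over a mail server connection owned by the handler.
interface MailSender {
    void send(String to, String subject, String body);
}

// Information only: data attributes plus accessors, no resource access.
class EmailTask implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String to;
    private final String subject;
    private final String body;

    EmailTask(String to, String subject, String body) {
        this.to = to;
        this.subject = subject;
        this.body = body;
    }

    String getTo() { return to; }
    String getSubject() { return subject; }
    String getBody() { return body; }
}

// Action only: the handler holds the resource and performs the work.
class EmailHandler {
    private final MailSender sender;

    EmailHandler(MailSender sender) { this.sender = sender; }

    void handle(EmailTask task) {
        sender.send(task.getTo(), task.getSubject(), task.getBody());
    }
}

// Test double that records what would have been sent.
class RecordingSender implements MailSender {
    final List<String> sent = new ArrayList<>();

    public void send(String to, String subject, String body) {
        sent.add(to + "|" + subject);
    }
}
```

Because `EmailTask` carries no connection, it stays small on the wire, and replacing `RecordingSender` with a real mail-server implementation changes only the handler tier.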

Java RMI, CORBA and EJB each have a lightweight mechanism you can take advantage of when passing around objects or their interfaces. If you're like me, this makes a lot more sense when you see some code. For Java RMI, the Task class simply needs to implement the Serializable interface from the java.io package, as shown in Listing 1. Listing 2 shows the CORBA IDL for representing the Task information as a structure. Using EJB, the Task class would be an entity bean, as shown in Listing 3.
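For the RMI case, what crosses the network is the serialized form of the task. The sketch below assumes a hypothetical `PrintTask` with two fields and round-trips it through `ObjectOutputStream` and `ObjectInputStream` in memory, which is roughly what RMI does when it marshals a `Serializable` argument or return value. Printing the byte count is a quick way to check that a task really is lightweight.

```java
import java.io.*;

// Hypothetical lightweight task: data attributes only.
class PrintTask implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String printer;
    private final String documentPath;

    PrintTask(String printer, String documentPath) {
        this.printer = printer;
        this.documentPath = documentPath;
    }

    String getPrinter() { return printer; }
    String getDocumentPath() { return documentPath; }
}

class WireCheck {
    // Serialize to a byte array, as a marshalling layer would.
    static byte[] toBytes(Serializable obj) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserialize on the "remote" side.
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = toBytes(new PrintTask("lobby-laser", "/reports/q3.pdf"));
        PrintTask copy = (PrintTask) fromBytes(wire);
        System.out.println(copy.getPrinter() + ": " + wire.length + " bytes");
    }
}
```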

The remaining examples use Java RMI since it's the simplest way to illustrate the approach. Note that CORBA tends to be somewhat faster than Java RMI, and an EJB solution may be less resource-intensive because it can take advantage of instance pooling. Of course, the remote method invocation protocol underlying your EJB server will affect performance as well.

The Handler class extends Thread, so each handler runs in its own thread (see Listing 4). It periodically polls the Scheduler for tasks that are waiting to be executed. To increase performance and reduce network traffic, pending tasks are transported in chunks; a chunk is simply an unordered group of tasks. Each Task object in the chunk is passed to the handle() method for execution.

Task Flavors
While it's important that each task be atomic for performance and simplicity, its overall function need not be. Supporting different types of task behavior gives you greater flexibility by allowing tasks to function collectively. Three useful strategies come to mind: execution scheduling, task dependency and conditional task creation.

In many cases, you may want to assign your task a future execution time. Obvious examples include backing up information, clearing out log files and even rebooting a server. There are two categories of scheduled tasks: those that execute once at a specific time and those that execute repeatedly at a regular interval. You may be thinking, "Why not use cron?" The answer is that you could, but you'd have to create separate scripts for cron to execute. Additionally, keeping the schedule inside your software lets it adjust scheduled tasks based on its internal state.
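One way to model both categories is a single descriptor with a next-run time and an optional repeat interval. `ScheduleInfo` is a hypothetical sketch, assuming times are epoch milliseconds and an interval of zero means "run once". A scheduler would release the task when `isDue` returns true and, for recurring tasks, put it back in the pending queue after `advance` returns true.

```java
// Hypothetical schedule descriptor attached to a task.
class ScheduleInfo {
    private long nextRunMillis;
    private final long intervalMillis; // 0 means one-shot

    ScheduleInfo(long firstRunMillis, long intervalMillis) {
        if (intervalMillis < 0) {
            throw new IllegalArgumentException("interval must be >= 0");
        }
        this.nextRunMillis = firstRunMillis;
        this.intervalMillis = intervalMillis;
    }

    long getNextRunMillis() { return nextRunMillis; }

    boolean isRecurring() { return intervalMillis > 0; }

    // True once the scheduled time has been reached.
    boolean isDue(long nowMillis) { return nowMillis >= nextRunMillis; }

    // Called after a run. Moves a recurring task to its next slot and
    // returns true if it should go back into the pending queue.
    boolean advance(long nowMillis) {
        if (!isRecurring()) return false;
        // Skip slots missed while the task was waiting, so a late run
        // does not cause a burst of catch-up executions.
        while (nextRunMillis <= nowMillis) {
            nextRunMillis += intervalMillis;
        }
        return true;
    }
}
```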

Often the appropriate time for execution is dependent on the completion of another task. Suppose you have an e-commerce Web site. You may have a task responsible for charging a credit card and another that sends order confirmation e-mail to the customer. The e-mail task can simply wait for the credit card task to complete before it becomes available for execution.
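A sketch of dependency gating, assuming each task has a string id and at most one prerequisite. `DependencyGate` is a hypothetical helper: a task whose prerequisite hasn't completed stays blocked, and `markCompleted` releases it, so the confirmation e-mail only becomes available after the credit card charge finishes.

```java
import java.util.*;

// Hypothetical helper that holds dependent tasks until their prerequisite completes.
class DependencyGate {
    private final Set<String> completed = new HashSet<>();
    // Prerequisite id -> ids of tasks waiting on it.
    private final Map<String, List<String>> waiting = new HashMap<>();
    private final Deque<String> available = new ArrayDeque<>();

    // Register a task; 'dependsOn' may be null for independent tasks.
    synchronized void add(String taskId, String dependsOn) {
        if (dependsOn == null || completed.contains(dependsOn)) {
            available.add(taskId);
        } else {
            waiting.computeIfAbsent(dependsOn, k -> new ArrayList<>()).add(taskId);
        }
    }

    // Next task that may run now, or null if none is available.
    synchronized String poll() {
        return available.poll();
    }

    // Record completion and release any tasks waiting on it.
    synchronized void markCompleted(String taskId) {
        completed.add(taskId);
        List<String> released = waiting.remove(taskId);
        if (released != null) {
            available.addAll(released);
        }
    }
}
```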

A task handler may contain conditional logic that triggers the creation of new tasks. Examples of this tend to be more esoteric and often involve workflow issues. Perhaps you're updating a customer database where, if the number of customers reaches a certain threshold, a new set of tasks responsible for updating statistical data elsewhere in the database is triggered.
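A handler that spawns follow-up work can simply return the new tasks so the caller can place them in the Scheduler. This sketch uses hypothetical `CustomerInsertTask`, `RefreshStatisticsTask` and `CustomerHandler` classes and keeps the customer count in memory for brevity; with several handlers running, the count would have to be read from the database instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical task reporting how many customers were just added.
class CustomerInsertTask {
    private final int added;
    CustomerInsertTask(int added) { this.added = added; }
    int getAdded() { return added; }
}

// Hypothetical follow-up task created when the threshold is crossed.
class RefreshStatisticsTask {
    private final int customerCount;
    RefreshStatisticsTask(int customerCount) { this.customerCount = customerCount; }
    int getCustomerCount() { return customerCount; }
}

class CustomerHandler {
    private final int threshold;
    private int customerCount;
    private boolean triggered;

    CustomerHandler(int startingCount, int threshold) {
        this.customerCount = startingCount;
        this.threshold = threshold;
    }

    // Performs the update, then returns any new tasks for the Scheduler.
    List<RefreshStatisticsTask> handle(CustomerInsertTask task) {
        customerCount += task.getAdded(); // stands in for the real database update
        if (!triggered && customerCount >= threshold) {
            triggered = true; // create the follow-up work only once
            List<RefreshStatisticsTask> followUp = new ArrayList<>();
            followUp.add(new RefreshStatisticsTask(customerCount));
            return followUp;
        }
        return Collections.emptyList();
    }
}
```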

Task Scheduling
The Scheduler facilitates communication between the critical task and the task handlers. Think of it as a synchronized queue: any remote process can place a task object in the queue, and execution handlers pick up task objects, perform the appropriate execution and report the result. In some cases, a handler may create new tasks and either execute them immediately or place them in the Scheduler.

The Scheduler is responsible for tracking every task through the stages of execution and for meeting the scheduling needs of the various task flavors. A task in the Scheduler is in one of three states. Pending tasks are waiting to be picked up by a handler. Once a handler has grabbed a task, the task moves into the processing state, and when the handler reports it finished, it moves into the completed state. The Scheduler is also responsible for making tasks available at the appropriate time, whether that's the scheduled time or, for a dependent task, the moment its prerequisite task completes. Listing 5 shows the interface for the Scheduler. Since multiple handlers may request pending tasks at the same time, access to the tasks must be synchronized.
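A minimal in-memory sketch of the Scheduler's bookkeeping follows, assuming tasks are identified by string ids. `InMemoryScheduler` is hypothetical and deliberately not an RMI object; it mirrors the `addTasks`, `getTasks` and `complete` operations of the Scheduler interface, and every method is `synchronized` so two handlers can never grab the same pending task.

```java
import java.util.*;

// Hypothetical local scheduler tracking the three task states.
class InMemoryScheduler {
    enum State { PENDING, PROCESSING, COMPLETED }

    // Insertion order doubles as the order in which pending tasks are handed out.
    private final Map<String, State> states = new LinkedHashMap<>();

    synchronized void addTasks(String... ids) {
        for (String id : ids) {
            states.put(id, State.PENDING);
        }
    }

    // Hand out up to 'max' pending tasks and move them to PROCESSING.
    synchronized List<String> getTasks(int max) {
        List<String> chunk = new ArrayList<>();
        for (Map.Entry<String, State> e : states.entrySet()) {
            if (chunk.size() == max) break;
            if (e.getValue() == State.PENDING) {
                e.setValue(State.PROCESSING);
                chunk.add(e.getKey());
            }
        }
        return chunk;
    }

    // Handlers report finished work here.
    synchronized void complete(List<String> ids) {
        for (String id : ids) {
            if (states.get(id) == State.PROCESSING) {
                states.put(id, State.COMPLETED);
            }
        }
    }

    synchronized State stateOf(String id) {
        return states.get(id);
    }
}
```

The linear scan keeps the sketch short; a larger system would keep pending ids in their own queue and would need a timeout that returns PROCESSING tasks to PENDING if a handler dies mid-chunk.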

Example Application
Maintaining a data warehouse in a relational database is a good application of this approach. Let's say you're designing a process that continually scans a database for current information, crunches some numbers and then updates the database with the results. In this situation, higher performance means a more accurate and up-to-date data warehouse. I have found that in most cases scanning the database is cheap and updating it is expensive, so delegating the updates can yield a significant increase in overall performance. Relational databases are designed to handle multiple concurrent users, and you can take advantage of this by distributing the update tasks.

Think of the critical task and the task handlers as database users. The critical task spends its time scanning the database for necessary changes. It creates Task objects that represent database updates and schedules them for immediate execution. The task handlers pick up the scheduled tasks, generate the required SQL and execute it to update the database. At first it may seem counterintuitive to add more users to a database, but you'll find that this design actually increases overall performance, because the database processes the concurrent updates while the critical task keeps scanning. This solution scales easily by adding more handlers on more processors. When you deploy it, you'll need to do a little experimentation to find the best layout for your particular network. Listing 6 illustrates a Handler class designed to update the warehouse.
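In this example the critical task's only job is to decide what needs updating. The sketch below assumes the scan results have already been read into two maps, freshly computed totals and the totals currently stored in the warehouse, keyed by a hypothetical region name (and with no null totals). It emits a lightweight `RegionUpdateTask` only where the values differ, leaving the SQL and the database connection to the handlers.

```java
import java.io.Serializable;
import java.util.*;

// Hypothetical update task: which row to change and its new value.
class RegionUpdateTask implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String region;
    private final long newTotal;

    RegionUpdateTask(String region, long newTotal) {
        this.region = region;
        this.newTotal = newTotal;
    }

    String getRegion() { return region; }
    long getNewTotal() { return newTotal; }
}

class WarehouseScanner {
    // Compare computed totals with stored ones; create tasks only for changes.
    static List<RegionUpdateTask> findUpdates(Map<String, Long> computed,
                                              Map<String, Long> stored) {
        List<RegionUpdateTask> tasks = new ArrayList<>();
        for (Map.Entry<String, Long> e : computed.entrySet()) {
            Long current = stored.get(e.getKey()); // null if the row is new
            if (!e.getValue().equals(current)) {
                tasks.add(new RegionUpdateTask(e.getKey(), e.getValue()));
            }
        }
        return tasks;
    }
}
```

Each returned task would go to the Scheduler for immediate execution, and a handler along the lines of Listing 6 would turn it into an UPDATE statement, preferably through a `PreparedStatement` so the values are bound rather than concatenated into the SQL.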

Extending this solution beyond updating the database is relatively easy. Simply introduce new Task subclasses and new Handler subclasses into the mix and make the new Task objects available to the Scheduler.
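As the number of Task subclasses grows, a handler process can route each task to the matching handler instead of chaining `instanceof` checks. `HandlerRegistry` is a hypothetical sketch that maps a task's exact class to a `java.util.function.Consumer`, so supporting a new task type takes one `register` call. Subclasses of a registered type are not matched automatically.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical registry routing each task to the code that executes it.
class HandlerRegistry {
    private final Map<Class<?>, Consumer<Object>> handlers = new HashMap<>();

    // Associate a task type with the code that executes it.
    <T> void register(Class<T> type, Consumer<? super T> handler) {
        handlers.put(type, task -> handler.accept(type.cast(task)));
    }

    // Returns false when no handler is registered for this exact task class.
    boolean dispatch(Object task) {
        Consumer<Object> handler = handlers.get(task.getClass());
        if (handler == null) return false;
        handler.accept(task);
        return true;
    }
}
```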

Conclusion
Distributed tasking can be applied to solve a wide variety of problems. As the data warehouse example shows, this technique can increase overall performance by executing database updates in parallel while freeing up the critical task for faster processing. You can also use the various task flavors to manage workflow by having tasks trigger other tasks. Nearly any noncritical task can be delegated to a remote handler, and having a distributed framework in place as part of your design will encourage future software enhancements to follow a distributed model.

There are a few limitations to discuss. Throughout this article, Task instances are stored in memory. This design is fine for small-scale systems, but it can prove limiting for large-scale systems where tasks may be queued rapidly and in large quantities. The best solution in that case is to have the Scheduler persist the Task objects, for example to disk or to a database, until they're needed. If you decide to go the EJB route, your EJB server will do this automatically for entity beans, either through serialization or through a relational database mapping. Another area that needs to be addressed is exception handling. Exceptions are likely to occur on the handler tier. Depending on the type of exception, you may want your handler to retry the execution or to pass the exception back to the Scheduler and ultimately to the critical task.
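For the exception-handling concern, one possible policy is to retry a task a bounded number of times on the handler tier and only then report the failure upstream. `RetryPolicy` is a hypothetical sketch; it assumes the work is idempotent (safe to repeat) and omits any delay between attempts.

```java
import java.util.concurrent.Callable;

// Hypothetical bounded-retry helper for the handler tier.
class RetryPolicy {
    private final int maxAttempts;

    RetryPolicy(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
    }

    // Runs the work until it succeeds or the attempts run out.
    // Returns null on success, or the last exception so the caller can
    // pass it back to the Scheduler and, ultimately, the critical task.
    Exception run(Callable<?> work) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                work.call();
                return null;
            } catch (Exception e) {
                last = e;
            }
        }
        return last;
    }
}
```

A fuller version would look at the exception type before retrying, for instance retrying a `java.sql.SQLTransientException` but reporting an SQL syntax error immediately.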

It's never been easier to design and implement distributed solutions. Not only are distributed architectures interesting and fun to implement, they're now also accessible to practically every Java developer. Java RMI is free and EJB servers threaten to be omnipresent in the near future. The challenge facing the Java developer is finding new approaches that exploit the features of distributed computing.

Author Bio
Sam McKenna is a software developer and consultant based in Denver, Colorado. He has over 10 years' programming experience using C++, Forté and Java.

	

Listing 1:

public abstract class Task implements java.io.Serializable
{
   // Task-specific data attributes and accessor methods only
}


Listing 2:

struct Task
{
// Add information specific to the task
};


Listing 3:

public abstract class Task implements javax.ejb.EntityBean
{
   // Task data, accessor methods and the EntityBean callback methods
}


Listing 4:

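import java.rmi.Naming;

public abstract class Handler extends Thread
{
   // Subclasses execute the Task types they understand
   protected abstract void handle(Task task) throws Exception;

   public void run()
   {
      try
      {
         // Locate the Scheduler through the RMI registry
         Scheduler scheduler = (Scheduler) Naming.lookup("rmi://localhost/scheduler");
         while (!isInterrupted())
         {
            // Fetch a chunk of up to ten pending tasks
            Task[] tasks = scheduler.getTasks(10);

            for (int i = 0; i < tasks.length; i++)
            {
               try
               {
                  handle(tasks[i]);
               }
               catch (Exception handleException)
               {
                  // A failed task is logged but still reported below
                  handleException.printStackTrace();
               }
            }

            // Report the chunk so the Scheduler can mark it completed
            if (tasks.length > 0)
               scheduler.complete(tasks);

            // Sleep for five seconds before polling again
            try
            {
               sleep(5000);
            }
            catch (InterruptedException sleepException)
            {
               return; // Stop polling when the thread is interrupted
            }
         }
      }
      catch (Exception e)
      {
         e.printStackTrace();
      }
   }
}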


Listing 5:

import java.rmi.*;
public interface Scheduler extends Remote
{
   // Add tasks to the pending queue
   public void addTasks(Task[] tasks) throws RemoteException;

   // Get up to max pending tasks and move them into the processing state
   public Task[] getTasks(int max) throws RemoteException;

   // Indicate that the given tasks have finished executing
   public void complete(Task[] tasks) throws RemoteException;
}


Listing 6:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class UpdateHandler extends Handler
{
   private Connection connection_ = null;

   // driver: JDBC driver class name; url, user, password: connection settings
   public UpdateHandler(String driver, String url, String user, String password) throws Exception
   {
      Class.forName(driver);
      connection_ = DriverManager.getConnection(url, user, password);
   }

   protected void handle(Task task) throws Exception
   {
      // This handler only understands UpdateTask objects
      if (!(task instanceof UpdateTask))
         return;

      UpdateTask updateTask = (UpdateTask) task;
      Statement stmt = null;
      try
      {
         // The task supplies the SQL; the handler owns the connection
         stmt = connection_.createStatement();
         stmt.executeUpdate(updateTask.getSQL());
      }
      finally
      {
         // Always release the statement; an SQLException from executeUpdate
         // propagates to Handler.run(), which logs it
         if (stmt != null)
         {
            try { stmt.close(); } catch (Exception closeException) { }
         }
      }
   }
}



 

All Rights Reserved
Copyright ©  2004 SYS-CON Media, Inc.

Java and Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. SYS-CON Publications, Inc. is independent of Sun Microsystems, Inc.