Developing distributed applications, in contrast to developing traditional single-process applications, requires a completely different level of monitoring and diagnostic support. In this article I'll discuss how to monitor and diagnose distributed applications based on the CORBA standard.
Perils of Distributed Applications
As organizations make key aspects of their businesses ready for e-business, multitiered distributed applications are becoming increasingly ubiquitous. Diverse by nature, e-business systems require middleware to integrate middle-tier components into a cohesive computing environment. And as object-oriented programming has entered mainstream application development, CORBA has emerged as the standard middleware solution for integrating distributed e-business components implemented in disparate languages such as Java and C++.
Distributed applications require a completely different level of monitoring and diagnostic support than traditional single-process applications. While the factors causing unexpected behavior and failures in a single process might be simple and easy to anticipate, a distributed system can suffer from any one or more of a whole range of bugs. Let's briefly review seven of the most common problems:
- Performance bottlenecks can appear in a distributed application when a complex operation is performed at a time-critical point, and can substantially slow down your application's overall performance.
- Network resource limitations can cause a distributed system to fail when the size of the system is ramped up. Scalability problems may not occur within your test configurations, but can appear later during deployment in the form of limited connections or insufficient bandwidth.
- Network failures can often partially afflict a complex network. As the application developer, it behooves you to detect and circumvent each point of failure.
- Race conditions can occur if parallel modules of a distributed application aren't properly synchronized, allowing different modules to produce contradictory results. Synchronization errors are difficult to detect because they tend to be sporadic and aren't easily reproduced.
- Deadlocks can appear when the synchronization protocol between modules prevents each from completing its task. Like a race condition, a deadlock often appears only in a special situation and can be difficult to locate.
- Design errors in control flow can occur very easily. The control flow in a distributed application is usually much more complex than in a single-process application, leading to a wide variety of design errors. Unlikely events such as exceptions and failures within multiple modules can be especially difficult to handle.
- Timeout failures can occur owing to delays and bottlenecks in the network that cause distributed parts of an application to time out and produce a failure. Such a failure may propagate through the rest of the application if you don't handle it properly.
Over and above these various problems, diagnosis in a distributed system is more complicated than debugging a conventional single-process application. Setting up and stepping through a test case can be very time-consuming when using code debuggers for the distributed modules of a system, and correlating message entry and exit points among numerous processes, each with its own code debugger, can quickly become impractical. Because you're not testing the application in real execution time, time-critical failures such as bottlenecks, race conditions, deadlocks and timeouts can't be detected. Conventional debugging rarely detects scalability problems either.
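Timeout failures in particular can be contained at the call site rather than allowed to propagate. Here's a minimal sketch in Java, assuming a single-threaded executor stands in for a remote invocation; the names and the simulated "slow server" are illustrative, not part of any CORBA API:

```java
import java.util.concurrent.*;

public class TimeoutGuard {
    // Invoke a task but give up after a deadline, translating the timeout
    // into a value the caller can handle instead of hanging indefinitely.
    static String callWithTimeout(Callable<String> remoteCall, long millis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> reply = pool.submit(remoteCall);
            return reply.get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "TIMEOUT";              // failure handled locally, not propagated
        } catch (Exception e) {
            return "ERROR";
        } finally {
            pool.shutdownNow();            // cancel the stalled call
        }
    }

    public static void main(String[] args) {
        // Simulated slow server: sleeps longer than the client will wait.
        String slow = callWithTimeout(() -> { Thread.sleep(500); return "reply"; }, 50);
        String fast = callWithTimeout(() -> "reply", 500);
        System.out.println(slow + " " + fast);   // TIMEOUT reply
    }
}
```

The key design point is that the timeout is decided by the caller, so a delayed reply degrades one operation instead of stalling the whole application.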
Monitoring Messages Between CORBA Objects
One effective way of diagnosing distributed systems is to monitor the communication between the various distributed components. The objective of this article is to demystify the CORBA communication bus by showing you how to capture the details of messages passed between CORBA objects. Such monitoring lets you observe and record method invocations and exceptions selectively, helping you avoid or eliminate bottlenecks, race conditions and other potential failures that might otherwise impede the performance of your application.
Let's look at what goals you should bear in mind as you monitor these messages.
- Distributed debugging: You need to monitor messages during the development and test phases of a project. That way, you'll uncover problems before an application is deployed. When the application goes live, communication details should be logged to enable performance analysis and to make it possible to troubleshoot unexpected failures quickly. You should be able to activate monitoring on the fly without having to stop and restart the application.
- Application-level communication details: You should observe request-reply details as they occur at the application level. For example, "The buy method of the stock_exchange object was called using a stock symbol of SEGU and a share amount of 1000." Monitoring at the application level requires an understanding of all CORBA data types and of complex user-defined types. Details captured about each message should include request ID, interface name of the target object, method being invoked, parameter values, timing data, process IDs, host IDs and any thrown exceptions.
- Dynamic activation: Make sure that application objects are completely unaware of any active monitoring. You should be able to dynamically turn monitoring on or off and specify which communication details to observe while your application is running.
- Filter criteria: Ensure that it's easy to filter traffic and thus monitor only those interfaces, methods and parameters that you're interested in. Make sure too that it's possible to stipulate how many times a particular method will be observed and at which communication entry and exit points.
- Timing analysis: Use message timestamps and timing data to help identify server latency, message travel time and client wait time, information that's extremely helpful for diagnosing and resolving timing-related problems.
- Data recording: Record monitored communication to enable logging message activity or analyzing results. It should be possible to parse and sort the recorded data.
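To make these goals concrete, here is a hedged sketch of what one recorded message might carry. The class and field names are illustrative, not from any CORBA API:

```java
import java.util.List;

public class MessageRecord {
    // One observed request/reply, captured at the application level.
    final long requestId;
    final String interfaceName;     // e.g. "stock_exchange"
    final String method;            // e.g. "buy"
    final List<String> parameters;  // rendered parameter values
    final long sentAt, receivedAt;  // timestamps (ms) for timing analysis
    final String exception;         // null if the call succeeded

    MessageRecord(long requestId, String interfaceName, String method,
                  List<String> parameters, long sentAt, long receivedAt,
                  String exception) {
        this.requestId = requestId;
        this.interfaceName = interfaceName;
        this.method = method;
        this.parameters = parameters;
        this.sentAt = sentAt;
        this.receivedAt = receivedAt;
        this.exception = exception;
    }

    // Client wait time for this request.
    long roundTripMillis() { return receivedAt - sentAt; }

    public static void main(String[] args) {
        MessageRecord r = new MessageRecord(42, "stock_exchange", "buy",
                List.of("symbol=SEGU", "shares=1000"), 1000, 1085, null);
        System.out.println(r.method + " took " + r.roundTripMillis() + " ms");
    }
}
```

Because each record is plain structured data, logged records can later be parsed, sorted and filtered, satisfying the data-recording goal above.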
How CORBA Communication Works
CORBA can be conceptualized as a communication bus for distributed objects. In a CORBA system the "client/server" terminology applies within the context of a specific request. In other words, if object A invokes a method on object B, A is the client and B is the server; if B then calls A, the roles are reversed.
The Object Request Broker (ORB) is the mediator, responsible for brokering interactions between objects. Its job is to provide object location and access transparency by enabling client invocations of methods on server objects (see Figure 1). If the server interface is known at build time, a client can connect or bind to a server object statically. If unknown, it can use dynamic binding to ascertain a server's interface and construct a call to the corresponding object.
Exported server interfaces are specified in the CORBA standard interface definition language (IDL). You don't write server implementations in IDL: an interface description is mapped instead, using an IDL compiler, to native language bindings such as Java or C++. This allows each programmer to write source code independently in whichever language may be the most appropriate. A Java program, for example, can access a server object implemented in C++; the Java programmer merely invokes methods on the server as though they're local Java method calls. Figures 2 and 3 illustrate, respectively, an IDL description for a CORBA server and a Java client that calls a corresponding object implementation.
In Figure 2 Account is an interface that corresponds to a class implemented in a server. IDL attributes define the properties of a class (e.g., balance). The IDL compiler maps attributes to "get" and possibly "set" methods. Operations define the methods to be implemented by the server (e.g., make_deposit and make_withdrawal). Their parameters must be explicitly identified in the interface description as in, out or inout. Many other features are supported by IDL, such as inheritance for specifying derived interfaces, modules for establishing naming scopes and exceptions that are supported by an interface or raised by operations.
The IDL compiler generates a skeleton that's linked to the server program and provides static interfaces to call methods of an object implementation. The skeleton unmarshals methods and parameters that come from a client via the ORB. The IDL compiler also generates a client stub that's linked to programs that will statically invoke server methods through the associated interface. The client stub maps a CORBA server object to a native object in the client's language (see Figure 3). The stub acts as a proxy for remote server objects by marshaling methods and parameters to be transmitted via the ORB. CORBA also supplies the dynamic invocation interface (DII) for client programs to discover server interfaces and construct method calls at runtime. The DII requires the use of the CORBA interface repository, which contains compiled IDL descriptions that can be interrogated programmatically (see Figure 4).
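The stub's role as a proxy can be illustrated with Java's standard dynamic-proxy facility. This is an analogy, not generated CORBA stub code: the interface, method names and the string "wire" standing in for the ORB transport are all invented for the sketch.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;

public class StubAnalogy {
    // A toy "remote" interface; in CORBA this would be generated from IDL.
    public interface Account {
        float makeDeposit(float amount);
    }

    // Stand-in for the ORB transport: we just log the marshaled call
    // and compute the reply locally instead of crossing the network.
    static final StringBuilder wire = new StringBuilder();

    public static Account stub() {
        InvocationHandler marshal = (proxy, method, args) -> {
            // A real stub would encode method + args into an IIOP request.
            wire.append(method.getName()).append(Arrays.toString(args));
            return 100f + (Float) args[0];   // pretend the server replied
        };
        return (Account) Proxy.newProxyInstance(
                Account.class.getClassLoader(),
                new Class<?>[] { Account.class }, marshal);
    }

    public static void main(String[] args) {
        Account account = stub();
        float balance = account.makeDeposit(50f);   // looks like a local call
        System.out.println(balance + " via " + wire);
    }
}
```

The client sees an ordinary method call; everything after the call site, capturing the method name and parameters and producing a reply, happens inside the proxy, which is exactly the seam that both stubs and monitoring interceptors exploit.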
The CORBA standard guarantees interoperability between applications built using different vendors' ORBs. The Internet Inter-ORB Protocol (IIOP) defines standard message formats, a common data representation for mapping IDL data types to flat messages and a format for an interoperable object reference (IOR) over TCP/IP networks. In other words, IIOP is the CORBA wire-level protocol.
CORBA communication typically consists of a request message and a reply message. Most ORBs implement interceptors that permit these IIOP messages to be traced at the four points shown in Figure 5: SendRequest, ReceiveRequest, SendReply and ReceiveReply.
An Architecture for Monitoring Communication
Now that we've discussed the goals of monitoring messages between distributed components and reviewed how CORBA communication works, let's look at an architecture for monitoring in a CORBA environment.
Intercepting and interpreting IIOP messages can be achieved using four types of architectural components: Probe, Profile, Collector and Observer. Each monitored CORBA process, whether acting as a client, a server or both, contains a Probe object that captures messages based on the filter criteria specified by an active Profile. A Profile can be created, updated or uploaded to a Probe at any time. The intercepted data is recorded by the Probe, read by a Collector and transmitted to an Observer. The Observer is the primary collection point for aggregating data from multiple Collectors, and the data it records can then be viewed and analyzed (see Figure 6).
Given the absence of standardization in the area of CORBA monitoring and diagnostics, the design and implementation of a monitoring architecture will vary depending upon who's creating it: the application developer, the ORB vendor or (preferably) an independent tool vendor. The issue of standardization will be discussed later. First, let's explore each of our architectural components in detail.
Most ORBs provide interceptors that allow the creation of a Probe object, which captures and records IIOP messages, within each monitored CORBA process. Only one Probe object is necessary per process, regardless of the number of business objects created. The business objects are completely unaware of the Probe, which means the application's business logic doesn't take the Probe into account, except for the code that creates an instance of the Probe object. (Such instrumentation code would typically be placed in the main routine after initializing the ORB, outside the actual business objects.)
A Profile specifies filter criteria used by a Probe while collecting messages. It's a dynamically configurable filter that can be uploaded to a Probe residing within a running program (see Figure 7) and it serves three purposes:
- It scopes the traffic being observed to include only the interfaces, methods and parameters of interest.
- It indicates how many times a particular method should be observed.
- It specifies any or all of the four possible communication points to capture data, i.e., SendRequest, ReceiveRequest, SendReply and ReceiveReply.
A Profile might specify, for example, "Monitor up to 20 invocations of the method make_deposit at the ReceiveRequest and SendReply points." The CORBA interface repository is useful for a Profile editor or similar tool to determine details about available object interfaces. This allows you to create or modify a Profile, which you can then upload to a Probe within a running process.
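A Profile like the one described above can be modeled as a simple stateful matcher that the Probe consults before recording each message. The class and method names here are illustrative; no standard CORBA API is implied, and for simplicity this sketch counts each captured message rather than whole invocations:

```java
import java.util.EnumSet;
import java.util.Set;

public class ProfileFilter {
    public enum Point { SEND_REQUEST, RECEIVE_REQUEST, SEND_REPLY, RECEIVE_REPLY }

    // Filter criteria a Probe would consult before recording a message.
    public static class Profile {
        final String interfaceName;
        final String method;
        final int maxObservations;
        final Set<Point> points;
        int seen = 0;

        public Profile(String interfaceName, String method,
                       int maxObservations, Set<Point> points) {
            this.interfaceName = interfaceName;
            this.method = method;
            this.maxObservations = maxObservations;
            this.points = points;
        }

        // True if this message should be captured; each capture counts
        // toward the observation limit.
        public boolean matches(String iface, String meth, Point point) {
            if (!interfaceName.equals(iface)) return false;
            if (!method.equals(meth)) return false;
            if (!points.contains(point)) return false;
            if (seen >= maxObservations) return false;
            seen++;
            return true;
        }
    }

    public static void main(String[] args) {
        // "Monitor up to 2 captures of make_deposit at ReceiveRequest and SendReply."
        Profile p = new Profile("Account", "make_deposit", 2,
                EnumSet.of(Point.RECEIVE_REQUEST, Point.SEND_REPLY));
        System.out.println(p.matches("Account", "make_deposit", Point.RECEIVE_REQUEST));
        System.out.println(p.matches("Account", "make_withdrawal", Point.RECEIVE_REQUEST));
    }
}
```

Because the filter is just data plus a counter, a new Profile can be uploaded to a running Probe without touching the business objects, which is what makes dynamic activation possible.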
A Collector serves as the registration point in the monitoring architecture for the local application programs. A Probe writes intercepted IIOP messages for subsequent retrieval by the Collector, using the fastest possible mechanism and format so that the Probe doesn't become a bottleneck by blocking the message traffic. As the data is written, the Collector reads the messages and transmits them to a primary collection point (see Figure 8). A Collector may also transmit additional relevant data about monitored processes to the primary collection point.
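The hand-off between Probe and Collector can be sketched as a bounded, non-blocking buffer: the Probe enqueues on the interception path without ever waiting, and the Collector drains on its own schedule. This is an assumed design, not a prescribed one, and the names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProbeCollector {
    // Bounded buffer between the intercepting Probe and the Collector.
    static final BlockingQueue<String> buffer = new ArrayBlockingQueue<>(1024);

    // Called from the interception path: must never block the request,
    // so a full buffer drops the record rather than stalling traffic.
    public static boolean probeRecord(String message) {
        return buffer.offer(message);
    }

    // Collector side: drain whatever has accumulated and forward it on
    // (here we return the batch instead of transmitting to an Observer).
    public static List<String> collectorDrain() {
        List<String> batch = new ArrayList<>();
        buffer.drainTo(batch);
        return batch;
    }

    public static void main(String[] args) {
        probeRecord("req#1 Account.make_deposit(100.0)");
        probeRecord("rep#1 Account.make_deposit -> ok");
        System.out.println(collectorDrain());
    }
}
```

The choice of `offer` over `put` embodies the requirement above: under load, the monitor sheds its own data before it slows the application's messages.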
The Observer is the primary registration and collection point for all Collectors across the distributed environment. As data is transmitted to the Observer, it's written to a global database viewable in real time to permit the analysis of system performance (see Figure 9).
The information captured about each message includes:
- Sequence number of the request
- Name of the interface containing the operation or attribute
- Name of the operation or attribute in the request
- Amount of time the server took to process the request
- Amount of time the request spent on the wire
- Total time between the request being sent and the reply being received (server time + travel time)
- Parameters in the request at each of the communication points
- System process IDs and names of the hosts for the server and the caller
- Timestamps at each of the communication points
- Details about any thrown exception, including the communication point where the exception was raised
Figure 10 illustrates a simple captured message.
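The timing fields in the list above follow directly from the timestamps at the four interception points. A sketch of the arithmetic, assuming millisecond timestamps:

```java
public class MessageTiming {
    // Timestamps (ms) captured at the four IIOP interception points.
    // Note: serverTime uses two server-side stamps and totalTime two
    // client-side stamps, so neither requires synchronized clocks.
    public static long serverTime(long receiveRequest, long sendReply) {
        return sendReply - receiveRequest;      // time spent inside the server
    }

    public static long totalTime(long sendRequest, long receiveReply) {
        return receiveReply - sendRequest;      // client wait time
    }

    public static long travelTime(long sendRequest, long receiveRequest,
                                  long sendReply, long receiveReply) {
        // time on the wire = total round trip minus time inside the server
        return totalTime(sendRequest, receiveReply)
             - serverTime(receiveRequest, sendReply);
    }

    public static void main(String[] args) {
        long sendReq = 0, recvReq = 15, sendRep = 55, recvRep = 70;
        System.out.println("server=" + serverTime(recvReq, sendRep)
                + " travel=" + travelTime(sendReq, recvReq, sendRep, recvRep)
                + " total=" + totalTime(sendReq, recvRep));
        // server=40 travel=30 total=70
    }
}
```

A large server time points to a bottleneck in the object implementation, while a large travel time points to the network, which is exactly the distinction needed to resolve timing-related failures.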
Summary: What You Can Do
Gaining insight into distributed system behavior can be complicated, but by performing application-level monitoring during a project's development and test phases you can uncover problems prior to deployment and thereby ensure that your application is reliable. Then, when it subsequently goes live, you can capture communication details to allow performance analysis and quick troubleshooting of unexpected failures. Monitoring and diagnostics in your CORBA system can be achieved using a commercially available tool such as Segue Software's SilkObserver or by building custom instrumentation into your application. Either way, differing techniques may be applied depending on which ORBs you're using and what the specific monitoring objectives are for your system.
The OMG's Test Special Interest Group is in the process of standardizing distributed instrumentation and control for CORBA systems. (A Request for Proposals is being drafted as of this writing and will be issued this summer.) The group is defining a common set of instrumentation capabilities for use in the management, debugging and profiling of multivendor CORBA-based systems. These capabilities will likely rely on the OMG's anticipated standard for portable interceptors. The Test SIG's efforts should yield an interface specification for controlling object execution, returning state information and providing other useful functions that today are performed inconsistently, if at all, across CORBA implementations. When complete, the OMG's work in this area should result in consistent mechanisms for distributed monitoring and diagnostics regardless of which vendor's ORB you use. You can check the progress of the Test SIG on the OMG Web site (www.omg.org).
Todd Scallan is the vice president in charge of Segue Software's
California Development Laboratory. He holds a BS in electrical engineering from Lehigh University and an MS in computer engineering from