Over the past several years EJB technology has entered the
software development mainstream. This new level of recognition and
greater popularity brings an increase in design activities in the EJB
space, such as best practices and design patterns.
Most of the EJB design practices created so far are aimed at
improving the overall performance of EJB-based applications. It turns
out that the majority of these practices were taken directly from
object-oriented (OO) development and moved to the realm of EJB
design, without consideration for the specifics of EJBs. This article
emphasizes these specifics and how they impact the design of EJBs and
EJB-based applications.
What's So Special About EJBs?
EJB technology was introduced as a distributed components
technology. The key to understanding it lies in the meaning of the
words distributed and components. Let's start with distributed, then
examine components.
Distributed Aspects
EJBs are accessed through Java Remote Method Invocation
(RMI), regardless of whether they're local or remote to the client.
Although some of the application server implementations (e.g.,
WebSphere) optimize local communications to make them faster, most
EJB communications are still network-based. Although the distributed
aspect of communications is transparent to the user in actual method
invocations, it has a profound effect on execution performance. The
situation is further complicated by the fact that actual
communication with the bean is based on interception (see Figure 1)
and is implemented in two steps:
- A request to execute a method on the bean is first sent to
the container in which the bean resides.
- The container fulfills the required intermediate steps
(security, transactions, etc.) and then forwards the request to the
bean.
Figure 1:
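The two-step interception described above can be imitated in plain Java with a dynamic proxy. This is only a sketch under stated assumptions: AccountService, AccountServiceBean, and ContainerSketch are hypothetical names invented for illustration, and no real container or EJB API is involved. The client holds a proxy, so every call reaches the stand-in container first and the bean second.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical business interface; not part of any EJB API.
interface AccountService {
    int balanceOf(String accountId);
}

// The "bean": business logic only, unaware of the container around it.
class AccountServiceBean implements AccountService {
    public int balanceOf(String accountId) {
        return accountId.length() * 100; // placeholder logic for the sketch
    }
}

// Stand-in for the container: it receives every request first (step 1),
// performs the intermediate steps, and then forwards to the bean (step 2).
class ContainerSketch implements InvocationHandler {
    final List<String> log = new ArrayList<String>();
    private final Object bean;

    ContainerSketch(Object bean) {
        this.bean = bean;
    }

    // Hands the client a proxy, so the client never touches the bean directly.
    <T> T expose(Class<T> businessInterface) {
        Object proxy = Proxy.newProxyInstance(
                businessInterface.getClassLoader(),
                new Class<?>[] { businessInterface },
                this);
        return businessInterface.cast(proxy);
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        log.add("security check");
        log.add("begin transaction");
        try {
            Object result = method.invoke(bean, args); // forward to the bean
            log.add("commit transaction");
            return result;
        } catch (InvocationTargetException e) {
            log.add("rollback transaction");
            throw e.getCause();
        }
    }
}

class InterceptionDemo {
    public static void main(String[] args) {
        ContainerSketch container = new ContainerSketch(new AccountServiceBean());
        AccountService client = container.expose(AccountService.class);
        System.out.println(client.balanceOf("abc"));
        System.out.println(container.log);
    }
}
```

Even in this toy version, a single business call triggers three bookkeeping steps around it, which is why the cost of an invocation is largely independent of how little work the bean method itself does.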
For the method on the EJB to be invoked, the remote reference
to the home interface must be obtained. This is usually done through
an additional network call to the Java Naming and Directory Interface
(JNDI). The home interface can then be used to get the actual EJB
reference. These operations introduce additional network calls (see
Figure 2).
Figure 2:
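To make the cost of the lookup chain concrete, the following sketch counts simulated round trips for the first call on a bean. Everything here is an assumption made for illustration: NamingSketch stands in for a naming service, Quote and QuoteHome are hypothetical interfaces, and a "round trip" is just a counter increment rather than real network traffic.

```java
import java.util.HashMap;
import java.util.Map;

// Counts simulated network round trips; stands in for real remote calls.
class NetworkSketch {
    int roundTrips;
}

// Hypothetical remote and home interfaces for the sketch.
interface Quote {
    String priceOf(String symbol);
}

interface QuoteHome {
    Quote create();
}

// Stand-in for a naming service: every lookup costs one round trip.
class NamingSketch {
    private final Map<String, Object> bindings = new HashMap<String, Object>();
    private final NetworkSketch net;

    NamingSketch(NetworkSketch net) {
        this.net = net;
    }

    void bind(String name, Object value) {
        bindings.put(name, value);
    }

    Object lookup(String name) {
        net.roundTrips++;
        return bindings.get(name);
    }
}

class LookupDemo {
    // Binds a home whose create() and whose bean methods each cost one trip.
    static NamingSketch deploy(final NetworkSketch net) {
        NamingSketch naming = new NamingSketch(net);
        naming.bind("QuoteHome", new QuoteHome() {
            public Quote create() {
                net.roundTrips++;
                return new Quote() {
                    public String priceOf(String symbol) {
                        net.roundTrips++;
                        return symbol + ": 42.00"; // made-up value
                    }
                };
            }
        });
        return naming;
    }

    public static void main(String[] args) {
        NetworkSketch net = new NetworkSketch();
        NamingSketch naming = deploy(net);
        QuoteHome home = (QuoteHome) naming.lookup("QuoteHome"); // trip 1: naming lookup
        Quote quote = home.create();                             // trip 2: home interface
        String price = quote.priceOf("ACME");                    // trip 3: business method
        System.out.println(price + " after " + net.roundTrips + " round trips");
    }
}
```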
To summarize, executing a method on an EJB is an expensive
network operation. Thus having low-granularity (fine-grained) methods
on an EJB typically leads to poor performance of the overall system.
The introduction of local interfaces in EJB 2.0 is one
attempt to improve overall performance. Local interfaces provide a
way for beans in the same container to interact more efficiently -
calls to methods in the local interface don't involve RMI. Although
the local interfaces represent EJBs in the same address space and
don't use distributed communications (e.g., no RMI between colocated
beans), the container is still involved in every interaction to
provide the required intermediary steps. In addition, even in the
case of a local interface, a networking call to the JNDI is required
for the client to obtain a reference to the local home interface,
through which a reference to the local interface can be resolved. In
reality, the specification doesn't define how vendors must implement
local interfaces since they're only logical constructs and may not
have the equivalent software counterparts. Additional delays can
still be present in local communications.
The only effective way to improve the overall performance of
EJB-based applications is to minimize the number of method
invocations, making the communications overhead negligible compared
with the execution time. This can be achieved only by implementing
coarse-grained methods.
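A small sketch shows how invocation counts differ between the two styles. The classes are hypothetical, each counter increment stands for one remote invocation (network call plus container interception), and the 5 ms overhead is an illustrative assumption, not a measurement.

```java
// Counts simulated remote invocations.
class RemoteCallCounter {
    int calls;
}

// Fine-grained style: one remote invocation per attribute.
class FineGrainedCustomer {
    private final RemoteCallCounter net;

    FineGrainedCustomer(RemoteCallCounter net) {
        this.net = net;
    }

    String getName()   { net.calls++; return "Ada Lovelace"; }
    String getStreet() { net.calls++; return "12 Example Street"; }
    String getCity()   { net.calls++; return "London"; }
}

// Coarse-grained style: one invocation does the whole job.
class CoarseGrainedCustomer {
    private final RemoteCallCounter net;

    CoarseGrainedCustomer(RemoteCallCounter net) {
        this.net = net;
    }

    String mailingLabel() {
        net.calls++;
        return "Ada Lovelace\n12 Example Street\nLondon";
    }
}

class GranularityDemo {
    static final int ASSUMED_OVERHEAD_MS = 5; // illustrative figure only

    static int overheadMs(RemoteCallCounter net) {
        return net.calls * ASSUMED_OVERHEAD_MS;
    }

    // The client has to assemble the label itself from three remote calls.
    static String labelViaGetters(FineGrainedCustomer customer) {
        return customer.getName() + "\n" + customer.getStreet() + "\n" + customer.getCity();
    }

    public static void main(String[] args) {
        RemoteCallCounter fineNet = new RemoteCallCounter();
        RemoteCallCounter coarseNet = new RemoteCallCounter();
        labelViaGetters(new FineGrainedCustomer(fineNet));
        new CoarseGrainedCustomer(coarseNet).mailingLabel();
        System.out.println("fine-grained overhead:   " + overheadMs(fineNet) + " ms");
        System.out.println("coarse-grained overhead: " + overheadMs(coarseNet) + " ms");
    }
}
```

Both clients end up with the same label, but the fine-grained version pays the per-invocation overhead three times, and the gap grows with every attribute added.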
Component Aspects
To define the component characteristics of EJBs we have to
first define what components are. Although component-based
development (CBD) has been around for at least 10 years, components
are still not clearly defined. In general, components are for
composition. Composition enables the reuse of prefabricated "things"
(components) by rearranging them into ever-new and changing
composites. Beyond this observation there's a lack of consensus on
the definition of a component within the software industry. Microsoft
has even invented the Component Object Model (COM), thus implying in
the name some relationship between components and objects.
The Object Management Group (OMG) has defined distributed
objects and built distributed components on top of them, leading
people to think that components are tightly linked with objects. Many
people assume that components are nothing more than super objects - a
huge misconception. Components are a software implementation of
business artifacts, intended to simplify the creation of business
applications. Objects are software constructs, intended to simplify
code creation; they're not necessarily related to the business
content of an application.
Instead of trying to come up with a precise definition for
components, we'll define the core concepts the industry is using as a
"unified" description of a software component:
- A software implementation of a well-defined application
(business) aspect.
- Should implement a collection of related functions or
services; a relationship is determined by the analysis done from the
perspective of intended usage. The component should provide a
complete but not necessarily exhaustive set of functions.
- Must be identifiable, meaning it can be addressed by another
component, possibly via a network.
- Should be treated as a whole so that it's not necessary to
worry about all its pieces. This requires that components can be
individually designed, developed, and deployed.
- Should separate its interface from the implementation used to
support it. A component might be thought of as a "black box"
implementation of the business construct with a well-defined
interface.
- Component-based development (CBD) is not object-oriented
development. This means that CBD does not necessarily require OO
development. CBD can be implemented with equal success in both OO and
procedural languages. CBD is merely a way of decomposing systems.
It's a way to manage complexity better.
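Two of the concepts listed above, identifiability and the separation of interface from implementation, can be sketched in a few lines of plain Java. All names are hypothetical: PricingComponent is an invented interface, ComponentRegistry is a toy stand-in for whatever lets one component address another, and the prices are made-up figures.

```java
import java.util.HashMap;
import java.util.Map;

// The published interface: all that clients are allowed to know.
interface PricingComponent {
    long priceInCents(int quantity);
}

// One "black box" implementation.
class FlatPricing implements PricingComponent {
    public long priceInCents(int quantity) {
        return 250L * quantity;
    }
}

// A substitute that supports the same interface with different behavior.
class VolumeDiscountPricing implements PricingComponent {
    public long priceInCents(int quantity) {
        long full = 250L * quantity;
        return quantity >= 10 ? full * 90 / 100 : full; // 10% off from 10 units
    }
}

// Toy registry: makes a component identifiable by name.
class ComponentRegistry {
    private final Map<String, PricingComponent> deployed =
            new HashMap<String, PricingComponent>();

    void deploy(String name, PricingComponent component) {
        deployed.put(name, component);
    }

    PricingComponent find(String name) {
        return deployed.get(name);
    }
}

class SubstitutionDemo {
    // Client code depends only on the component's name and interface.
    static long quote(ComponentRegistry registry, int quantity) {
        return registry.find("pricing").priceInCents(quantity);
    }

    public static void main(String[] args) {
        ComponentRegistry registry = new ComponentRegistry();
        registry.deploy("pricing", new FlatPricing());
        System.out.println(quote(registry, 10));
        registry.deploy("pricing", new VolumeDiscountPricing()); // redeploy
        System.out.println(quote(registry, 10));
    }
}
```

The quote method never changes when the implementation behind the name is replaced, which is the property that makes a component individually deployable.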
Most people consider the potential for reuse to be the main
driving force for using a CBD approach. To be independently
deployable a component has to be self-contained - separated from its
environment and other components. Coupled with the requirement to
implement well-defined application aspects, this provides the widest
possibility for reuse.
Managing complexity is another major advantage of CBD.
Components allow for the natural decomposition of a complex system
into smaller chunks, which are usually much simpler and easier to
manage. In addition to horizontal partitioning, introduced by layered
architecture, the adoption of components introduces vertical
partitioning.
The description of a component provided earlier does not
specify the internal implementation of the component. This means that
in principle, components can be implemented using lower granularity
components (e.g., IBM's advanced components for WebSphere). This is
similar to the system-analysis paradigm in which large systems are
believed to consist of smaller systems, recursively, until the size
of the system becomes manageable.
This recursive definition lets you think about components as
a unifying concept for the software system as a whole as well as
individually. The introduction of components also forces a multilevel
design: the components and their internals. A compound component is
made up of several components.
The following is a summary of the benefits of CBD:
- Containment of complexity: Using CBD allows for the natural
decomposition of a system. First, create a high-level design of the
components and their interfaces. Then focus your development project
on one or a small number of components. This effectively allows for
the reduction of scope and better risk management of every project.
Besides, smaller and better-focused development teams are usually
more productive.
- Opportunity for massive parallel development: Project
boundaries defined around stable component definitions encourage
parallel development in-house and via outsourcing. The outsourcing of
maintenance may occur as well, since component providers may supply
maintenance for their components.
- "Black box" component implementation encourages flexibility:
A component that supports a well-defined interface can be substituted
with another one that supports either the same interface or one
derived from the original interface. This simplifies modifications to
current behavior and enhances functionality.
- Incremental testing: Components facilitate unit testing and
support progressive build testing.
- Encapsulated components act as firewalls to change: The
ripple effect from change is much smaller, simplifying system
maintenance.
- Greater consistency in usage: Components impose a standard
architecture for applications.
What Does This Mean for EJB Development?
It's now apparent from our distributed and component
discussion that superior EJB design is very different from OO design.
The problem is that this point was never fully carried across to
developers, many of whom still consider EJB to be a Java class that
adheres to the EJB interface specification. The individual deployment
of EJBs is the only component characteristic supported and emphasized
by the EJB environment.
Simply because of its name, Enterprise JavaBeans, EJB
connotes a relationship with another popular technology from Sun
Microsystems - JavaBeans. To make things worse and confuse people
even more, many popular Java IDEs (e.g., JBuilder) use a single
workspace or "bean tab" for both JavaBean and EJB development, thus
suggesting a strong correlation between the two distinct technologies.
One example of such a correlation is the use of setter and
getter methods, which are required by the JavaBean specification to
access internal variables. Setter and getter methods were introduced
by OO practitioners in order to provide access to encapsulated object
variables and eliminate coupling between internal representation and
external access. This practice was blindly moved into EJB
development, after which time many additional patterns - most notably
the Façade and Value Object patterns - were introduced to improve
design performance, which was less than optimal to start with.
Experience has proven that using setter and getter methods in
distributed systems is a bad habit. Further, one of the rules for
distributed computing is the introduction of self-contained method
signatures to minimize network traffic and improve overall
performance, which setter and getter methods rarely embody. The main
characteristic of a self-contained method signature is that it
accepts all the variables required for the method execution and
returns all the results of the execution. In other words,
self-contained methods don't require additional methods for either
setting required data or retrieving results. Furthermore, because
components are an implementation of application (business) artifacts,
the methods that they support are supposed to be meaningful business
methods, which setters and getters rarely are.
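A self-contained method signature might look like the following sketch. TransferComponent and TransferResult are hypothetical names, the account data is hard-coded purely for illustration, and nothing here is taken from the EJB API. The point is that one invocation carries every input and returns every output.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Everything the caller needs comes back in one result object.
final class TransferResult implements Serializable {
    final boolean accepted;
    final long fromBalanceCents;
    final long toBalanceCents;

    TransferResult(boolean accepted, long fromBalanceCents, long toBalanceCents) {
        this.accepted = accepted;
        this.fromBalanceCents = fromBalanceCents;
        this.toBalanceCents = toBalanceCents;
    }
}

class TransferComponent {
    private final Map<String, Long> balances = new HashMap<String, Long>();

    TransferComponent() {
        balances.put("A", 1000L); // sample data for the sketch
        balances.put("B", 200L);
    }

    // Self-contained business method: all inputs are parameters and all
    // outputs are in the result. A setter/getter design (setFrom, setTo,
    // setAmount, execute, getStatus) would need five invocations instead.
    TransferResult transfer(String from, String to, long amountCents) {
        if (!balances.containsKey(from) || !balances.containsKey(to)) {
            return new TransferResult(false, 0L, 0L);
        }
        long fromBalance = balances.get(from);
        long toBalance = balances.get(to);
        boolean accepted = !from.equals(to)
                && amountCents > 0
                && amountCents <= fromBalance;
        if (accepted) {
            fromBalance -= amountCents;
            toBalance += amountCents;
            balances.put(from, fromBalance);
            balances.put(to, toBalance);
        }
        return new TransferResult(accepted, fromBalance, toBalance);
    }
}

class TransferDemo {
    public static void main(String[] args) {
        TransferResult result = new TransferComponent().transfer("A", "B", 300L);
        System.out.println(result.accepted + " " + result.fromBalanceCents
                + " " + result.toBalanceCents);
    }
}
```

Note that transfer is also a meaningful business method in its own right, which a setter or getter rarely is.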
Our point is that a single EJB must be a coarse-grained piece
of software that's internally composed of a potentially large number
of Java classes. It has to represent meaningful business artifacts
and support meaningful business methods. This is the only feasible
way of creating high-performance EJB applications with reusable beans.
Impact on Systems Design
The implementation of EJB-based components dictates a new
approach to the design of EJB-based systems. It impacts the
separation of responsibilities between session and entity beans as
well as the design of the beans.
Entity beans are often introduced as persistent data
components (enterprise beans) that know how to persist their own
internal data to a durable storage area such as a database or legacy
system. This definition reduces entity beans mostly to
object/relational mapping and often leads to a design in which entity
beans are used purely as a data access layer (we've even seen a
comparison of entity beans with serializable Java objects, which
serialize themselves into a database). In this approach entity beans
become fairly small, with a one-to-one correspondence between an
entity bean and a database table that leads to a very low granularity
implementation.
This causes not only increased network communications, but
also negatively impacts database communications due to the increased
usage of finder methods. The standard implementation of a finder
method is a database query for the key value. As the number of entity
beans of the same type grows, this lookup, which is a separate
operation from the actual population of the entity bean, becomes more
and more expensive.
Some implementations, for example, WebLogic, allow for the
optimization of finder methods by combining them with the load. This
alleviates the problem somewhat, but is not part of the standard.
Also, as the variety of entity beans grows, the number of finder
method invocations also grows, making the overall application's
performance even worse.
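The difference between a separate finder-then-load sequence and a combined finder can be illustrated by counting queries against a toy table. OrderTableSketch and its methods are hypothetical and simulate the behavior described above. They don't reproduce any application server's actual implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy table that counts the queries issued against it.
class OrderTableSketch {
    int queries;
    private final Map<Integer, String> rows = new LinkedHashMap<Integer, String>();

    OrderTableSketch() {
        rows.put(1, "pending"); // sample data for the sketch
        rows.put(2, "shipped");
        rows.put(3, "pending");
    }

    // Finder: one query that returns primary keys only.
    List<Integer> findKeysByStatus(String status) {
        queries++;
        List<Integer> keys = new ArrayList<Integer>();
        for (Map.Entry<Integer, String> row : rows.entrySet()) {
            if (row.getValue().equals(status)) {
                keys.add(row.getKey());
            }
        }
        return keys;
    }

    // Load: one more query per bean to populate its state.
    String loadRow(int key) {
        queries++;
        return rows.get(key);
    }

    // Combined finder and load: keys and state come back from a single query.
    Map<Integer, String> findAndLoadByStatus(String status) {
        queries++;
        Map<Integer, String> result = new LinkedHashMap<Integer, String>();
        for (Map.Entry<Integer, String> row : rows.entrySet()) {
            if (row.getValue().equals(status)) {
                result.put(row.getKey(), row.getValue());
            }
        }
        return result;
    }
}

class FinderDemo {
    static int queriesForSeparateLoad(OrderTableSketch table, String status) {
        int before = table.queries;
        for (int key : table.findKeysByStatus(status)) {
            table.loadRow(key);
        }
        return table.queries - before;
    }

    static int queriesForCombinedLoad(OrderTableSketch table, String status) {
        int before = table.queries;
        table.findAndLoadByStatus(status);
        return table.queries - before;
    }

    public static void main(String[] args) {
        OrderTableSketch table = new OrderTableSketch();
        System.out.println("separate: " + queriesForSeparateLoad(table, "pending"));
        System.out.println("combined: " + queriesForCombinedLoad(table, "pending"));
    }
}
```

With the separate approach, the query count is one plus the number of beans found, so it grows with the result set, while the combined approach stays at one query.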
In addition, the granularity of entity beans has a profound
effect on database design. Prior to the introduction of entity beans
(and the componentization of software in general), database design
was performed for the application as a whole. This usually led to a
database design with a strong emphasis on enforcing data
relationships by supporting entity relationships and multiple
constraints. With the introduction of entity beans (i.e., components)
the situation has to change. Because entity beans are reusable,
individually deployable components, the only thing a database can
enforce is that the relationships within the data are supported by
the individual components (beans). Introducing relationships in data
that's supported by multiple entity beans will break the beans'
autonomy, so it doesn't seem to be a feasible solution.
The relationship between the data of multiple entity beans
must be implemented on a higher level by the session beans as part of
the internal business-process definition. The lower the entity beans'
granularity, the fewer relationships can be enforced in the database
and the greater the programming effort that's required to support
them.
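As a rough illustration of a relationship enforced at the process level rather than in the database, consider the sketch below. CustomerComponent and OrderComponent stand in for two autonomous entity beans, and PlaceOrderProcess for a session bean. All three are hypothetical plain Java classes with in-memory data, not EJBs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Entity-style component that owns customer data only.
class CustomerComponent {
    private final Set<String> customerIds = new HashSet<String>();

    void register(String customerId) {
        customerIds.add(customerId);
    }

    boolean exists(String customerId) {
        return customerIds.contains(customerId);
    }
}

// Entity-style component that owns order data only. It stores a customer
// id, but nothing in its own storage ties that id to the customer data.
class OrderComponent {
    private final List<String> orders = new ArrayList<String>();

    void add(String customerId, String item) {
        orders.add(customerId + ":" + item);
    }

    int count() {
        return orders.size();
    }
}

// Session-style process component: the "every order belongs to a known
// customer" relationship is enforced here, in code.
class PlaceOrderProcess {
    private final CustomerComponent customers;
    private final OrderComponent orders;

    PlaceOrderProcess(CustomerComponent customers, OrderComponent orders) {
        this.customers = customers;
        this.orders = orders;
    }

    boolean placeOrder(String customerId, String item) {
        if (!customers.exists(customerId)) {
            return false; // the check a foreign key would otherwise perform
        }
        orders.add(customerId, item);
        return true;
    }
}

class RelationshipDemo {
    public static void main(String[] args) {
        CustomerComponent customers = new CustomerComponent();
        OrderComponent orders = new OrderComponent();
        customers.register("c1");
        PlaceOrderProcess process = new PlaceOrderProcess(customers, orders);
        System.out.println(process.placeOrder("c1", "book"));
        System.out.println(process.placeOrder("c9", "book"));
    }
}
```

Every such relationship becomes a check someone has to write and maintain, which is the programming effort that grows as entity beans get smaller.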
The last thing to consider here is the fact that business
rules that govern enterprise processing can be divided into two broad
categories:
- Accessing data: These rules govern how data has to be stored
in the database, operations that can be done with this data, and
possible constraints. These rules are usually part of the business
artifact and tend to be very stable and applicable for multiple
implementations both within and between enterprises, and to provide a
high potential for reuse.
- Processing data: These rules govern business processes within
the enterprise. They define both the conditions and the sequence of
the components' execution. They tend to change fairly frequently and
are rarely reusable.
Entity beans must incorporate two major things: persistent
(enterprise) data and business rules that are associated with the
processing of this data. Ideally, entity beans should be viewed as an
implementation of reusable business artifacts and adhere to the
following rules:
- Have large granularity, which usually means they should
contain multiple Java classes and support multiple database tables.
- Be associated with a certain amount of persistent data,
typically multiple database tables, one of which should define the
primary key for the whole bean.
- Support meaningful business methods and encapsulate business
rules to access the data.
A session bean should represent the work being performed for
the client code that's calling it. Session beans are business-process
components that implement business rules for processing data.
Business processes implemented by session beans within the
EJB environment should define business and corresponding database
transactions. It's not advisable to use a client's transactions in
the EJB environment due to potential problems with the long-running
transactions that can cause database lockup. Entity beans that
participate in the transaction are effectively transactional
resources due to their stateful nature. In reality, however,
application server vendors don't treat them as such and basically
"clone" entity beans when more than one user wants to access the same
information. They rely on the underlying database to lock and resolve
access appropriately. Although this approach greatly improves
performance, it provides the potential for database lockup.
At the beginning of a transaction the container invokes a
load method on the entity bean, which performs the database read and
thus acquires a read lock on the set of tables. At this point another
clone of the same bean can acquire the same data and obtain another
read lock. When the first transaction ends, the container invokes a
save method on the first bean, which tries to write data back
to the database. The database would attempt to promote the read lock
to a write lock, but would not be able to because there's another
read lock on the same data. As a result a database deadlock would
occur.
The severity of this situation can vary, depending on the
locking mechanism of the database in use and the duration of the
transaction. Either way, it's not a desirable response.
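The lock-promotion scenario can be walked through with a toy lock model. SharedLockSketch is a hypothetical class that assumes a simple shared-read, exclusive-write policy. Real databases differ in how they lock and in how they detect or break deadlocks, so this only illustrates the sequence of events.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the locks on one set of data; not a real lock manager.
class SharedLockSketch {
    private final Set<String> readers = new HashSet<String>();
    private String writer;

    // Any number of transactions may read while nobody else is writing.
    boolean acquireRead(String tx) {
        if (writer != null && !writer.equals(tx)) {
            return false;
        }
        readers.add(tx);
        return true;
    }

    // Promotion to a write lock succeeds only for the sole remaining reader.
    boolean tryPromote(String tx) {
        if (writer == null && readers.contains(tx) && readers.size() == 1) {
            writer = tx;
            return true;
        }
        return false;
    }

    void release(String tx) {
        readers.remove(tx);
        if (tx.equals(writer)) {
            writer = null;
        }
    }
}

class LockPromotionDemo {
    public static void main(String[] args) {
        SharedLockSketch lock = new SharedLockSketch();
        lock.acquireRead("tx1"); // container loads the first bean clone
        lock.acquireRead("tx2"); // container loads the second clone
        // Each save needs a write lock, but each is blocked by the other's
        // read lock. Neither can proceed: this is the deadlock.
        System.out.println("tx1 can write: " + lock.tryPromote("tx1"));
        System.out.println("tx2 can write: " + lock.tryPromote("tx2"));
        lock.release("tx2"); // only once one side gives up can the other go on
        System.out.println("tx1 can write: " + lock.tryPromote("tx1"));
    }
}
```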
Summary
Our main stipulation in this article is that EJB design is
very different from OO design and it's impossible to blindly apply OO
design principles to EJBs.
A simple example is designing for reuse. In OO systems the
main driver is to reuse code constructs, and the best results can be
achieved by creating objects of very low granularity. In
component-based development and thus EJB development, the main driver
is to create reusable business artifacts, thus components must be of
fairly large granularity.
The creation of coarse EJB components that consist of
multiple Java classes will eliminate much of the network traffic
occurring in today's EJB implementations. It will also allow for two
levels of reuse: traditional OO reuse at the level of the Java classes
that provide a component's internal functionality, and component
reuse at the EJB level.
Acknowledgment
Special thanks to Michael Farrell Jr. and Tung Mansfield for
their contributions to this article.
Author Bio
Boris Lublinsky, regional director of technology at Inventa
Technologies, oversees engagements in EAI and B2B integration and
component-based development of large-scale Web applications. He has
over 20 years of experience in software engineering and technical
architecture.
blublinsky@hotmail.com