This series of articles will walk you through the details and some of
the decisions that must be made when implementing container-managed
persistence in Enterprise JavaBeans.
Of course, the usual disclaimer applies. These articles are not
based primarily on the EJB specification and what you can and cannot do with
EJBs; instead, they concentrate on information derived from hard-earned
experience you'll find useful when dealing with EJBs.
What's a Primary Key?
The primary key in an EJB is the subset of its attributes that is
guaranteed to be unique. Informing the container of the contents of an
entity's primary key allows it to store the entity and later, using the PK,
retrieve the same entity. Primary keys always provide a handle to the entity,
regardless of whether it's in memory or in persistent storage.
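To make the idea of a handle concrete, here is a minimal sketch in plain Java. `ToyContainer` is a hypothetical class, not a container API: it keeps entities in an in-memory cache and in a map standing in for persistent storage, and shows that the same primary key finds the entity in either place.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a container: a cache (memory) plus a backing
// map that plays the role of persistent storage.
class ToyContainer<K, E> {
    private final Map<K, E> cache = new HashMap<>();
    private final Map<K, E> storage = new HashMap<>();

    // Stores the entity under its primary key and keeps it cached.
    void store(K primaryKey, E entity) {
        storage.put(primaryKey, entity);
        cache.put(primaryKey, entity);
    }

    // Simulates the entity leaving memory while its stored state remains.
    void evict(K primaryKey) {
        cache.remove(primaryKey);
    }

    // The primary key finds the entity whether it is cached or only stored.
    E findByPrimaryKey(K primaryKey) {
        E entity = cache.get(primaryKey);
        if (entity == null) {
            entity = storage.get(primaryKey);
            if (entity != null) {
                cache.put(primaryKey, entity); // reload into memory
            }
        }
        return entity;
    }

    boolean isCached(K primaryKey) {
        return cache.containsKey(primaryKey);
    }
}
```

For example, after `store(42, "Jane Doe")` and `evict(42)`, `findByPrimaryKey(42)` still returns `"Jane Doe"`, because the key, not the entity's location, is what identifies it.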
There, that sounded abstract enough; on to reality. Persistence
mechanisms in EJB containers, at least those that are efficient and widely
accepted, are closely tied to databases. Furthermore, although there are a
few object-oriented databases on the market, their acceptance is limited
compared to that of relational databases.
In essence, relational databases manage reads, writes, and searches on
tables that are made up of columns and rows. Entity beans map cleanly to
tables; each column maps to an attribute; each row maps to an entity. This
is not true in coarse-grained approaches, where one entity may be responsible
for multiple rows in multiple tables, but that approach is no longer
necessary, thanks to the performance gains EJB 2.0 made in handling a
fine-grained object model.
What's a Good Primary Key?
Choosing which unique subset of an entity's attributes should make up the
primary key is not an easy task, and guaranteeing that uniqueness gets much
harder as requirements change. For example, the first and last name
attributes in an entity modeling employees might be considered unique at
design time, but this might not hold true in the long run. Primary keys that
are subsets of their attributes are troublesome because uniqueness tends to
fade as data accumulates and new attributes are added to the entity.
At times, adding new attributes can mean adding a new differentiator,
which must be factored into the logical primary key of the entity. As a
result, earlier guarantees of uniqueness are no longer valid. If we were to
add a middle initial attribute to the example entity we used earlier, it
would have to be added to the primary key, resulting in a lot of
refactoring. Figure 1 shows the effect that adding an attribute to an entity
has on both a single and a compound primary key.
Figure 1
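For the middle-initial example, a compound key class might look like the following sketch. `EmployeePK` and `EmployeePKWithInitial` are hypothetical names; the shape (serializable, public fields named after container-managed fields, a public no-argument constructor, value-based `equals` and `hashCode`) follows the usual requirements for an EJB 2.0 compound primary key class.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical compound primary key for an Employee entity, as first designed.
// (In a real deployment it would be a public class in its own source file.)
class EmployeePK implements Serializable {
    public String firstName;
    public String lastName;

    public EmployeePK() { }

    public EmployeePK(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof EmployeePK)) return false;
        EmployeePK other = (EmployeePK) o;
        return Objects.equals(firstName, other.firstName)
                && Objects.equals(lastName, other.lastName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(firstName, lastName);
    }
}

// The same key after a middle initial joins the logical key. Every piece of
// code that builds or compares the old key has to be updated to match.
class EmployeePKWithInitial implements Serializable {
    public String firstName;
    public Character middleInitial;
    public String lastName;

    public EmployeePKWithInitial() { }

    public EmployeePKWithInitial(String firstName, Character middleInitial, String lastName) {
        this.firstName = firstName;
        this.middleInitial = middleInitial;
        this.lastName = lastName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof EmployeePKWithInitial)) return false;
        EmployeePKWithInitial other = (EmployeePKWithInitial) o;
        return Objects.equals(firstName, other.firstName)
                && Objects.equals(middleInitial, other.middleInitial)
                && Objects.equals(lastName, other.lastName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(firstName, middleInitial, lastName);
    }
}
```

With the first class, keys for John A. Smith and John B. Smith are equal, so the second employee cannot be stored; the second class tells them apart, but only after every caller has changed. A single `Integer` key is untouched by the new attribute.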
Although it's possible to use a string instead of an integer as a single
primary key, this approach has several problems: lookups based on strings
are slightly slower, string concatenation is not an effective way to extend
the primary key, and containers support autoincrementing only for integral
primary keys.
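One reason concatenation is a poor way to extend a string key: different combinations of fields can produce the same string. A minimal sketch, using a hypothetical `concatKey` helper that joins the logical key fields with a separator:

```java
// Hypothetical helper: builds a single string primary key by joining the
// logical key fields with a separator.
class ConcatKeys {
    static String concatKey(String separator, String... parts) {
        return String.join(separator, parts);
    }

    public static void main(String[] args) {
        // Two different people whose names contain the separator...
        String k1 = concatKey("-", "Mary-Ann", "Lee");
        String k2 = concatKey("-", "Mary", "Ann-Lee");
        // ...end up with the same key, "Mary-Ann-Lee".
        System.out.println(k1.equals(k2)); // prints true
    }
}
```

Escaping the separator or prefixing each field with its length removes this ambiguity, but every stored key still has to be rewritten whenever a field is added.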
Another issue with multiattribute primary keys arises when working with
some container-managed relationships. When dealing with a many-to-many
relationship between two entities, the underlying database table that's
modeling this relationship consists of columns that match the primary keys
from both entities. If the primary keys of both entities are compound and
share an attribute name, the database layer cannot differentiate between the
two at the column level.
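As an illustration, consider hypothetical `Student` and `Instructor` entities in a many-to-many relationship, both keyed on first and last name. The sketch assumes a naive scheme that names each join-table column after the key field it copies; how a real container names these columns is vendor-specific.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class JoinTableColumns {
    // Naive scheme: each join-table column is named after the key field it copies.
    static List<String> naiveColumns(List<String> leftKey, List<String> rightKey) {
        List<String> columns = new ArrayList<>(leftKey);
        columns.addAll(rightKey);
        return columns;
    }

    // Returns the column names that appear more than once.
    static Set<String> collisions(List<String> columns) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String column : columns) {
            if (!seen.add(column)) {
                duplicates.add(column);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> studentKey = Arrays.asList("firstName", "lastName");
        List<String> instructorKey = Arrays.asList("firstName", "lastName");
        List<String> columns = naiveColumns(studentKey, instructorKey);
        System.out.println(columns);             // [firstName, lastName, firstName, lastName]
        System.out.println(collisions(columns)); // [firstName, lastName]
    }
}
```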
Because these hard lessons have been learned many times over, I highly
recommend using an automatically generated integer as the primary key for an
entity. Because the container autoincrements it, it is not only the easiest
to implement but also the most flexible over time. The only caveat to using
automatically generated integral primary keys is that the container cannot
enforce uniqueness of the entity's logical key. If we were to create three
different entities with the same attributes and used an autoincremented
integer as the primary key, the container would not complain about
duplication, since the autoincremented integer primary key would still be
unique. In some cases this is valid, because the business logic allows
duplicates of the logical primary key; other scenarios might not allow
duplication.
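The three-duplicates scenario can be sketched in a few lines. `AutoIncrementStore` and `EmployeeRow` are hypothetical stand-ins for a container and an entity's state, assuming a simple counter that starts at 1:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical entity state: a surrogate integer key plus the logical key.
class EmployeeRow {
    final int id;
    final String firstName;
    final String lastName;

    EmployeeRow(int id, String firstName, String lastName) {
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

// Hypothetical, single-threaded stand-in for a container that
// autoincrements primary keys.
class AutoIncrementStore {
    private int nextId = 1;
    private final List<EmployeeRow> rows = new ArrayList<>();

    // Assigns the next id and stores the row. Nothing checks the logical key,
    // because the id alone is what must be unique.
    EmployeeRow create(String firstName, String lastName) {
        EmployeeRow row = new EmployeeRow(nextId++, firstName, lastName);
        rows.add(row);
        return row;
    }

    int size() {
        return rows.size();
    }

    public static void main(String[] args) {
        AutoIncrementStore store = new AutoIncrementStore();
        for (int i = 0; i < 3; i++) {
            EmployeeRow row = store.create("John", "Smith");
            System.out.println(row.id + " " + row.firstName + " " + row.lastName);
        }
        // prints "1 John Smith", "2 John Smith", "3 John Smith"
    }
}
```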
Enforcing Logical Primary Keys in the Database
One way to avoid this pitfall is to add database constraints that disallow
duplicates. Even though this makes the database underneath the persistence
layer visible and so breaks encapsulation, it leverages what databases do
much better than EJB containers: keeping track of data. A quick detour
through database constraints from an EJB perspective might be helpful.
Although most EJB containers are able to generate the underlying
persistence schema, very few people use it directly in production, mostly
because the tables created by the container carry few or no constraints. Not
only are constraints important for disallowing bad data, they also provide
important performance hints to the database. All relational databases let
you define which columns in a table make up the primary key. They also let
you define unique indexes. In case you're wondering, a primary key is
essentially a specialized unique index whose columns cannot be null.
If logical uniqueness is not enforced on the entity layer via a
multiattribute primary key, enforcing it on the database layer is a useful
and effective way of guaranteeing it. Since the logical primary key and the
autoincremented attribute must each be unique on their own, you can declare
the logical key as the primary key at the database level and define a unique
index on the actual, autoincremented primary key. This yields a constraint
that disallows duplicate entries and an index that allows fast lookups by
the actual primary key.
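The arrangement can be modeled in memory to show which inserts each constraint rejects. The DDL in the comment is an assumed, illustrative declaration (table, column, and index names are hypothetical, and exact syntax varies by database); `EmployeeTable` only mimics its two constraints with hash maps rather than talking to a real database.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model of a table declared roughly as:
//
//   CREATE TABLE employee (
//     id         INTEGER     NOT NULL,
//     first_name VARCHAR(50) NOT NULL,
//     last_name  VARCHAR(50) NOT NULL,
//     PRIMARY KEY (first_name, last_name)
//   );
//   CREATE UNIQUE INDEX employee_id_idx ON employee (id);
class EmployeeTable {
    // Plays the role of the PRIMARY KEY on the logical key.
    private final Map<List<String>, Integer> byLogicalKey = new HashMap<>();
    // Plays the role of the UNIQUE INDEX on the autoincremented id.
    private final Map<Integer, List<String>> byId = new HashMap<>();

    // Mimics an INSERT, rejecting rows that would violate either constraint.
    void insert(int id, String firstName, String lastName) {
        List<String> logicalKey = Arrays.asList(firstName, lastName);
        if (byLogicalKey.containsKey(logicalKey)) {
            throw new IllegalStateException("primary key violation: " + logicalKey);
        }
        if (byId.containsKey(id)) {
            throw new IllegalStateException("unique index violation: id " + id);
        }
        byLogicalKey.put(logicalKey, id);
        byId.put(id, logicalKey);
    }

    // Fast lookup by the entity's actual (autoincremented) primary key.
    List<String> findById(int id) {
        return byId.get(id);
    }
}
```

Inserting `(2, "John", "Smith")` after `(1, "John", "Smith")` is rejected by the primary key even though the ids differ, which is exactly the duplicate the container alone would have accepted.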
The programmatic alternative is to create an entity finder based on the
logical primary key and use it to check that no matching entity exists every
time before creating one. This prevents duplicates as long as no concurrent
transaction can create the same entity between the check and the creation.
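A minimal sketch of that check-then-create pattern, in plain Java rather than against a real EJB home interface. `EmployeeHome`, `findByName`, and `DuplicateEmployeeException` are hypothetical names; in EJB 2.0 the finder would be declared on the home interface and backed by an EJB QL query, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical exception signalling that the logical key is already taken.
class DuplicateEmployeeException extends Exception {
    DuplicateEmployeeException(String message) {
        super(message);
    }
}

// Hypothetical, simplified stand-in for an entity home with a finder on the
// logical key and a create method that uses it to refuse duplicates.
class EmployeeHome {
    private static final class Record {
        final int id;
        final String firstName;
        final String lastName;

        Record(int id, String firstName, String lastName) {
            this.id = id;
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    private final List<Record> records = new ArrayList<>();
    private int nextId = 1;

    // Finder on the logical primary key; returns the surrogate id if found.
    synchronized Optional<Integer> findByName(String firstName, String lastName) {
        for (Record r : records) {
            if (r.firstName.equals(firstName) && r.lastName.equals(lastName)) {
                return Optional.of(r.id);
            }
        }
        return Optional.empty();
    }

    // Check, then create. synchronized keeps the two steps atomic within this
    // one JVM only; it does nothing for other transactions or servers.
    synchronized int create(String firstName, String lastName) throws DuplicateEmployeeException {
        if (findByName(firstName, lastName).isPresent()) {
            throw new DuplicateEmployeeException(firstName + " " + lastName + " already exists");
        }
        int id = nextId++;
        records.add(new Record(id, firstName, lastName));
        return id;
    }
}
```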
Summary
In an evolving marketplace, business requirements keep changing. As
guarantees of uniqueness weaken into mere assurances or less, and modeled
entities take on more and more properties, something as important as the
primary key of an entity should be as constant as possible. This is best
achieved with a single, autogenerated integer primary key that does not
depend on the parts of the entity that are sure to change over time.
Author Bio
Saad Rehmani is a senior software engineer at a small startup that does big
things. His current responsibilities include extensive work with J2EE in
general and EJB 2.0 in particular. Before realizing how awesome Java was,
Saad was heavily involved in various projects ranging from kernel modules to
pseudo-realtime state propagation between clusters.
bonga@aitchisonians.org