HomeDigital EditionSys-Con RadioSearch Java Cd
Advanced Java AWT Book Reviews/Excerpts Client Server Corba Editorials Embedded Java Enterprise Java IDE's Industry Watch Integration Interviews Java Applet Java & Databases Java & Web Services Java Fundamentals Java Native Interface Java Servlets Java Beans J2ME Libraries .NET Object Orientation Observations/IMHO Product Reviews Scalability & Performance Security Server Side Source Code Straight Talking Swing Threads Using Java with others Wireless XML

Like it or not, the majority of Java developers use a relational database (RDB), such as Oracle, Sybase, or MySQL, to persist their objects. That's reality for most people, so let's deal with it.

What's the big deal about storing Java objects in relational databases? Something called the object-relational impedance mismatch. The object paradigm is based on proven software engineering principles for building applications out of objects that have both data and behavior, whereas the relational paradigm is based on proven mathematical principles for efficiently storing data.

The impedance mismatch comes into play when you look at the preferred approach to access: with the object paradigm you traverse Java objects via their relationships whereas with the relational paradigm you duplicate data to join the rows in tables. This fundamental difference results in a nonideal combination of Java and RDBs, although when have you ever used two different technologies together without a few hitches? One of the secrets of success for mapping Java objects to relational databases is to understand both paradigms - and their differences - and then make intelligent tradeoffs based on that knowledge.

In this article, I present strategies for storing Java objects in RDBs. These strategies cover the following topics:

  • Use persistent object IDs
  • Map an attribute to zero or more columns
  • Map a class to zero or more tables
  • Map inheritance structures
  • Use one data entity for an entire class hierarchy
  • Use one data entity per concrete class
  • Use one data entity per class
  • Implement associations in relational databases
  • Encapsulate persistence code
Strategy 1
Use Persistent Object IDs

The first thing you need to do is get a handle on the key strategy for your database, a likely source of contention between Java developers and database administrators because objects force you to rethink some of the precepts of data modeling's approach to assigning keys.

To store data in an RDB, you need to assign unique identifiers to the data rows representing your object to be able to identify them. In relational terminology, a unique identifier is called a key; in object terminology, it is called a persistent object identifier (POID) or simply an object identifier (OID). POIDs are often implemented as full-fledged objects in your Java applications, for example, the EJB specification enforces this concept in its primary key class for entity beans, and as large integers or strings in your database tables.

A critical issue that needs to be pointed out is that POIDs should have absolutely no business meaning whatsoever, which goes against common data modeling practices. Any table column with a business meaning can potentially change. And what's the one thing we learned as an industry over the years in the relational world?

It's a fatal mistake to give your keys meaning. No doubt about it.

If your users decide to change the business meaning (perhaps they want to add some digits or make a number alphanumeric), you need to make changes to your database in every single spot where you use that information. Anything that is used as a primary key in one table is likely used in other tables as foreign keys. What should be a simple change (adding a digit to your customer number), can be a huge maintenance nightmare. Yuck. In the relational database world, keys without business meaning are called surrogate keys.

POIDs allow you to simplify your key strategy within a relational database. Although POIDs do not completely solve the navigation/traversal issue between objects, they do make it easier. You still need to perform table joins (assuming you don't intend to traverse, to read in an aggregate of objects such as an invoice and all of its line items), but at least it's doable.

Another advantage is that the use of POIDs enables you to easily automate the maintenance of relationships between objects. When all of your tables are keyed on the same type of column(s), in this case POIDs, it becomes straightforward to write generic code to take advantage of this fact.

A POID should be unique within a class hierarchy, and ideally unique among all objects generated by your organization (something often called global uniqueness). For example, will the POID for a customer object be unique only for instances of customer, to people in general, or to all objects? Given the POID value 74656, will it be assigned to a student object, a professor object, and a seminar object? Will it be assigned to a student but not a professor (because the Student class and the Professor class are in the same class hierarchy)? Or will it only be assigned to a student object and that's it?

The real issue is one of polymorphism: it is probable that a student object may one day become a professor object, but likely not a seminar object. To avoid this issue of reassigning POIDs when an object changes type, you at least want uniqueness at the class hierarchy level, although uniqueness across all classes completely avoids this issue. This can be another point of contention between Java developers and data professionals - polymorphism is an important concept in the object world but not the data world.

There are several ways that you can generate values for POIDS:

  1. Use the MAX() function (and add 1) on the POID column.
  2. Maintain a separate table for storing the next value of a key.
  3. Use Universally Unique Identifiers (UUIDs) from the Open Software Foundation.
  4. Use Globally Unique Identifiers (GUIDs) from Microsoft.
  5. Use proprietary database essential generation functions.
  6. Use the HIGH/LOW approach. See the article "Enterprise Ready Object IDs" (www.sdmagazine.com/articles/1999/9912/9912p/9912p.htm)
Strategy 2
Map an Attribute to Zero or More Columns

An attribute of a class will map to zero or more columns in a relational database. It's important to remember that not all attributes are persistent. For example, an Invoice Java object may have a grandTotal attribute that is used by its instances for calculation purposes but is not saved to the database. To complicate matters, some attributes of an object are objects in their own right, which in turn need to be mapped to your database. For example, a course object has an instance of TextBook as an attribute, which maps to several columns in the database.

The important thing is that this is a recursive definition. At some point, the attribute will be mapped to zero or more columns. It is also possible that several attributes could map to one single column in a table. For example, a class representing an American ZIP code may have three numeric attributes, one representing each of the sections in a full ZIP code, whereas the ZIP code may be stored as a single column in an address table.

Strategy 3
Map a Class to Zero or More Tables

Classes map to tables, although often not directly. Except for simple databases, you will never have a one-to-one mapping of classes to tables. In the following sections, I discuss three strategies for implementing inheritance structures to a relational database and an example where dissimilar classes map to one table.

Strategy 4
Map Inheritance Structures

The concept of inheritance throws in several interesting twists when saving Java objects into an RDB. There are three fundamental solutions for mapping inheritance into a relational database, and to understand them I discuss the trade-offs of mapping the class diagram presented in Figure 1. To keep the issues simple I have not modeled all of the attributes of the classes, nor have I modeled their full signatures, nor any of the methods of the classes.

Figure 1
Figure   1:

Use One Data Entity for an Entire Class Hierarchy
With this approach you map an entire class hierarchy into one data entity, where all the attributes of all the classes in the hierarchy are stored. Figure 2 depicts the persistence model for the class hierarchy of Figure 1 when this approach is taken. Notice how a personPOID column was introduced for the primary key of the table - I will use POIDs in all of the solutions to be consistent and to take the best approach that I know of for assigning keys to data entities.

Figure 2
Figure   2:

The advantages of this approach are that it is simple, that polymorphism is supported when a person changes roles, and that ad-hoc reporting is made easy because all of the data you need about a person is found in one table.

The disadvantages are that every time a new attribute is added anywhere in the class hierarchy a new attribute needs to be added to the table. This increases the coupling within the class hierarchy. If a mistake is made when adding a single attribute, it could affect all the classes within the hierarchy and not just the subclasses of whatever class got the new attribute. It also potentially wastes a lot of space in the database and, in the case of some databases, forces you to rebind your code (by recompiling it) to the new data schema.

I also needed to add the objectType column to indicate if the row represents a student, a professor, or another type of person. This works well when someone has a single role but quickly breaks down if he or she has multiple roles (i.e., the person is both a student and an professor).

Use One Data Entity per Concrete Class
With this approach, each data entity includes both the attributes and the inherited attributes of the class that it represents. Figure 3 depicts the persistence model for the class hierarchy of Figure 1 when this approach is taken. There are data entities corresponding to each of the Student and the Professor classes because they are concrete, but not Person because it's abstract (indicated by the fact that its name is depicted in italics). Each of the data entities was assigned a primary key, studentPOID and professorPOID, respectively.

Figure 3
Figure   3:

The main advantage of this approach is that it is still fairly easy to do ad-hoc reporting because all of the data you need about a single class is stored in only one table. There are several disadvantages, however.

First, when you modify a class, you need to modify its table and the table of any of its subclasses. For example, if you were to add height and weight to the Person class, you would need to add it to both tables - a lot of work.

Second, whenever an object changes its role, perhaps you hire one of our graduating students to become a professor, you need to copy the data into the appropriate table and assign it a new POID - once again a lot of work.

Third, it's difficult to support multiple roles and still maintain data integrity. For example, where would you store the name of someone who is both a student and a professor?

Use One Data Entity per Class
With this approach, you create one table per class, the attributes of which are the POID and the attributes that are specific to that class. Figure 4 depicts the persistence model for the data model for Figure 1 when this approach is taken. Notice how personPOID is used as the primary key for all three data entities. An interesting feature of Figure 4 is that the personPOID column in both Professor and Student is assigned two tagged values, one indicating that it forms both the primary and foreign keys for those tables.

Figure 4
Figure   4:

The main advantage of this approach is that it conforms well to object-oriented concepts. It supports polymorphism because you have records in the appropriate tables for each role that an object might have. It's also easy to modify superclasses and add new subclasses as you merely need to modify/add one table.

There are several disadvantages to this approach. First, there are many tables in the database, one for every class (plus tables to maintain relationships). Second, it takes longer to read and write data using this technique because you need to access multiple tables. This problem can be alleviated if you organize your database intelligently by putting each table within a class hierarchy on different physical drive volumes. Third, ad-hoc reporting on your database is difficult, unless you add views to simulate the desired tables.

Strategy 5
Implement Associations in Relational Databases

Relationships in relational databases are maintained through the use of foreign keys. A foreign key is a data attribute(s) that appears in one table that may be part of - or is coincidental - with the key of another table. Foreign keys allow you to relate a row in one table with a row in another. To implement one-to-one and one-to-many relationships you merely have to include the key of one table in the other table.

In Figure 5 you see four tables, their keys (POIDs of course), and the foreign keys used to implement the relationships between them. First, there is a one-to-one association between the Position and Employee data entities. A one-to-one association is one in which the maximums of each of its multiplicities are one.

Figure 5
Figure   5:

To implement this relationship I used the attribute positionPOID, the key of the Position data entity, in the Employee data entity. I was forced to do it this way because the association is unidirectional - employee rows know about their position rows but not the other way around. Had this been a bidirectional association, I would have needed to add a foreign key called employeePOID in Position as well.

Second, I implemented the many-to-one association (also referred to as a one-to-many association) between Employee and Task using the same sort of approach, the only difference being that I had to put the foreign key in Task because it was on the many side of the relationship.

The PositionBenefit table is interesting because it is an associative table, one that implements a many-to-many association between the Position and Benefit tables. Unlike Java, relational databases cannot natively implement many-to-many associations. Instead they need to be resolved with an associative table that implements the many-to-many as two one-to-many associations.

Strategy 6
Encapsulate Persistence Code

There are several ways you implement the persistence code within your software. The most common, and least palatable in my opinion, is to embed SQL statements in your classes. Listing 1 shows an example of how to persist the Order class; similar code would be needed to persist an instance of OrderItem, code that frankly isn't bulletproof. (I'll leave that as an exercise for you, please don't e-mail me.)

The advantage of this approach is that it allows you to write code quickly and is a viable approach for small applications and/or prototypes. There are two main disadvantages. First, it directly couples your business classes with the schema of your relational database, implying that a simple change (such as renaming a column or porting to another database) results in a rework of your source code. Second, it forces you to write significant amounts of SQL code, at least four statements for simple CRUD (create, retrieve, update, and delete) operations let alone the additional code required to support finding multiple instances based on defined criteria.

A slightly better approach is when the SQL statements for your business classes are encapsulated in one or more data classes. Listing 2 shows how this code would be invoked from the Order class (the code for the OrderData class would be similar to that in Listing 1). As you can see, there is significantly less code in the business objects, although you would still have the data classes to develop and maintain. Once again, this approach is suitable for prototypes and small systems of less than 40 to 50 business classes, but it still results in a recompilation (of your data classes) when simple changes to the database are made.

Your data classes are typically implemented as normal Java classes with embedded SQL, one data class for each business class (e.g., Employee has a corresponding EmployeeData class) or as a collection of stored procedures in your database (one or more for retrieving, one or more for deleting, and so on). The best thing that can be said about this approach is that you have at least encapsulated the source code that handles the hard-coded interactions in one place - the data classes.

A third approach is to use a persistence framework (Table 1 lists several for Java) that persist objects in RDBs in such a manner that simple changes to the relational schema often do not affect your object-oriented code. They do this by storing your mapping information in metadata, and then using that metadata to generate the source code needed to persist your Java objects. When your object or data schema changes you merely need to update your metadata instead of your source code.

Table 1

The advantage of this approach is that only the person maintaining the metadata needs to understand the two schemas; your application programmers don't and, in fact, they don't even need to know that their objects are being stored in an RDB.

The main disadvantage is the misconceptions regarding performance that many Java developers have with respect to persistence frameworks, assuming they can achieve the best performance in their code by hardcoding SQL statements into it. This may be true if they happen to be experts at writing high-performance database access code, but this is seldom the case.

Yes, in theory, persistence frameworks add a bit of overhead to your Java code when compared to exceptionally well-written code. However, the reality is that the people who build persistence frameworks specialize in high-performance database access - they know a lot of tricks that you don't. My suggestion is to approach persistence frameworks with an open mind and to try them out. When you do so, I suspect you'll be pleasantly surprised at how well they actually work - as well as at the significant time and cost savings that they provide.

Java and Relational Databases Are Here to Stay
Storing your objects in relational databases is a reality for the vast majority of Java developers. In this article I presented some basic techniques for doing so, scratching the surface of this complex topic. The strategies I presented have been proven in practice, in fact they are fundamental strategies that you'll be able to apply for years to come regardless of the language you are working with.

I've personally used them on C++, Smalltalk, and Java projects. If you are interested in this topic, I highly suggest reading my new book The Object Primer 2/e, in which I show how to build business applications from end to end. This is done from the point of view of a developer, starting with requirements and then moving through analysis to design and finally into implementation using Java and relational databases on the back end.

Table 2

Recommended Reading

  1. Ambler, S.W. Building Object Applications That Work: Your Step-by-Step (1998). Handbook for Developing Robust Systems with Object Technology. SIGS Books/Cambridge University Press. www.ambysoft.com/buildingObjectApplications.html
  2. Ambler, S.W. "Enterprise Ready Object IDs." (1999). www.sdmagazine.com/articles/1999/9912/9912p/9912p.htm
  3. Ambler, S.W. (2001). The Object Primer 2nd Edition: The Application Developer's Guide to Object Orientation. New York: Cambridge University Press. www.ambysoft.com/theObjectPrimer.html.
  4. Szyperski C. C (1998). Component Software: Beyond Object-Oriented Programming. New York: ACM Press.
  5. UUID Class Javadoc. http://jigsaw.w3.org/Doc/Programmer/api/org/w3c/util/UUID.html
  6. Vermeulen, A., Ambler, S.W., Bumgardner, G., Metz, E., Misfeldt, T., Shur, J., & Thompson, P. (2000). The Elements of Java Style. New York: Cambridge University Press. www.ambysoft.com/elementsJavaStyle.html

Author Bio
Scott W. Ambler is the president of Ronin International (www.ronin-intl.com) and thought leader of the Agile Modeling (AM) methodology (www.agilemodeling.com). He is the author of The Object Primer 2/e and co-author of The Elements of Java Style, both from Cambridge University Press. He is also author of the forthcoming Agile Modeling and co-author of the forthcoming Mastering EJB 2/e, both from John Wiley & Sons. In his spare time, Scott studies Goju karate, Kobudo karate, and Tai Chi. [email protected]

	


Listing 1: Embedded SQL Code in the Order Class

    /**
     * Save an order and its aggregate order Items
     */
    private void save(Connection connection) throws SQLException {


        PreparedStatement orderStatement = null;
        PreparedStatement orderItemStatement = null;
        Vector items;


        //  Build statements to either insert or update
        if (isPersistent() ) {
            orderStatement = connection.prepareStatement
			( "Update INTO Order VALUES(?,?,?,?,?,?,?)");
            orderItemStatement = connection.prepareStatement(OrderItem.getUpdateSQL());
        } else {
            orderStatement = connection.prepareStatement
			( "INSERT INTO Order VALUES(?,?,?,?,?,?,?)");
            orderItemStatement = connection.prepareStatement(OrderItem.getInsertSQL());
        }


   //  Add the order information from this object
        orderStatement.setInt(1, getOrderID());
        orderStatement.setInt(2, getCustomerNumber());
        orderStatement.setDate(3, getOrderDate());
        orderStatement.setDouble(4, getFederalTax().getNumber());
        orderStatement.setDouble(5, getStateTax().getNumber());
        orderStatement.setDouble(6, getLocalTax().getNumber());
        orderStatement.setDouble(7, getSubtotal().getNumber());


        //  Save the order and order contact information
        orderStatement.executeUpdate();
        orderStatement.close();


        //  Save the order items
        items = getOrderItems();
        for ( int i = 1; i <= items.size(); i++ ) {
            OrderItem item = (OrderItem) items.elementAt(i);
            item.save(orderItemStatement, getOrderID(), i);
        }
        orderItemStatement.close();


    }
    /**
     * Refreshes this object with the data for
     * the order with the given id.
     */
    public void retrieve(Connection connection)
        throws SQLException
    {
        OrderContact ship, bill;
        int shipToID, billToID;


        // Retrieve the record from the Order table


        PreparedStatement statement =
        connection.prepareStatement("SELECT * FROM Order WHERE OrderID = ?");
        statement.setInt(getOrderID());
        ResultSet rs = statement.executeUpdate();
        rs.next();


        setCustomerNumber(rs.getInt(2));
        setOrderDate(rs.getDate(3));
        setFederalTax(rs.getDouble(4));
        setStateTax(rs.getDouble(5));
        setLocalTax(rs.getDouble(6));
        setSubtotal(rs.getDouble(7));
        setIsPersistent(true);


        // Read the records from the OrderItem table
        try {
            Vector items = OrderItem.retrieveOrderItems(orderID);
            setOrderItems(items);
        } catch ( Exception ex ) {
            //  Handle the exception appropriately...
        }
    }


    /**
     * Delete the order and Its order Items
     */
    private void delete(Connection connection)
    throws SQLException {


        PreparedStatement ps = null;
        Vector items;


   ps = connection.prepareStatement( "DELETE Order WHERE OrderID = ?");


        ps.setInt(1, getOrderID());
        ps.executeUpdate();
        ps.close();


        //  Add deletion of the order items to the transaction
        items = getOrderItems();


        for ( int i = 1; i <= items.size(); i++ ) {
            OrderItem item = (OrderItem) items.elementAt(i);
            ps = item.delete(connection);
            ps.executeUpdate();
        }
    }



  Listing 2: Delegating Persistence of Order to OrderData
 
  /**
     * Save an order and its aggregate order Items
     */
    private void save(Connection connection) throws SQLException {
        OrderData.save(this, connection);
    }


    /**
     * Refreshes this object with the data for
     * the order with the given id.
     */
    public void retrieve(Connection connection)
        throws SQLException
    {
        OrderData.retrieve(this, connection);
    }


    /**
     * Delete the order Its order Items
     */
    private void delete(Connection connection)
    throws SQLException {
        OrderData.delete(this, connection);
    }


All Rights Reserved
Copyright ©  2004 SYS-CON Media, Inc.
  E-mail: [email protected]

Java and Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. SYS-CON Publications, Inc. is independent of Sun Microsystems, Inc.