
Need to store your Java objects? Files can do this, with a little bit of programming to flatten them. Need to share them with others and guarantee integrity? Traditional DBMSs can do this, if you translate your Java objects to SQL. Need 24x7 operation, scalability, distribution over WANs and flexibility for schema changes? ODBMSs can do all this, and do it easily, by automatically making your Java objects persistent. We'll present the basics of object databases and contrast them with relational and object-relational databases; explain how to determine whether your application is a good fit for an ODBMS; and show how to deal with legacy issues and how to use ODBMSs with Java and the Web. Examples, chosen from over 150,000 users in production, are included.

Where to Store Shared Information
Many software systems and applications deal with information which must outlive the process. There are many ways to save such persistent information. Broadly, these consist of file systems, traditional database management systems (DBMSs) and object DBMSs (ODBMSs).

File systems support the basic need of persistence. Information is still there after the process terminates, and this information can be accessed later. Beyond this, files offer little, and they do require work. The programmer must somehow flatten his Java objects into streams of primitive data, and then manually write those streams to files. The reverse is necessary to access the information later. Any changes in the object types will likely require changes in this flattening code, in the file format and perhaps in applications that use it. Concurrent access control is left entirely to the application programmer; without it, conflicts will result. Files are useful when you have a small amount of information, which is unlikely to change much, accessed by only a single user (at a time), with no need for reliability features such as recovery or usability features such as relationships, distribution and versioning.
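The flattening described above can be sketched with the standard java.io serialization streams. This is a minimal example, not from the article; the Person class and its fields are invented for illustration. Note how the read side must cast blindly and will break if the class definition drifts from the stored format, which is exactly the fragility discussed.

```java
import java.io.*;

// A minimal sketch of "flattening" a Java object to a file and back,
// using the standard java.io serialization streams.
public class FlattenDemo {
    // Illustrative class; any change to its fields can break old files.
    static class Person implements Serializable {
        String name;
        int age;
        Person(String name, int age) { this.name = name; this.age = age; }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("person", ".ser");
        // Flatten: convert the object into a byte stream on disk.
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(new Person("Ada", 36));
        }
        // Reverse: rebuild the object from the stream.
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
            Person p = (Person) in.readObject();
            System.out.println(p.name + " " + p.age);
        }
        f.delete();
    }
}
```

Everything beyond this (concurrency, recovery, querying) is left to the programmer, which is why files only suit the simple cases listed above.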

The next choice for persistence is to use a traditional DBMS, such as a relational one (RDBMS). These systems have been very successful in business applications which use very simple, primitive, fixed-length data types, organized in tables. They add support for concurrency, so multiple users can access the same information without destroying each other's work. They also add recovery, so the stored information can be restored to a known, good state even after a power outage or other catastrophic failure. They add powerful searching (or query) capabilities. Unfortunately, RDBMSs were designed for a different generation of software technology, in which users dealt with raw (unencapsulated) data, third-generation programming languages (COBOL, FORTRAN, C) and a data-specific language (SQL), with the programmer manually translating back and forth between the two. With objects, this means the programmer must translate his objects to flat, primitive types and sort them into tables. Then, when restoring the objects from the RDBMS, the programmer must reassemble the objects from various tables, using slow inter-table connections called joins. This mapping code results in three problems:

  • Programmer Time: Instead of writing mapping code, programmers could be developing (and maintaining) more applications. It is not uncommon to see a third of an object application dedicated to this mapping code.
  • Integrity: Because the RDBMS deals only with the low-level primitives, all higher-level application object support, including methods that maintain their semantics and integrity, are unknown to the RDBMS. Instead, applications must enforce such integrity constraints by translating the data into objects and using the object methods. If different applications do this differently they will be out of synch, causing integrity violations. End user graphical tools (forms, reports, query tools) go directly to this primitive level and thereby bypass any of the object constraints, losing integrity.
  • Performance: At runtime, the need to disassemble and reassemble objects takes substantial time, slowing the application.
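The shape of this mapping code can be sketched without a live database. In the following example (all names invented, with in-memory maps standing in for relational tables; a real mapping layer would issue SQL through JDBC instead), saving disassembles an object graph into flat per-table rows, and loading must "join" the rows back together through a foreign key:

```java
import java.util.*;

// Sketch of a hand-written object/relational mapping layer.
// HashMaps stand in for the two relational tables.
public class MappingDemo {
    static class Address {
        int id; String city;
        Address(int id, String city) { this.id = id; this.city = city; }
    }
    static class Person {
        int id; String name; Address address;
        Person(int id, String name, Address address) {
            this.id = id; this.name = name; this.address = address;
        }
    }

    // "Tables": primary key -> flat row of primitives.
    static Map<Integer, Object[]> personTable = new HashMap<>();
    static Map<Integer, Object[]> addressTable = new HashMap<>();

    // Disassemble the object graph into rows, one table per class,
    // with a foreign key replacing the direct object reference.
    static void save(Person p) {
        addressTable.put(p.address.id, new Object[]{ p.address.city });
        personTable.put(p.id, new Object[]{ p.name, p.address.id });
    }

    // Reassemble: fetch the person row, then follow the foreign key
    // to the address table -- the in-memory analogue of a join.
    static Person load(int id) {
        Object[] row = personTable.get(id);
        int addrId = (Integer) row[1];
        Object[] addrRow = addressTable.get(addrId);
        return new Person(id, (String) row[0], new Address(addrId, (String) addrRow[0]));
    }

    public static void main(String[] args) {
        save(new Person(1, "Grace", new Address(10, "Geneva")));
        Person p = load(1);
        System.out.println(p.name + " lives in " + p.address.city);
    }
}
```

Multiply this by every class and every relationship in an application and the "third of an object application" figure becomes plausible; every one of these methods must also be kept in sync with schema changes.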
ODBMSs include the capabilities of traditional databases, but add several new ones. First, they support objects. The very same objects you define and create in Java are transparently managed by the ODBMS, including saving them on disk, recovering from failures and coordinating concurrent access. This means there is no need for the mapping code (described previously), with all of its problems. It also means that all the DBMS capabilities, including recovery, concurrency and query with object methods, apply directly to objects rather than to the primitive, disassembled pieces of objects. Because all access to the ODBMS goes through the objects themselves, they can automatically enforce integrity. In most ODBMSs, even graphical ODBC tools can be forced (using security restrictions, by user and group) to go through high-level object methods in order to maintain integrity.

Table 1

Where RDBMSs have mainframe-like central-server architectures in which all storage and processing occurs on a centralized machine, certain ODBMSs have been developed with distributed architecture. This allows objects to live on any computer (accessible in the networked environment), to execute anywhere and to be accessed transparently by all users, with all operations working across this distributed single logical view. In addition, objects make a natural unit for replication, and implementations now exist that keep replicas in synch even across failures. The distributed ability to support transparently adding servers is a major part of scalability, and other capabilities have been added, too, including concurrency modes that support multiple readers and up to one writer running simultaneously without blocking.

Also, ODBMSs bring new features. The ability to define many-to-many relationships allows the ODBMS to generate and manage the code to maintain such relationships, to dynamically add and remove elements and to maintain referential integrity, all without the need for users to write code or manually manage secondary data structures such as foreign keys. Moreover, traversal of such relationships is direct, without the need to search down tables and compare keys as is done in the relational join. By connecting networks of objects with these relationships, users can construct composite objects, which allow any number of levels of depth, and also any number of composites threading through a single object. Objects also provide the natural unit for versioning, keeping track of the history of the object's state, or even allowing simultaneous creation of multiple branches. Finally, since these features are in the DBMS, rather than layered over some simple data-only store, the ODBMS can integrate them together to properly handle complex object models; e.g., recovery of composites and relationships, and the behavior of relationships when one of the objects is versioned, etc.

In brief, ODBMSs bring the advantages of files and traditional DBMSs, and also add support for objects and additional features.

What about ORDBMS?
Faced with customer requests for object support, the RDBMS vendors have come up with an approach called object-relational or ORDBMS. To understand this mixed approach, we'll look at the high-level architectural description shown in Table 2.

Table 2

A DBMS architecture can be split into the front end, which interfaces to the user, and the back end, which stores and retrieves the persistent information. Either of these may be based on either relational or object technology, providing the four alternatives shown. The first, with relational front and back ends, gives a typical RDBMS, while the second does the same for ODBMS. The third shows an object front end layered over a relational back end engine. This is the approach of the RDBMS vendors, largely because they have a large investment in their back ends and it's very hard to change them. Adding the object front end does add value; e.g., it might allow better integration with some object tools and it might allow some new data types. However, the back end is still relational, which means the objects are still being disassembled into flat tables, or BLOBs, whose internal structure is unknown to the rest of the DBMS. Some ORDBMSs are adding data "blades" or "cartridges" which are effectively pre-built class libraries. Unfortunately, they miss the point of objects by dealing only with data. Also, they require kernel modifications, so they are hard for typical users to build, or even modify. In contrast, ODBMSs allow users to freely build any classes of objects, with any operations and relationships and to freely extend others' classes. All of these can be used in exactly the same ways as any pre-built classes.

For completeness, the last column of Table 2 shows how a relational technology front end (including query and ODBC) can be layered on top of an object database back end. This not only adds functionality, including ad hoc query of objects and off-the-shelf use of all the familiar tools, but also plays a key role in legacy support, as we'll see below.

When to Use an ODBMS
If your information needs to include any of the following, a DBMS is likely to help:

  • Recovery
  • Concurrency
  • Integrity Management
  • Scalability
  • Security
An ODBMS may well be a better tool for maintaining your persistent information if any of the following three items apply to your system:
Object Usage
If your application or system is designed and built using objects, that in itself might make an ODBMS a better choice. It means the same Java objects are directly managed by the ODBMS. There's no need to translate them to and from some other format (tables, records, etc.). This makes it easier to build and modify the system, faster to execute, of higher integrity and more likely that the system will come together correctly because the same modeling abstractions are used throughout.

Complex, Interconnected Information
While RDBMSs work well for flat, fixed-length primitive data, sorted by type into tables, many applications require more. In fact, most applications never used traditional DBMSs, and newer applications are using yet more complex information, including variable-sized structures (e.g., time series data), nested structures, images, audio, video and whatever someone might dream up tomorrow. All these are modeled directly as objects, making them easier and faster to use. Even more important for some users are the relationships. RDBMSs have no direct support for these, requiring users to create secondary data structures (foreign keys) and manage them directly. Worse, the RDBMS uses a slow, non-scaling, search-and-compare process (join) to determine what is connected to what at runtime. The direct ODBMS support is much easier, faster and includes more capabilities. A common rule of thumb is: If you have more than three or four joins, it's worth looking into an ODBMS.
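The direct-relationship point can be made concrete. The sketch below (Student and Course are invented names, loosely echoing the schema in Listing 1) shows a many-to-many relationship maintained on both sides by small helper methods; this is the bookkeeping an ODBMS generates for you, and traversal is plain pointer navigation rather than a key-matching join:

```java
import java.util.*;

// Sketch of a bidirectional many-to-many relationship with
// referential integrity maintained by the enroll/drop helpers.
public class RelationshipDemo {
    static class Student {
        String name;
        Set<Course> takes = new HashSet<>();
        Student(String name) { this.name = name; }
        // Each operation updates both sides, so the two ends of the
        // relationship can never get out of sync.
        void enroll(Course c) { takes.add(c); c.students.add(this); }
        void drop(Course c)   { takes.remove(c); c.students.remove(this); }
    }
    static class Course {
        String title;
        Set<Student> students = new HashSet<>();
        Course(String title) { this.title = title; }
    }

    public static void main(String[] args) {
        Student s = new Student("Alan");
        Course c = new Course("math");
        s.enroll(c);
        // Traversal in either direction is direct -- no foreign keys,
        // no search-and-compare over tables.
        System.out.println(s.takes.iterator().next().title);
        System.out.println(c.students.iterator().next().name);
    }
}
```

In an ODBMS the relationship declaration replaces the hand-written helpers, and the same direct traversal works for objects on disk, not just in memory.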

Distributed Environment
Traditional, relational and even some object DBMSs are built around mainframe-like central-server architectures. For some applications, this works well. For others (more and more these days), the deployment environment consists of multiple servers, workstations and PCs, often with more computing power scattered around the network and desktops than is contained in the central computer. The users wish to store objects anywhere, access them from anywhere, execute them anywhere, including multiple tiers. An ODBMS can do this much better with a distributed architecture, using objects as the natural unit for distribution and object identity as the basis for transparently locating objects. This can work over networks of heterogeneous computer hardware, operating systems, networks and compilers. Even separate languages (C, C++, Java, Smalltalk, SQL/ODBC) can be used to simultaneously share, access and modify objects, a key capability for object technology because it enables re-use.

Users
The earliest users of ODBMSs were those who had no choice: they simply couldn't use the traditional DBMSs, yet they still had a significant need for persistence of large amounts of information, concurrency, scaling and recovery. These were engineering applications such as CAD/CAE, both mechanical and electronic, and such applications are still users. Scientific applications are also major users. Examples here include CERN, in Geneva, storing the results of high-energy physics experiments (pictured on pages 8 and 9). They're building the world's largest database, 100 PB (a petabyte = 1,000 terabytes = 1,000,000 gigabytes). Similarly, the Sloan Digital Sky Survey (FermiLab, Johns Hopkins, etc.) is building a 40TB database containing the first digital survey of the sky, storing the stars, galaxies, quasars, etc., as objects in the ODBMS.

From there, the user base expanded into Telecommunications, where network management and real-time call routing require the performance, direct relationships, scalability and flexibility of ODBMSs. Examples here include Qualcomm (and their customers Nortel, Sprint, etc.), creators of the CDMA cellular standard, who build all their base stations on an ODBMS. Other examples include Siemens' Multiplexor, Intecom's Voice/Video/Data PBX, COM21's cable TV-based very high-speed modems (up to 1Mbps) and Motorola's Iridium satellite-based world-wide cellular system.

Manufacturing and process control are another major user, with real-time support for controlling distributed environments as well as databases of historical information for off-line analysis and query. Users in this area include Fisher-Rosemount, manufacturing control systems widely used in the petroleum and chemical and pharmaceutical industries; Landis & Gyr, environmental control systems used to maintain the world's busiest airport, Chicago's O'Hare; the Transamerica Pyramid and hospital suites, etc.; and KLA-Tencor, the market leader in semiconductor manufacturing.

Financial services are just now becoming users of ODBMSs, as exemplified by Citibank's currency trading system, deployed across Europe and the USA. Logistics systems such as BBN's Target are used in military and commercial environments, as are transportation systems. Others include document management, library management, healthcare systems, plus the utilities industry where American Meter has built a data collection application for remote meter reading and demand-side management.

The ODMG Java Interface
If you've looked at DBMSs before, you may be surprised to see what it looks like to use an ODBMS. Unlike traditional DBMSs, the ODBMS approach is to integrate the DBMS functionality directly into the host language. For Java, this means you simply define, create and access Java objects normally and the DBMS takes it from there. Of course, there are places where you will want to explicitly use the DBMS; for example, to start and end transactions (for recovery points and points where your work becomes visible to others), to create and access large collections, many-to-many relationships, etc.
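The explicit transaction boundaries mentioned above look roughly like the following. This is a hedged sketch: the Database and Transaction classes below are minimal stand-ins written for this example so it runs without a vendor's product, modeled on the shape of the ODMG interfaces (a real implementation supplies these types, with richer access modes and exception handling).

```java
// Sketch of ODMG-style transaction usage. Database and Transaction
// here are stand-in stubs, not a real ODBMS implementation.
public class TxDemo {
    static class Database {
        static Database open(String name, int mode) {
            System.out.println("open " + name);
            return new Database();
        }
        void close() {}
    }
    static class Transaction {
        void begin()  { System.out.println("begin"); }
        void commit() { System.out.println("commit"); }  // durable recovery point
        void abort()  {}                                 // roll back to last commit
    }

    public static void main(String[] args) {
        Database db = Database.open("school", 0);  // 0: stand-in access mode
        Transaction tx = new Transaction();
        tx.begin();
        // ...create and modify persistent Java objects normally here;
        // no SQL, no mapping code...
        tx.commit();  // work becomes visible to other users
        db.close();
    }
}
```

Between begin and commit, the application simply works with ordinary Java objects; the commit is both a recovery point and the moment the changes become visible to others.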

The Object Database Management Group (ODMG), a consortium of vendors and users of ODBMSs representing essentially the entire vendor community, has defined standard interfaces to ODBMSs. You can read about their latest work at http://www.odmg.org/ or in the book, "The Object Database Standard, ODMG 2.0," from Morgan Kaufmann. The Java binding works with ODMG's Object Definition Language (ODL), and thereby OMG's Interface Definition Language (IDL), as well as ODMG's Object Interchange Format (OIF) and Object Query Language (OQL), which is quite close to, but not exactly like, the SQL2 query (SELECT-FROM-WHERE).

The normal syntax is used within Java to define object types, instantiate objects and access them. Persistence is via reachability, which means that once an object is connected to a persistent object (including "root" objects), it becomes persistent. This is a natural extension for dynamic, garbage-collected languages in which unconnected objects are considered garbage and (eventually) deleted. Objects connected to other transient objects are retained transiently (until the end of the process), while those connected to persistent objects are retained persistently (across processes, until they become garbage). A brief example is shown in Listing 1.

Legacy System Access
It is a rare designer who has no legacy system to deal with. Luckily, ODBMSs provide a couple of very good ways to link new object systems to older, non-object legacy systems. The two most common approaches are based on SQL and on surrogate objects; both can be used together if desired.

Since some ODBMSs now fully support SQL and ODBC, these well-known languages may be used to simultaneously access both the objects in the ODBMS and the tables in legacy RDBMSs. Programs written in SQL can access all such systems, as can the familiar graphical tools (Crystal Reports, Microsoft Access and Visual Basic, etc.), almost all of which support ODBC (see Figure 1). The advantage of this approach is that it leverages existing investments in programs, tools and also in personnel training. Experienced database users can immediately access the new (as well as old) databases, starting where they're already familiar, and over time learn more and more about objects in order to get more benefits.

Figure 1
Figure 1:

For the object user, a preferable approach is to make the legacy systems accessible as objects. This is done by creating surrogate objects, which stand for information in legacy systems. For the major RDBMSs, class libraries to do this can be purchased; for these or other systems, the user can also write his own surrogate methods to read and write legacy information. The result is that these surrogates fit transparently into the distributed, single logical view. When they're accessed, they go off to the legacy systems but, except for performance considerations, they look exactly like any other objects. Although the mapping of tables to objects can be done automatically in a straightforward way, it is usually best to reanalyze the entire system, define the desired view of objects and then bury the translation to any historical structures in the surrogates' methods, so objects might be pieced together out of different tables, or go through legacy modules, as needed to meet the application's and users' functionality. The result is that the new object users have full access to the legacy systems, while the legacy systems themselves continue to work unchanged. Evolution is now possible at the user's discretion and timetable: legacy information can be moved into native objects if and when desired, with no change for object users, though at that point the legacy systems will need to be changed to use the native objects (see Figure 2).
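A surrogate can be sketched as an ordinary class whose accessors fetch state from the legacy store on demand. In this example (all names invented), a Map stands in for the legacy RDBMS; in practice the accessor body would issue SQL against the old system, but callers see only a normal object:

```java
import java.util.*;

// Sketch of a surrogate object: looks like any other object to its
// callers, but lazily fetches its state from a legacy store.
public class SurrogateDemo {
    // Stand-in for the legacy system: customer id -> name.
    static final Map<Integer, String> LEGACY_CUSTOMERS = Map.of(42, "Acme Corp");

    static class CustomerSurrogate {
        final int id;
        private String name;  // cached after the first fetch
        CustomerSurrogate(int id) { this.id = id; }
        String getName() {
            if (name == null)
                name = LEGACY_CUSTOMERS.get(id);  // the "trip" to the legacy system
            return name;
        }
    }

    public static void main(String[] args) {
        CustomerSurrogate c = new CustomerSurrogate(42);
        System.out.println(c.getName());
    }
}
```

Because the translation is buried inside the surrogate's methods, the object view can later be backed by native objects instead, with no change visible to callers.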

Figure 2
Figure 2:

Conclusion
The path you choose depends on your needs. For batch storing/restoring of a small number of objects, with little concern over speed, flattened streams to flat files work. For concurrency, recovery, backup, etc., go to a DBMS. The most natural, easiest and most efficient DBMS approach is the ODBMS, which can also add a native Java binding (just code Java and the ODBMS works underneath), performance, scalability, reduced programming cost, extra integrity, relationships, versioning, composites and kernel-level support for extensibility. Some ODBMSs also add 24x7 support (online administration, garbage collection, schema evolution, etc.), fault tolerance, replication, transparent distribution and heterogeneity (simultaneous use of mixtures of different operating systems, languages, applications and databases).

Finally, unless you like the "bleeding edge," check for references that are successful in production, using the features or capabilities you need.

About the Author
Dr. Andrew E. Wade is the Founder and Vice President of Objectivity, Inc. He helped found both the Object Management Group and the Object Database Management Group and has co-authored and contributed to several books and written many articles. Objectivity can be found at www.objectivity.com. Drew can be reached at [email protected]

	

Listing 1: Examples of Java use of ODBMS.
  
// A persistent class with a transient attribute
public class Person {
    public String name;
    transient Something currentSomething;
    ...
}

// Opening a database
public static Database open(String name, int accessMode)
    throws ODMGException;
public void close() throws ODMGException;

// Standard Java code applies for accessing objects

// Example code using collections and OQL queries
SetOfObject mathematicians;
mathematicians = Students.query(
    "exists s in this.takes: s.section_of.name = \"math\" ");

Bag mathematicians;
Bag assistedProfs;
Double x;
OQLQuery query;
mathematicians = Students.query(
    "exists s in this.takes: s.sectionOf.name = \"math\" ");
query = new OQLQuery(
    "select t.assists.taughtBy from t in TA where t.salary > $1 and t in $2 ");
x = new Double(50000.0);
query.bind(x); query.bind(mathematicians);
assistedProfs = (Bag) query.execute();
  
      
 

All Rights Reserved
Copyright ©  2004 SYS-CON Media, Inc.
  E-mail: [email protected]

Java and Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. SYS-CON Publications, Inc. is independent of Sun Microsystems, Inc.