HomeDigital EditionSys-Con RadioSearch Java Cd
Advanced Java AWT Book Reviews/Excerpts Client Server Corba Editorials Embedded Java Enterprise Java IDE's Industry Watch Integration Interviews Java Applet Java & Databases Java & Web Services Java Fundamentals Java Native Interface Java Servlets Java Beans J2ME Libraries .NET Object Orientation Observations/IMHO Product Reviews Scalability & Performance Security Server Side Source Code Straight Talking Swing Threads Using Java with others Wireless XML

A Practical Solution for the Deployment of JavaServer Pages, by John Goodson

While Java databases have become essential for Web applications, developing performance-oriented Java Database Connectivity (JDBC) applications can be challenging. But by following tried-and-true approaches, it is possible to develop and fine-tune JDBC applications to make them run faster, jump higher, and make fewer trips to the database.

Below, we present practical guidelines for many of the common JDBC applications.

Guideline 1:
Use Metadata With Care
The first guideline is to be careful about using metadata. Although it's almost impossible to write JDBC applications without database metadata methods, you can improve system performance by minimizing their use.

Cache Information to Limit Executions
To return all result-column information mandated by the JDBC specification, a JDBC driver may have to perform multiple queries, joins, and subqueries in order to return the necessary result set for a single call to a database metadata method. Applications should cache information returned from result sets that generate database metadata methods so that multiple executions are unnecessary. For example, call getTypeInfo once in the application and cache the elements of the result set that your application depends on. It is unlikely that an application will use all elements of the result set generated by a database metadata method, so the cache of information should not be difficult to maintain.

Avoid Null Search Patterns
One thing to avoid with metadata is the use of null arguments or search patterns with database metadata methods because this generates time-consuming queries. In addition, network traffic can increase due to unnecessary result set information. To avoid these problems, always supply as many non-null arguments as possible to result sets that generate database metadata methods.

Moreover, because these methods are slow, applications should invoke them efficiently. Many applications pass the minimum number of non-null arguments required for the function to return a successful result set.

For example:

ResultSet WSrs = WSc.getTables (null, null, "WSTable", null);

should be:

ResultSet WSrs = WSc.getTables ("cat1", "johng", "WSTable", "TABLE");

Unfortunately, sometimes you don't know much about the object for which you are requesting information. But any information that the application can send the driver when calling database metadata methods can result in improved performance and reliability.

Use Dummy Queries
Avoid using getColumns to determine characteristics about a table. Instead, use a dummy query with getMetadata.

Consider an application that allows the user to choose the columns. Should the application use getColumns to return information about the columns to the user or prepare a dummy query and call getMetadata? Look at Cases 1 and 2 on the next page for the answer.

Guideline 2:
Retrieve Data Efficiently
Too many calls from the application to the driver can cause your application to run more slowly. To improve performance, try to reduce the size of data that is returned. Also, avoid retrieving long data unless it is absolutely necessary. Retrieving long data across the network is slow and resource-intensive. Moreover, most users don't want to see long data. If a user does want to see these result items, the application can query the database again, specifying only the long columns in the select list. This method allows the average user to retrieve the result set without having to pay a performance penalty of higher network traffic.

While it's best to exclude long data from the select list whenever possible, some applications do not formulate the select list before sending the query to the JDBC driver. (For example, some applications select * from <table name>....) If the select list contains long data, the driver must retrieve that data at fetch time, even if the application doesn't bind the long data in the result set. In this scenario, whenever possible use a method that does not retrieve all columns of the table.

To allow the application to control how long data is retrieved in the application, designers use the getClob and getBlob methods. However, in many cases the JDBC driver emulates these methods due to the lack of true locator support in the DBMS. In these cases the driver must retrieve all of the long data across the network before exposing the getClob and getBlob methods.

Sometimes retrieving long data is necessary. When this is the case, remember that most users don't want to see 100KB, or more, of text on the screen.

Guideline 3:
Reduce the Size of Data Retrieved
One option for reducing network traffic and improving performance is to reduce the size of any data being retrieved to some manageable limit. You can do this by calling setMaxRows, setMaxFieldSize, and the driver-specific SetFetchSize. Another way to reduce the size of the data being retrieved is to decrease the column size. If the driver allows you to define the packet size, use the smallest size that will meet your needs.

Guideline 4:
Use Markers To Call Stored Procedures
One of the most complex tasks in JDBC programming is the handling of stored procedure calls.

JDBC drivers can call stored procedures on the database server either by executing the procedure as any other SQL query or by optimizing the execution by invoking a Remote Procedure Call (RPC) directly into the database server. Instead of using literal arguments, always use parameter markers for the argument markers when calling stored procedures.

Executing the stored procedure as a SQL query results in the database server parsing the statement, validating the argument types, and converting the arguments into the correct data types. Remember that SQL is always sent to the database server as a character string. For example, look at this command: "{call getCustName (12345)}." In this case, even though you might assume that the argument to getCustName is an integer, the argument is actually passed inside a character string to the server. The database server would parse the SQL query, isolate the single argument value 12345, and then convert the string "12345" into an integer value.

By invoking an RPC inside the database server, you avoid the overhead of using a SQL character string. Instead, the procedure is called by name with the argument values already encoded into their native data types. Here are two examples:

Example 1
Stored procedure cannot be optimized to use a server-side RPC. The database server must parse the statement, validate the argument types, and convert the arguments into the correct data types.

CallableStatement cstmt = conn.prepareCall ("call getCustName (12345)");
ResultSet rs = cstmt.executeQuery ();

Example 2
Stored procedure can be optimized to use a server-side RPC. The application calls the procedure by name and the argument values are already encoded, so the load on the database server is less.

CallableStatement cstmt - conn.prepareCall ("Call getCustName (?)");
cstmt.setLong (1,12345);
ResultSet rs=cstmt.executeQuery();

Guideline 5:
Optimize JDBC Performance Through Smart Disk I/O
You can improve application performance by limiting the amount of input/output. Connections and transactions are a good place to start looking for problems.

Managing Connections
Connection management has a direct effect on application performance. So design your application to connect once and use multiple statement objects, instead of performing multiple connections. Also, avoid connecting to a data source after establishing an initial connection.

Although it's a good thing to gather driver information at connect time, don't minimize the effect by connecting twice. For example, some applications establish a connection and then call a method in a separate component that reattaches and gathers information about the driver. Applications that are designed as separate entities should pass the established connection object to the data collection routine instead of establishing a second connection.

Besides connecting twice, another bad practice is to connect and disconnect repeatedly throughout your application to perform SQL statements. Connection objects can have multiple statement objects associated with them. Statement objects, which are defined as memory storage for information about SQL statements, can manage multiple SQL statements.

Pooling is another recommended practice for managing connections. You can improve performance significantly this way, especially for applications that connect over a network or through the Web. Connection pooling lets you reuse connections; closing connections does not close the physical connection to the database. When an application requests a connection, an active connection is reused, thus avoiding the network I/O needed to create a new connection.

Case 1: GetColumns Method

ResultSet WSrc = WSc.getColumns (... "UnknownTable" ...);
// This call to getColumns will generate a query to
// the system catalogs... possibly a join
// which must be prepared, executed, and produce
// a result set
. . .
WSrc.next();
string Cname = getString(4);
. . .
// user must retrieve N rows from the server
// N = # result columns of UnknownTable
// result column information has now been obtained

Case 2: GetMetadata Method

// prepare dummy query
PreparedStatement WSps = WSc.prepareStatement
(... "SELECT * from UnknownTable WHERE 1 = 0" ...);
// query is never executed on the server -
// only prepared
ResultSetMetaData WSsmd=wsps.getMetaData();
int numcols = WSrsmd.getColumnCount();
..
int ctype = WSrsmd.getColumnType(n)
..
// result column information has now been obtained
// Note we also know the column ordering within the
// table! This information cannot be
// assumed from the getColumns example.

In both cases, a query is sent to the server. But, in Case 1 the query must be evaluated and return a result set to the client, so Case 2 delivers better performance.

Managing Commits in Transactions
Committing transactions is extremely slow and disk I/O-intensive. Therefore, always turn Autocommit off by setting WSConnection.setAutoCommit(false). There are two reasons for this.

First, the database server must flush back to disk every data page that contains updated or new data. This is not a sequential write, but a searched write to replace existing data in the table. By default, Autocommit is on when connecting to a data source, and Autocommit mode usually impairs performance because of the significant amount of disk I/O needed to commit every operation.

Second, some database servers do not provide an Autocommit mode. For this server type, the JDBC driver must explicitly issue a COMMIT statement and a BEGIN TRANSACTION for every operation sent to the server. So, in addition to the large amount of disk I/O required to support Autocommit mode, a performance penalty is paid of up to three network requests for every statement issued by an application.

Using transactions can help application performance, but it's not a panacea. Leaving transactions active can reduce throughput by holding locks on rows for long times, preventing other users from accessing the rows. Commit transactions in intervals that allow maximum concurrency.

Guideline 6:
Choose the Right Transaction Model
Many systems support distributed transactions, that is, transactions that span multiple connections. But distributed transactions are at least four times slower than normal transactions due to the logging and network I/O necessary to communicate among all the components involved in the distributed transaction. So, unless distributed transactions are required, avoid using them. Instead, use local transactions whenever possible.

And for the best system performance, design the application to run under a single Connection object.

Guideline 7.
Use getBestRowIndentifier() for Column Updates
Use getBestRowIndentifier() to determine the optimal set of columns to use in the Where clause for updating data. Pseudo-columns often provide the fastest access to the data, and these columns can be determined only by using getBestRowIndentifier().

Some applications cannot be designed to take advantage of positional updates and deletes. These applications typically update data by forming a Where clause consisting of some subset of the column values returned in the result set. Some applications might formulate the Where clause by using all searchable result columns in calling getPrimaryKeys(), or in calling getIndexInfo() to find columns that might be part of a unique index. These methods typically work, but might result in complex queries.

Consider the following example:

ResultSet WSrs = WSs.executeQuery
("SELECT first_name, last_name, ssn, address, city, state, zip
FROM emp");
// fetchdata
..
WSs.executeQuery ("UPDATE EMP SET ADDRESS = ?
WHERE first_name = ? and last_name = ? and ssn = ?
and address = ? and city = ? and state = ?
and zip = ?");
// fairly complex query
Applications should call getBestRowIndentifier() to retrieve the most optimal set of columns (possibly a pseudo-column) that identifies any given record. Many databases support special columns that are not explicitly defined by the user in the table definition but are "hidden" columns of every table (for example, ROWID and TID). These pseudo-columns generally provide the fastest access to the data because they are pointers to the exact location of the record. But because pseudo-columns are not part of the explicit table definition, they are not returned from getColumns. The only method of determining if pseudo-columns exist is to call getBestRowIndentifier().

Consider the previous example again:

..
ResultSet WSrowid = getBestRowIndentifier()
(.... "emp", ...);
..
WSs.executeQuery ("UPDATE EMP SET ADDRESS = ?
WHERE first_name = ? and last_name = ? and ssn = ?
and address = ? and city = ? and state = ?
and zip = ?");
// fastest access to the data!
If your data source does not contain special pseudo-columns, then the result set of getBestRowIndentifier() consists of the columns of the most optimal unique index on the specified table (if a unique index exists). Therefore, your application does not need to call getIndexInfo to find the smallest unique index.

Conclusion
With thoughtful design and implementation, you can improve the performance of your JDBC applications. By following the suggested guidelines in this article, your applications can run more efficiently and generate less network traffic. Review your JDBC applications to find hidden treasure that will boost performance and reduce network traffic.

Author Bio
John Goodson is the vice president of research and development for MERANT Data Direct. For nearly ten years, John has worked closely with Sun and Microsoft on the development of database access standards, and is an active member of the Expert Panel for the JDBC specification evolution. John holds a BS in computer science from Virginia Tech.
[email protected]

All Rights Reserved
Copyright ©  2004 SYS-CON Media, Inc.
  E-mail: [email protected]

Java and Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. SYS-CON Publications, Inc. is independent of Sun Microsystems, Inc.