HomeDigital EditionSys-Con RadioSearch Java Cd
Advanced Java AWT Book Reviews/Excerpts Client Server Corba Editorials Embedded Java Enterprise Java IDE's Industry Watch Integration Interviews Java Applet Java & Databases Java & Web Services Java Fundamentals Java Native Interface Java Servlets Java Beans J2ME Libraries .NET Object Orientation Observations/IMHO Product Reviews Scalability & Performance Security Server Side Source Code Straight Talking Swing Threads Using Java with others Wireless XML
 

Information repositories are essential. They allow data to be shared within or outside an organization, bringing us closer to the reality of the paperless office.

With the toolset shown in Table 1, you can build an enterprise-class, scalable Web-enabled repository that fully incorporates various forms of media. Document files, photographs, video clips and sound files can easily be included in the repository using Java and Oracle8i's LOB (Large Objects) data types.

Table 1

In this article I'm going to focus on how you build a repository to store and search documents such as Microsoft Word, and HTML and XML files stored in a LOB column of a database table. The example used here populates the repository with Microsoft Word résumés, indexes it using Oracle Intermedia and reads it using Java streams from a servlet (see Figure 1).

Figure 1
Figure 1:

Benefits of Java and Oracle8i
Building repositories using Java and Oracle8i has several benefits. The documents inherently take advantage of the transaction management and ACID (atomicity, concurrency, integrity and durability) properties of the relational database, which means that changes to an internal LOB can be committed or rolled back. Moreover, associated applications can seamlessly take advantage of database features such as backup and recovery. This makes things easier for the system administrators, who no longer have to perform separate database and file system backups for relational information and documents. All data housed in the database, whether structured (relational) or unstructured (document files), can be written, searched and accessed using a single industry standard interface ­ SQL. These SQL statements can be executed from Java using JDBC (Java Database Connectivity).

Working with Large Objects
Oracle8i supports several types of LOB columns. One type, a BLOB (Binary Large Object), can house binary information such as audio, video, images and documents internally within the database. Each row can store up to 4 gigabytes of data. I used the BLOB data type to store the Microsoft Word résumés in my example.

The Oracle database stores a locator inline with the data. The locator is a pointer to the actual location of the data (LOB value). The LOB data can be stored in the same table or a separate one. The advantage of the locator is that the database doesn't have to scan the LOB data each time it reads multiple rows because only the LOB's locator value is read; the actual LOB data is read only when required.

When working with Java and LOBs, first execute the SELECT statement to get the LOB locator, then read or write LOBs using JDBC. (Oracle JDBC driver's type extension classes from oracle.sql package is used to read and write from an Oracle database.) The actual LOB data is materialized as a Java stream from the database, with the locator representing the data in the table. The following code reads the résumé of an employee whose employee number, 7900, is stored in a LOB column called résumé in the sam_emp table.

Statement st = cn.createStatement();
ResultSet rs = st.executeQuery
("Select resume from sam_emp where empno=7900");
rs.next();
oracle.sql.BLOB blob=((OracleResultSet)rs).getBLOB(1);
InputStream is=blob.getBinary-
Stream();

Populating the Repository
The documents can be written to LOB columns using Java, PL/SQL or a bulk utility called Oracle SQL*Loader. To insert a new row, do the following:

  1. Execute the SQL insert statement with an empty BLOB.
  2. Query the same row to get the locator object. Use this locator to write your document to the LOB column. Note: Java streams are employed to write the documents to the LOB column.
  3. Create the Java output stream using the getBinaryOutputStream() method of this object to write your document or any binary information to that column.
For example, to insert information in the sam_emp table about a new employee whose employee number is 9001, first insert all the structured information along with an empty BLOB using JDBC. Next, select the LOB column, résumé, from the same row to get the oracle.sql.BLOB object (the locator). Finally, create the Java output stream from this object:

st.execute("INSERT INTO sam_emp(empno, resume)
VALUES(9001,empty_blob())");
ResultSet rs = st.executeQuery(
"Select résumé from sam_emp where empno=9001 for update");
rs.next();
oracle.sql.BLOB blob = ((OracleResultSet)rs).getBLOB(1);
OutputStream os = blob.getBinaryOutputStream();

Optionally, you may use the java.awt.FileDialog class and java.io package to dynamically select and read a file from your PC. Then load it to a LOB column using the preceding code.

The way you search and retrieve documents is independent of how you load the documents. For example, you can store the documents using PL/SQL or SQL*Loader, then search and retrieve using Java servlets. Using PL/SQL, Listing 1 loads an employee's résumé, saved as a Microsoft Word file, to the résumé column of the sam_emp table.

Searching the Repository
The documents stored in the LOB columns can be indexed using Oracle Intermedia, which provides advance search capabilities such as fuzzy, stemming, proxy, phrases and more. It can also generate thematic searches and gist. The documents can be indexed via the "Create Index" database command.

Refer to Listing 2 to see how the index is built on the résumé column of the sam_emp table. Once the index is created, the Java applications can search the repository by simply submitting SELECT statements.

The MyServletCtx servlet in Listing 3 searches the term passed to it as a parameter in the résumé column of the sam_emp table. The servlet returns the rows matching the search criteria in HTML table format. The employee names in the HTML table are hyperlinked to another servlet, MyServlet, which reads the entire résumé from the database in its original format.

Retrieving from the Repository
Document retrieval using Java is similar to writing documents to the repository. The "Working with Large Objects" section earlier in this article describes how to read LOBs from the database. The MyServlet in Listing 4 reads a Microsoft Word résumé from the sam_emp table, sets the content type, then streams it out to the browser using an output stream.

Summary
In this article I've shown how you store, search and retrieve Word documents using LOB data types and Java.

You can also use the Oracle8i database to store, index, parse and transform XML documents. Storing XML documents in the database removes the need to administer and manage multiple repositories for relational and XML data. The Oracle8i's JVM makes it possible to run a Java XML parser in the database. Using the parser, you can parse and transform the XML files inside the database before outputting it to an application server.

Author Bio
Samir Shah, a former Oracle employee and a certified Oracle professional, is a manager of database and Web technologies at Wall Street Systems. He has over 10 years' industry experience.
[email protected]

	

Listing 1: Inserting Word document in BLOB column using PL/SQL 

 The following code (Steps 2­5) inserts MyResume.doc in the résumé column of sam_emp table. 

Step 1: Create a directory object in Oracle. 
 Herešs how to create a directory object called MY_FILES that represents C:\MY_DATA
  directory. 
 You must have the create directory privilege in Oracle. 

 create or replace directory 
  MY_FILES as 'C:\MY_DATA'; 

Step 2: Insert a row with an empty BLOB in your table and return the   locater. 
Step 3: Point to the Word file to be loaded from the directory created   in Step 1 using bfile data type. 
Step 4: Open the file and use the locater from Step 2 to insert the file. 
Step 5: Close the file and commit the transaction. 

declare 
    f_lob   bfile; 
    b_lob   blob; 

begin 

    insert into sam_emp(empno,ename,resume) 
 values ( 9001, 'Samir',empty_blob() ) 
 return résumé into b_lob; 

    f_lob := bfilename( 'MY_FILES', 'MyResume.doc' ); 
    dbms_lob.fileopen(f_lob, dbms_lob.file_readonly); 
    dbms_lob.loadfromfile 
 ( b_lob, f_lob, dbms_lob.getlength(f_lob) ); 
    dbms_lob.fileclose(f_lob); 

    commit; 

end; 
/ 

Listing 2: Building index to search documents 

   The steps listed below index all the Microsoft Word-formatted résumés stored in the
    résumé column to the sam_emp table. The résumés can then be searched using SQL. 
Step 1: Add a primary key to your table if it does not exist. To make empno primary key of
 the sam_emp table execute the following command. 

 alter table sam_emp add constraint 
 pk_sam_emp primary key(empno); 

Step 2: Get the privileges (ctxapp role) to create text indexes from   administrators. 
Step 3: Create the index with the appropriate filter object. Filters 
 determine how to extract text for document indexing from the 
 word processor, formatted documents as well as plain text. 
 See Oracle8i intermedia Text for complete list of filters. 

 create index ctx_doc_idx on sam_emp(résumé) 
 indextype is ctxsys.context parameters 
 ('filter CTXSYS.INSO_FILTER'); 

Listing 3: Searching documents using SQL and JDBC 

package package1; 
import javax.servlet.*; 
import javax.servlet.http.*; 
import java.io.*; 
import java.sql.*; 

/** 
 This servlet searches documents stored in an Oracle8i database repository using SQL and
  JDBC. The hit list is displayed in an HTML table with hyperlinks. JDK 1.1.7 and an Oracle
   Thin JDBC 1.22 compliant driver used. 

* 
* @author Samir Shah 
* @version 1.0 
**/ 
public class MyServletCtx extends HttpServlet{ 
   Connection cn; 

   public void init(ServletConfig parm1) 
  throws ServletException { 
    super.init( parm1); 
    try{ 
    DriverManager.registerDriver( 
 (new oracle.jdbc.driver.OracleDriver())); 
    cn =DriverManager.getConnection 
 ("jdbc:oracle:thin:@sshah:1521:o8i", 
          "scott", "tiger"); 
    } 
    catch (SQLException se){se.printStackTrace();} 
  } 

  public void doGet(HttpServletRequest req, 
     HttpServletResponse res) throws IOException{ 

      doPost(req,res); 
  } 

  public void doPost(HttpServletRequest req, 
    HttpServletResponse res) throws IOException{ 

    PrintWriter out = res.getWriter(); 
    res.setContentType("text/html"); 

    //The term to search in resume column 
    String term = req.getParameter("term"); 
    if (term == null) 
       term="security"; 

    out.print("<html>"); 
    out.print("<body>"); 
    out.print("<H1>Search Result</H1>"); 
    out.print("<table border=1 bgcolor=lightblue>"); 
    out.print("<tr><th>ID#</th><th>Name</th></tr>"); 
    out.print("<tr>"); 
    try{ 
     Statement st = cn.createStatement(); 

     //search the term in resume column using SQL 
     String query = 
         "Select empno,ename from sam_emp" + 
         " where contains(resume,'"+term+"')>0"; 

     ResultSet rs = st.executeQuery(query); 

      while (rs.next()){ 
        out.print("<td>"+ rs.getInt(1)+"</td>"); 
        out.print("<td>" + 
          "<A HREF=http://sshah:8080/" + 
          "servlet/MyServlet?term=" + 
          rs.getString(1) + 
          " target=Document>" + 
          rs.getString(2) + 
          "</A></td>"); 
        out.print("</tr>"); 
      } 

      out.print("</table>"); 
      out.print("</body>"); 
      out.print("</html>"); 
     }//try 
     catch (SQLException se){se.printStackTrace();} 

  } 
} 

Listing 4 

package package1; 

import javax.servlet.*; 
import javax.servlet.http.*; 
import java.sql.*; 
import java.io.*; 
import oracle.jdbc.driver.*; 
import oracle.sql.*;  //for oracle.sql.BLOB 

/** 

 This class reads the entire document from the résumé LOB column. It takes one parameter,
  term, to search a specific employee from the sam_emp table and returns the document
   stored in that row. 

* JDK 1.1.7, Oracle Thin JDBC 1.22 compliant driver 
* with Oracle type extension classes (oracle.sql) 
* 
* @author Samir Shah 
* @version 1.0 
**/ 
public class MyServlet extends HttpServlet{ 
  Connection cn; 

  public void doGet(HttpServletRequest req, 
  HttpServletResponse res) 
  { 
   try{ 
    doPost(req,res); 
   }catch (IOException ie){ie.printStackTrace();} 
  } 

  public void init(ServletConfig parm1) 
 throws ServletException 
  { 

    super.init( parm1); 
    try{ 
    DriverManager.registerDriver( 
 (new oracle.jdbc.driver.OracleDriver())); 
    cn =DriverManager.getConnection( 
 "jdbc:oracle:thin:@sshah:1521:o8i", 
        "scott", "tiger"); 
    } 
    catch (SQLException se){se.printStackTrace();} 
  } 

  public void doPost(HttpServletRequest req, 
    HttpServletResponse res) throws IOException 
  { 
     InputStream is=null; 
     oracle.sql.BLOB blob=null; 

     res.setContentType("application/msword"); 
     OutputStream os = res.getOutputStream(); 
     String term = req.getParameter("term"); 

     if (term==null) 
        term="9001"; 

     try{ 
     Statement st = cn.createStatement(); 
     ResultSet rs = st.executeQuery 
          ("Select resume from sam_emp"+ 
           " where empno="+term); 
  
      while (rs.next()){ 
          blob=((OracleResultSet)rs).getBLOB(1); 
          is=blob.getBinaryStream(); 
      } 

      int pos=0; 
      int length=0; 
      byte[] b = new byte[blob.getChunkSize()]; 

      while((length=is.read(b))!= -1){ 
       pos+=length; 
       os.write(b); 
      } 
     }//try 
     catch (SQLException se) 
 { 
   se.printStackTrace(); 
        } 
     finally { 
                is.close(); 
     } 

  } 

} 




 

All Rights Reserved
Copyright ©  2004 SYS-CON Media, Inc.
  E-mail: [email protected]

Java and Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. SYS-CON Publications, Inc. is independent of Sun Microsystems, Inc.