
Java Techniques: Encoded Streams, by Mike Jasnowski

Two basic types of data - text and binary - are used in applications to create files such as documents, images, video, text and executables. Certain applications, however, may need to alter a file to make it available to other applications; for example, e-mail requires text and binary data to be encoded before it's sent.

This article discusses a technique for reading and writing encoded data using Java I/O streams. We'll define encoding and cover some of its history, examine two I/O stream classes and an interface, then finish by applying the technique to both a text and a binary file using the Base64 encoding scheme. With this technique you can add encoding to your applications and produce the encoded user information needed to authenticate against HTTP servers. The technique is built on a standard, familiar group of Java classes: the I/O streams.

What Is Encoding?
Encoding manipulates and reorganizes bytes so they can be understood by other applications (see Figure 1). This is done primarily for Internet e-mail systems, but it's also used in places like HTTP basic authentication, which requires the user ID and password to be encoded using Base64. Although encoding has been around for a while, you may never have noticed it. For example, your e-mail and attachments could be encoded before being sent and decoded when received. E-mails specify encoded content by using the Content-Transfer-Encoding header. This header field can have the following values:

  • 7Bit
  • Quoted-Printable
  • Base64
  • 8Bit
  • Binary

One side effect of encoding is a possible increase in the size of your data. It all depends on the encoding scheme you're using.
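To make the basic authentication case mentioned above concrete, here is a minimal sketch of building the Authorization header value. It assumes Java 8 or later and uses the JDK's own java.util.Base64 class rather than the classes developed in this article; the class name BasicAuthHeader and the credentials are placeholders chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicAuthHeader {

    /* Builds the value of an HTTP "Authorization" header for basic authentication. */
    static String basicAuth(String user, String password) {
        /* The user ID and password are joined with a colon, then Base64-encoded */
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        /* Placeholder credentials, for illustration only */
        System.out.println("Authorization: " + basicAuth("Aladdin", "open sesame"));
    }
}
```

The 19 bytes of "Aladdin:open sesame" become 28 encoded characters, which also shows the size increase described above.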

Now that we have some basics, let's look at the EncodedInputStream and EncodedOutputStream classes, which are used to read and write encoded data.

EncodedInputStream
The EncodedInputStream takes encoded data and gives it back as a decoded byte array. You can convert this data to any form you wish, such as text (see Listing 1). Its constructor takes two arguments: InputStream and EncodingScheme. The InputStream source could be a FileInputStream or even a socket.

Base64EncodingScheme scheme = new Base64EncodingScheme();
EncodedInputStream eIn = new EncodedInputStream(new FileInputStream("encoded.txt"),scheme);
byte data[] = eIn.readEncoded();
This class overrides the read method and adds a method called readEncoded, which reads encoded data and returns it as a byte array. The read method has been overridden to always return a -1. Initially this was done because the read method returns single bytes; when decoding data, you may be working with more than a single byte at a time.

EncodedOutputStream
The EncodedOutputStream writes out data using whatever encoding scheme you specify (see Listing 2). Its constructor takes two arguments: OutputStream and EncodingScheme. The OutputStream can be almost any kind of stream, such as a FileOutputStream or a socket.

Base64EncodingScheme scheme = new Base64EncodingScheme();
EncodedOutputStream eOut = new EncodedOutputStream(new FileOutputStream("encoded.txt"),scheme);
eOut.write("This is unencoded data".getBytes());
eOut.close();
This class buffers output as it's written; when the stream is closed, it encodes the data and writes it out to the actual OutputStream specified in the constructor. Use it as you would any other I/O stream - just write either an integer or a byte array, close the stream, and the data will be encoded using the scheme you passed into the constructor.

EncodingScheme
Let's look at the EncodingScheme interface. Classes that implement it provide different encodings, such as the Base64 used in this article (see Listing 3). Its two methods are encode and decode. The EncodedInputStream and EncodedOutputStream delegate to this interface when reading and writing the data. Rather than impose a single encoding scheme implementation on a user of the stream, developers can plug in different encoding schemes (Quoted-Printable, 7Bit and Base64) and use familiar methods to read and write data without requiring significant changes to their code.
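To illustrate the plug-in idea, here is a sketch of a second, much simpler scheme. HexEncodingScheme is a hypothetical class that isn't part of this article's listings; it writes each byte as two hexadecimal characters. The interface is restated from Listing 3 only so the sketch compiles on its own.

```java
import java.nio.charset.StandardCharsets;

/* Restated from Listing 3 so the sketch compiles on its own */
interface EncodingScheme {
    byte[] encode(byte[] to_encode);
    byte[] decode(byte[] to_decode);
}

/* Hypothetical scheme: each byte becomes two hexadecimal characters */
class HexEncodingScheme implements EncodingScheme {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    public byte[] encode(byte[] data) {
        byte[] out = new byte[data.length * 2];
        for (int i = 0; i < data.length; i++) {
            int value = data[i] & 0xFF;                 /* treat the byte as unsigned */
            out[2 * i] = (byte) HEX[value >>> 4];       /* high four bits */
            out[2 * i + 1] = (byte) HEX[value & 0x0F];  /* low four bits */
        }
        return out;
    }

    public byte[] decode(byte[] data) {
        if (data.length % 2 != 0)
            throw new IllegalArgumentException("hex input must have an even length");
        byte[] out = new byte[data.length / 2];
        for (int i = 0; i < out.length; i++) {
            int high = Character.digit((char) data[2 * i], 16);
            int low = Character.digit((char) data[2 * i + 1], 16);
            if (high < 0 || low < 0)
                throw new IllegalArgumentException("not a hex digit");
            out[i] = (byte) ((high << 4) | low);
        }
        return out;
    }

    public static void main(String[] args) {
        EncodingScheme scheme = new HexEncodingScheme();
        byte[] encoded = scheme.encode("Hi".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(encoded, StandardCharsets.US_ASCII));
        byte[] decoded = scheme.decode(encoded);
        System.out.println(new String(decoded, StandardCharsets.US_ASCII));
    }
}
```

Because the streams only ever call encode and decode, a scheme like this one could be passed to their constructors in place of Base64EncodingScheme.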

Base64 Encoding Scheme
Before moving to our sample application, we need to implement an encoding scheme; I'll show the Base64 encoding scheme. This scheme basically reorganizes three 8-bit chunks into four 6-bit chunks (see Figure 2). Each 6-bit chunk is represented by one character from a 64-character subset of NVT ASCII: A-Z, a-z, 0-9, "+" and "/". The "=" sign is used to pad data whose length isn't a multiple of 3 bytes. You must also organize encoded data into lines of no more than 76 characters each. A more formal explanation is available in RFC 2045. As noted previously, encoding increases the size of your data. Base64 increases the size by approximately one-third.

Figure 2: Three 8-bit chunks reorganized as four 6-bit chunks
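The regrouping in Figure 2 can be sketched in a few lines. This is only an illustration of the bit arithmetic, not the article's Base64EncodingScheme; the class and method names are made up for the example.

```java
import java.util.Arrays;

class Base64Regroup {

    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /* Splits three 8-bit values into four 6-bit values (each 0-63). */
    static int[] regroup(int b0, int b1, int b2) {
        int bits = (b0 << 16) | (b1 << 8) | b2;   /* 24 bits held in one int */
        return new int[] {
            (bits >>> 18) & 0x3F,
            (bits >>> 12) & 0x3F,
            (bits >>> 6) & 0x3F,
            bits & 0x3F
        };
    }

    /* Maps each 6-bit value to its character in the Base64 alphabet. */
    static String toChars(int[] sixBit) {
        StringBuilder sb = new StringBuilder();
        for (int value : sixBit)
            sb.append(ALPHABET.charAt(value));
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] groups = regroup('M', 'a', 'n');
        System.out.println(Arrays.toString(groups)); /* prints [19, 22, 5, 46] */
        System.out.println(toChars(groups));         /* prints TWFu */
    }
}
```

Three input bytes always yield four output characters, which is where the one-third size increase comes from.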

The basic flow of the encode method is to work with 3-byte chunks at all times. When you reach the end of your data, pad with the "=" character. Each full iteration of the loop writes 4 bytes out to the buffer. When the loop has completely passed through all the data, padding is added and the encoded byte array is returned. The decode method operates in almost the same way, except that it works with 4-byte chunks instead of 3 and ignores the padding character (see Listing 4).
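The decode direction, including how padding shortens the result, can be sketched for a single 4-character group. As before, this is an illustrative sketch with made-up names rather than the code from Listing 4, and it only performs basic validation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Base64GroupDecode {

    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /* Decodes one 4-character group; each '=' removes one byte from the result. */
    static byte[] decodeGroup(String group) {
        if (group.length() != 4)
            throw new IllegalArgumentException("a group is exactly 4 characters");
        int padding = 0;
        int bits = 0;
        for (int i = 0; i < 4; i++) {
            char c = group.charAt(i);
            int value;
            if (c == '=') {
                if (i < 2)
                    throw new IllegalArgumentException("padding is only valid at the end");
                padding++;
                value = 0;                     /* padding contributes zero bits */
            } else {
                value = ALPHABET.indexOf(c);
                if (value < 0)
                    throw new IllegalArgumentException("not a Base64 character: " + c);
            }
            bits = (bits << 6) | value;        /* collect 4 x 6 = 24 bits */
        }
        byte[] all = { (byte) (bits >>> 16), (byte) (bits >>> 8), (byte) bits };
        return Arrays.copyOf(all, 3 - padding);
    }

    public static void main(String[] args) {
        String[] groups = { "TWFu", "TWE=", "TQ==" };
        for (String group : groups) {
            byte[] decoded = decodeGroup(group);
            System.out.println(group + " -> "
                + new String(decoded, StandardCharsets.US_ASCII));
        }
    }
}
```

The three groups decode to "Man", "Ma" and "M": no padding means three bytes, one "=" means two, and two "=" signs mean one.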

Sample Application
Let's put our encoding scheme to use. Our first example encodes a Java source file, then decodes it (see Listing 5). Compile EncodingSample and then run it, specifying HelloWorld.java as the argument (see Listing 6). Once it's finished running, look at the contents of the encoded.txt file to see what the file looks like in its encoded state.

Now take the HelloWorld Java class file, encode it and then decode it. If you haven't already done so, compile the HelloWorld.java file and then run EncodingSample, specifying HelloWorld.class as the argument. Then look at the encoded.txt file to see what the file looked like encoded. To prove the file was successfully decoded, type "java HelloWorld" - you should see "HelloWorld" printed out.

Enhancements
While EncodedInputStream and EncodedOutputStream allow you to easily read and write encoded data, some enhancements can be made. Buffering large datasets makes it easy to decode all at once but may cause intermittent OutOfMemoryErrors. Alternatively, data can be encoded and decoded in chunks rather than all at once. Due to time constraints I was unable to implement this feature.
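One possible shape for the chunked approach is sketched below. StreamingBase64OutputStream is a hypothetical class, not part of the article's listings: it holds at most 3 bytes at a time and writes each 4-character group as soon as it's complete, so memory use doesn't grow with the data. To stay short, the sketch leaves out the 76-character line breaks.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/* Hypothetical streaming variant: encodes every 3 bytes as soon as they arrive */
class StreamingBase64OutputStream extends FilterOutputStream {

    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    private final int[] pending = new int[3]; /* bytes waiting for a full group */
    private int count = 0;
    private boolean closed = false;

    StreamingBase64OutputStream(OutputStream out) {
        super(out);
    }

    /* FilterOutputStream routes the byte-array writes through this method */
    @Override
    public void write(int b) throws IOException {
        pending[count++] = b & 0xFF;
        if (count == 3)
            writeGroup(3);
    }

    /* Encodes the n pending bytes (1 to 3) as 4 characters, padding with '=' */
    private void writeGroup(int n) throws IOException {
        int b0 = pending[0];
        int b1 = n > 1 ? pending[1] : 0;
        int b2 = n > 2 ? pending[2] : 0;
        out.write(ALPHABET.charAt(b0 >>> 2));
        out.write(ALPHABET.charAt(((b0 & 0x03) << 4) | (b1 >>> 4)));
        out.write(n > 1 ? ALPHABET.charAt(((b1 & 0x0F) << 2) | (b2 >>> 6)) : '=');
        out.write(n > 2 ? ALPHABET.charAt(b2 & 0x3F) : '=');
        count = 0;
    }

    /* Flushes a final partial group, with padding, before closing */
    @Override
    public void close() throws IOException {
        if (closed)
            return;
        closed = true;
        if (count > 0)
            writeGroup(count);
        super.close();
    }

    /* Convenience wrapper for the demo: encodes the parts in order */
    static String encodeParts(String... parts) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (StreamingBase64OutputStream enc = new StreamingBase64OutputStream(sink)) {
            for (String part : parts)
                enc.write(part.getBytes(StandardCharsets.US_ASCII));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        /* The data arrives in two writes, yet the output is one continuous encoding */
        System.out.println(encodeParts("Ma", "ny")); /* prints TWFueQ== */
    }
}
```

A decoding counterpart would work the same way in reverse, collecting 4 characters at a time and emitting up to 3 bytes.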

Summary
It's easy to provide an extensible means to read and write encoded data using ordinary Java I/O streams. You can also provide your own EncodingScheme implementations and plug them into your code without changes. For all you sun.misc.BASE64Encoder users, you now have a documented way to use Base64 encoding. Good Luck!

Author Bio
Mike Jasnowski, a Sun-certified Java programmer, has over 17 years of programming experience and over three years with Java. He works for a software company in Kansas City, Missouri. He can be contacted at: [email protected]

	

Listing 1 

/*
 *
 *  EncodedInputStream
 *
 *  This class is used to decode a stream of data that has been encoded
 *
 *
 * @author Mike Jasnowski
 * @version 1.0 , 06/01/2000
 */
import java.io.InputStream;
import java.io.IOException;
import java.io.ByteArrayOutputStream;

public class EncodedInputStream extends InputStream{

  private EncodingScheme encoding_scheme;
  private InputStream in_stream;
  private ByteArrayOutputStream in = new ByteArrayOutputStream();

  public EncodedInputStream(InputStream in,EncodingScheme scheme){
    in_stream = in;
    encoding_scheme = scheme;
  }

  /* Always reports end of stream; use readEncoded() to get the decoded data */
  public int read() throws IOException{
    return -1;
  }

  /* Reads the underlying stream to its end and returns the decoded bytes */
  public byte[] readEncoded() throws IOException{

    int read = 0;

    while ((read = in_stream.read())!=-1)
      in.write(read);

    return encoding_scheme.decode(in.toByteArray());
  }

  public void close() throws IOException{
    super.close();
    in_stream.close();
  }

}
 
Listing 2 

/*
 *
 *  EncodedOutputStream
 *
 *  This class is used to encode a stream of data
 *
 * @author Mike Jasnowski
 * @version 1.0 , 06/01/2000
 *
 */

import java.io.OutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class EncodedOutputStream extends OutputStream{

  private OutputStream out_stream;
  private ByteArrayOutputStream out = new ByteArrayOutputStream();
  private EncodingScheme encoding_scheme;

  public EncodedOutputStream(OutputStream out,EncodingScheme scheme){
    out_stream = out;
    encoding_scheme = scheme;
  }

  public void write(int b) throws IOException{
    /* Buffer the byte; the data is encoded when the stream is closed */
    out.write(b);
  }

  public void write(byte[] b) throws IOException{
    write(b,0,b.length);
  }

  public void write(byte[] b,int offset,int length) throws IOException{
    for (int i = 0;i < length;i++)
      write(b[offset + i]);
  }

  /* Encodes everything buffered so far and writes it to the real stream */
  public void close() throws IOException{
    super.close();
    out_stream.write(encoding_scheme.encode(out.toByteArray()));
    out_stream.close();
  }
}
 
Listing 3 

/* 
 * 
 * 
 *  EncodingScheme - The interface class for all EncodingSchemes 
 * 
 * @author Mike Jasnowski 
*  @version 1.0 , 06/01/2000 
 */ 

public interface EncodingScheme{ 

  /* This method is called by EncodedOutputStream  */ 

  public byte[] encode(byte[] to_encode); 

  /* This method is called by EncodedInputStream */ 

  public byte[] decode(byte[] to_decode); 

} 

Listing 4 

/* 
 * 
 * This class carries the encode/decode logic for the scheme 
 * 
 * @author Mike Jasnowski 
*  @version 1.0 , 06/01/2000 
 */ 

import java.io.ByteArrayOutputStream; 

public class Base64EncodingScheme implements EncodingScheme{ 
  

  char NVT_ASCII[] = {'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U', 
'V','W','X','Y','Z','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p'
,'q','r','s','t','u','v','w','x','y','z','0','1','2','3','4','5','6','7','8','9','+','/'}; 

  public Base64EncodingScheme(){}

  public byte[] encode(byte[] data){

    int num_padding = 0;
    int chunk = 0;
    ByteArrayOutputStream out = new ByteArrayOutputStream();

    for (int i = 0;i < data.length;i+=3){

      /* Real data bytes in this group: 3, or 1 or 2 at the end of the data */
      int available = Math.min(3,data.length - i);
      num_padding = 3 - available;

      /* Missing bytes count as zero bits; the padding is written after the loop */
      int first = unsigned(data[i]);
      int second = (available > 1) ? unsigned(data[i+1]) : 0;
      int third = (available > 2) ? unsigned(data[i+2]) : 0;

      /* Start a new line once the current one holds 76 characters */
      if (chunk == 76){
        out.write(13);
        out.write(10);
        chunk=0;
      }

      /* Encoding Starts Here */
      out.write(NVT_ASCII[first >>> 2]);                              /* first byte encoded */
      out.write(NVT_ASCII[((first & 0x03) << 4) | (second >>> 4)]);   /* second byte encoded */

      if (available > 1)
        out.write(NVT_ASCII[((second & 0x0F) << 2) | (third >>> 6)]); /* third byte encoded */

      if (available > 2)
        out.write(NVT_ASCII[third & 0x3F]);                           /* fourth byte encoded */

      chunk+=4;
    }

    /* Write out padding */
    for (int j = 0;j<num_padding;j++)
      out.write(61);

    /* Write final CRLF */
    out.write(13);
    out.write(10);

    return out.toByteArray();

  }
  

  public byte[] decode(byte[] data){

    byte decoded[] = new byte[3];
    byte hold_buffer[] = new byte[4];
    int byte1=0,byte2=0,byte3=0;
    int running_length = 0;
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    ByteArrayOutputStream temp = new ByteArrayOutputStream();
    int remove_padding = 0;

    /* Strip out CRLF - Chunk markers */
    for (int c = 0;c<data.length;c++){
      if (data[c] != 0x0D && data[c] != 0x0A)
        temp.write(data[c]);
    }

    byte newdata[] = temp.toByteArray();

    running_length = newdata.length;

    /* Encoded data always comes in groups of four characters */
    if (running_length % 4 != 0)
      throw new IllegalArgumentException("Base64 data length must be a multiple of 4");

    for (int i = 0;i < running_length;i+=4){

      hold_buffer = new byte[4];
      decoded = new byte[3];
      remove_padding = 0;

      hold_buffer[0] = newdata[i];
      hold_buffer[1] = newdata[i+1];
      hold_buffer[2] = newdata[i+2];
      hold_buffer[3] = newdata[i+3];

      /* Each '=' (61) means one less byte of real data in this group */
      if (hold_buffer[2] == 61) remove_padding++;
      if (hold_buffer[3] == 61) remove_padding++;

      byte1 = nvt_lookup((char)hold_buffer[0]) << 2;
      byte2 = nvt_lookup((char)hold_buffer[1]) >> 4;
      byte3 = byte1 | byte2;

      decoded[0] = (byte)byte3; /* First Byte Decoded */

      byte1 = nvt_lookup((char)hold_buffer[1]) << 4;
      byte2 = nvt_lookup((char)hold_buffer[2]) >> 2;
      byte3 = byte1 | byte2;

      decoded[1] = (byte)byte3; /* Second Byte Decoded */

      byte1 = nvt_lookup((char)hold_buffer[2]) << 6;
      byte2 = nvt_lookup((char)hold_buffer[3]);
      byte3 = byte1 | byte2;

      decoded[2] = (byte)byte3; /* Third Byte Decoded */

      out.write(decoded,0,decoded.length-remove_padding);

    }

    return out.toByteArray();
  }

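  /* Returns the 6-bit value of a character; anything else, such as '=', maps to 0 */
  private int nvt_lookup(char c){
    for (int i = 0;i<NVT_ASCII.length;i++){
      if (c == NVT_ASCII[i])
        return i;
    }
    return 0;
  }

  /* Keeps only the low 8 bits, so a negative byte becomes a value from 0 to 255 */
  private int unsigned(int value){
    int newvalue = value << 24;
    return newvalue >>> 24;
  }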

} 

Listing 5 

/* 
 * 
 *  This sample performs the following: 
 * 
 *  1) Encodes the named file using the "EncodedOutputStream"
 *  2) Writes the encoded data to encoded.txt
 *  3) Decodes the data by reading encoded.txt back in using
 *     the "EncodedInputStream"
 * 
 * @author Mike Jasnowski 
 * @version 1.0 , 06/01/2000 
 */ 

import java.io.*; 

public class EncodingSample{ 
  

  public static void main(String args[]){
    if (args.length < 1){
      System.out.println("Usage: java EncodingSample <filename>");
      return;
    }
    new EncodingSample(args[0]);
  }
  

  public EncodingSample(String filename){ 

    Base64EncodingScheme scheme = new Base64EncodingScheme(); 

    try {

      /* Write it out and encode */

      EncodedOutputStream encOut = new EncodedOutputStream(new FileOutputStream("encoded.txt"),scheme);
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      FileInputStream fin = new FileInputStream(filename);

      int ch = 0;
      while ((ch = fin.read())!=-1)
        buffer.write(ch);

      fin.close();

      byte edata[] = buffer.toByteArray();

      encOut.write(edata);

      encOut.close();

      System.out.println(filename + " has been encoded");

      /* Read it back in and decode */

      EncodedInputStream encIn = new EncodedInputStream(new FileInputStream("encoded.txt"),scheme);
      byte ddata[] = encIn.readEncoded();
      encIn.close();

      /* Overwrite the original file with the decoded data */
      FileOutputStream fout = new FileOutputStream(filename);
      fout.write(ddata,0,ddata.length);
      fout.close();

      System.out.println(filename + " has been decoded");

    }catch(IOException e){
      System.out.println(e);
    }

  } 
}
 
Listing 6 

public class HelloWorld{ 

 public static void main(String args[]){ 
  new HelloWorld(); 
 } 

 public HelloWorld(){ 
  System.out.println("HelloWorld"); 
 } 

} 

  
 
 

All Rights Reserved
Copyright ©  2004 SYS-CON Media, Inc.

Java and Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. SYS-CON Publications, Inc. is independent of Sun Microsystems, Inc.