HomeDigital EditionSys-Con RadioSearch Java Cd
Advanced Java AWT Book Reviews/Excerpts Client Server Corba Editorials Embedded Java Enterprise Java IDE's Industry Watch Integration Interviews Java Applet Java & Databases Java & Web Services Java Fundamentals Java Native Interface Java Servlets Java Beans J2ME Libraries .NET Object Orientation Observations/IMHO Product Reviews Scalability & Performance Security Server Side Source Code Straight Talking Swing Threads Using Java with others Wireless XML
 

Last month, we began looking at building data structures in Java. The idea for the article was inspired by the constant posts to comp.lang.java from people who were lost without pointers. The data structures which we introduced were useful, but were really more of a starting point for some more advanced data structures. The third data structure covered last month was a binary search tree. As you will remember, binary search trees are great for storing large amounts of information, as search time is much faster than for a linked-list. The problem with binary trees, however, is the worst case scenario. When the input stream is already in order (reverse or forward), then our binary tree becomes a simple linked list. This, of course, can bring our worst search time from O(lgN) all the way down to O(N) which is no good when you have to search large amounts of data.

The term applied to trees which have become weighted to one side is unbalanced. Figure 1 shows an example of a very unbalanced tree, the result of an input stream of 1-2-3-4-5-6-7. A solution to our problem is obvious; we must make sure that we do not allow our trees to become unbalanced. This, however, is easier said than done. It obviously requires manipulation of a variety of links and if the utmost attention is not spent implementing your algorithms then you can end up with a terrible mess of nodes which will be no fun to untangle.

Figure 1
Figure 1

Various computer scientists have come up with balanced tree algorithms. One of the more popular implementations is the Red-Black tree implementation. These algorithms are actually based on another, more complicated, type of balanced tree: 2-3-4 trees. Before I confuse anyone, let's fall into some theory. If you are already familiar with these topics then feel free to jump to the next section, where we begin our pointer-free implementations. If you are unfamiliar with these topics or if you are just a little rusty, then you will have a full understanding in a few minutes.

Since Red-Black balanced trees are based on 2-3-4 balanced trees we will start our discussion there. Then we will discuss their disadvantages, and finally we will discuss Red-Black balanced trees.

One of the main constraints of binary search trees is the fact that they can only connect to two unique nodes. 2-3-4 balanced trees - as the name implies - get around this by allowing each node to link to 2, 3, or 4 unique nodes. Additionally, each node is allowed to have 1, 2, or 3 keys. Figure 2 shows our tree from Figure 1 implemented as a 2-3-4 balanced tree. See how much better everything is now.

Figure 2
Figure 2

If you were paying attention to some of my earlier comments, then you will remember when I stated that balanced trees were easy to discuss in theory, but were difficult to implement. This also applies to 2-3-4 trees. It is not hard to see that the methods used to manipulate these trees would be a mess of code that only the hard-core geeks could enjoy. A happy medium of binary search trees and 2-3-4 balanced trees has been developed, and this is what we will implement in this month's column.

The "happy medium" comes in the form of something called a Red-Black balanced tree, which is basically a binary representation of a 2-3-4 balanced tree. The caveat here is that the node class uses an extra field to hold a color variable, and we represent different node structures (2, 3, or 4) by altering each node's color variable. Take a look at Figure 3. Here I show how we will represent 3 and 4-nodes by altering the color variable of different binary nodes. Figure 4 shows the tree from Figures 1 and 2 represented as a Red-Black balanced tree.

Figure 3
Figure 3
Figure 4
Figure 4

There are a few rules which Red-Black trees must conform to, and we will institute various checks throughout our methods to insure that these rules are conformed to at all times. The rules are:

  1. Every node must be either red or black.
  2. Every leaf must be black.
  3. If a node is red, then both of its children must be black. This of course implies that we can not have two red links in a row.
  4. Every path from a node to a descendant leaf contains the same number of black nodes.
Let's take a break from theory now, and begin coding. If you are still a little unclear as to just how we will insure that the rules will be adhered to, don't fret; we will build two methods (rotate and split) which will take care of everything for us. First, let's build an object which will represent one node. The object will need to have fields for the key (for simplicity we are using ints), and for the color. To save space we are using just one byte for the color, which will take a value of 1 for red or 0 for black. Additionally, we will need to create links to the left, right, and parent nodes. Listing 1 shows the node class. The code is basic, and can be easily understood from the in line comments. It should be noted that all of the get methods are designed to throw an exception if they are asked for a null node. This exception is defined in Listing 2.

As the first comment stated, RBNode will never be manipulated directly by the user. All interaction with the tree will be handled by creating a new instance of the RBTree class (Listing 3). RBTree implements two methods, rotate and split, which manage the balancing of our tree. Let's first take a look at how they would deal with the insertion of a 11 into a Red-Black balanced tree which already contains a few nodes. Once I demonstrate what they do, I will discuss how they do it, and why this is necessary.

Note: If you feel that unnecessary time is spent transferring extra objects through the methods (grand, great, curr, child, etc...) then by all means rewrite the code. I simply felt that the code was much easier to understand this way. If anyone does implement the algorithm without the object passing then please do some benchmarks and let me know which is faster; I am always eager to see my code improved upon. Plus, you will get your results published in next month's edition!

Figure 5 shows a Red-Black balanced tree waiting for insertion of the number 11. As soon as insert attaches the new node containing 11 to the end of the tree (figure 6), it calls split and passes references to the parent, grand, and great objects.

Figure 5
Figure 5
Figure 6
Figure 6

It then checks to see if the parent is a red node; if it is, one to two rotations are preformed on the links. Having to preform two rotations is obviously not something that we want to do too often. Sacrificing too much speed on an insert is not something that we want regardless of how fast a search is. Red-Black trees are great for storing a dynamic data set. If it takes too long to add to that data set then we will not be too pleased. Split by itself does not really do too much work; rotate does the meat of the reorganizing. For our example, only the second call to rotate will happen. It will be passed by the int which has just been inserted, and also the great (great-grandparent) node. It should be noted that cases where two rotations are required are not likely to occur often.

Rotate now uses the key passed into it to "rediscover" the locations of great's child and grandchild. The rotate method wraps itself up by preforming three "link restructures".

As I feel closely tied to the idiom that ends "...show me and I understand", I hope that you now have a detailed understanding of how Red-Black trees stay balanced. Before I leave you with the final code listing I want to stress a few points regarding the function of our methods. First, it is important to note that 4-nodes are something which we try to keep to a minimum. If you take a look at the snippet from our insert method, you will notice that as we perform each insert, any 4-nodes which are encountered are immediately broken up:

//is the current node a "4 node"
try {
if ( "red".equals( currNode.getLeft().getColor() ) &&
"red".equals( currNode.getRight().getColor() ))
{ split( theInt, currNode, parent, grand, great ); }
}
catch ( NoSuchNodeException nsne) { }

Figure 7
Figure 7

Also, you should note the standard points where the node structure is manipulated. These points allow us to follow the set up rules for Red-Black trees which we defined earlier. The points, with their associated action are:

  • A 3-Node connected to a 4-Node should be translated into a 4-Node connected to two 2-Nodes.
  • A 2-Node connected to a 4-Node should be translated into a 3-Node connected to two 2-Nodes.
Related material and references:
Algorithms in C++ by Robert Sedewick, 1992 Addison Wesley Introduction to Algorithms by Thomas H. Corman, Charles E. Leiserson, Ronald L. Riverst, 1990 MIT, McGraw-Hill The Java Programming Language by Ken Arnold and James Gosling, 1996 Sun Microsystems, Addison Wesley

About the Author
Luke Cassady-Dorion heads up Java Development for Odyssey Systems Corporation (www.iliad.com) in Manayunk, PA. He is the Mac editor of Java Developer's Journal.

Listing 1: The node class

//RBNode is a node of our tree. This class will
//never be manipulated directly as the user will
//deal with RBTree which in turn manipulates many
//instances of RBNode.
class RBNode {

private int key;
private byte color;

private RBNode left;
private RBNode right;
public RBNode parent;

//constructors:

//used when we are creating "dummy" nodes
public RBNode() {
this.left = null;
this.right = null;
this.parent = null;
}

//creates a new node in the tree with the values passed in.
public RBNode( int theKey, String theColor, RBNode p ) {
this.setKey( theKey );
this.setColor( theColor );
this.left = null;
this.right = null;
this.parent = p;
}

//accepts an int and places it in the key variable.
//I chose not to do any error
//checking here, but this was done mainly for brevity.
public void setKey( int k ) {
this.key = k;
}

//return the current key
public int getKey() {
return this.key;
}

//accepts either "red" or "black" as Strings and
//sets the color bit accordingly
public void setColor( String c ) {
if ( "red".equals( c ))
this.color = 1;
else if ( "black".equals( c ))
this.color = 0;
}

//return the color as a string
public String getColor() {
if ( this.color == 1 ) { return "red"; }
return "black";
}

//returns the right node
public RBNode getRight() throws NoSuchNodeException {
//if there is no right node, we throw an exception
if ( this.right == null )
throw new NoSuchNodeException( "right", this );

//otherwise we return the right node
return this.right;
}

//returns the left node
public RBNode getLeft() throws NoSuchNodeException{
//if there is no left node, we throw an exception
if ( this.left == null )
throw new NoSuchNodeException( "left", this );

//otherwise we return the left node
return this.left;
}

//return the parent node
public RBNode getParent() throws NoSuchNodeException {
//test for a null parent; for example the root of the tree
if ( this.parent == null )
throw new NoSuchNodeException( "parent", this );

//return the parent if no exception is thrown
return this.parent;
}

//sets the left node to newNode
public void setLeft( RBNode newNode ) {
this.left = newNode;
}

//sets the right node to newNode
public void setRight( RBNode newNode ) {
this.right = newNode;
}

//sets the parent node to newNode
public void setParent( RBNode newNode ) {
this.parent = newNode;
}
}

Listing 2: Defining the Exception

public class NoSuchNodeException extends Exception {
public String attrName;
public Object newValue;

NoSuchNodeException( String name, Object value ) {
super( "The node named \"" + name + "\" DNE" );
attrName = name;
newValue = value;
}
}

Listing 3: An RBTree

//RBTree represents the tree and the methods required to manipulate it.
class RBTree {
//the root of our tree
public RBNode theRBTree;

//used in the rotate and split methods to manage links
//grandchild node
private RBNode grandchild = null;
//child node
private RBNode child = null;
//current node
private RBNode currNode = null;
//parent node
private RBNode parent = null;
//grandparent node
private RBNode grand = null;
//great-grandparent node
private RBNode great = null;

//constructor used when the tree is created with an
//initial value
public RBTree ( int firstInt ) {
grandchild = null;
child = null;
currNode = null;
parent = null;
grand = null;
great = null;
theRBTree = new RBNode( firstInt, "red", null );
}

//constructor used when the tree is not created with an
//initial value
public RBTree() {
theRBTree = null;
grandchild = null;
child = null;
currNode = null;
parent = null;
grand = null;
great = null;
}

//the insert method which is passed an int
public void insert( int theInt ) {
this.insert( theInt, theRBTree, theRBTree, theRBTree, theRBTree);
}

//private insert method which recursively
//calls itself in order to insert the node
//at the correct spot
private void insert( int theInt, RBNode currNode, RBNode parent, RBNode
grand, RBNode great ) {
//find and insert the current int into the tree
//is this the first node being inserted
if ( currNode == null ) {
theRBTree = new RBNode( theInt, "red", null);
}
else {
//is the current node a "4 node"
try {
if ( "red".equals( currNode.getLeft().getColor() ) &&
"red".equals( currNode.getRight().getColor() ))
{ split( theInt, currNode, parent, grand, great ); }
}
catch ( NoSuchNodeException nsne) { }

//do we belong to the left of the current node?
if ( theInt < currNode.getKey() ) {
//attempt to call insert with the left node.
//If the left node is null, and exception
//will be thrown and we will not be able to insert the node.
try {
this.insert( theInt, currNode.getLeft(), currNode, parent,
grand ); }
catch ( NoSuchNodeException nsne ) {
//move parent, grand, great "up" one node
great = grand;
grand = parent;
parent = currNode;

currNode.setLeft( new RBNode( theInt, "red", currNode ));
//will never be thrown
try{ currNode = currNode.getLeft(); }
catch( NoSuchNodeException dummy ) { }

split( theInt, currNode, parent, grand, great );
try { currNode.getRight().setColor( "black" ); }
catch ( NoSuchNodeException nsne2 ) {}
}
}

//do we belong to the right of the current node?
else if ( theInt > currNode.getKey() ) {
try {
this.insert( theInt, currNode.getRight(), currNode, parent,grand ); }
catch ( NoSuchNodeException nsne ) {
great = grand;
grand = parent;
parent = currNode;

currNode.setRight( new RBNode( theInt, "red", parent ));
//will never be thrown
try{ currNode = currNode.getRight(); }
catch( NoSuchNodeException dummy ) { }

split( theInt, currNode, parent, grand, great );
try { currNode.getRight().setColor( "black" ); }
catch ( NoSuchNodeException nsne2 ) {}
}
}
}
}

private void split( int theInt, RBNode currNode,
RBNode parent, RBNode
grand, RBNode great ) {
//set new colors ... this defines the node structure
currNode.setColor( "red" );
try { currNode.getLeft().setColor( "black" ); }
catch ( NoSuchNodeException nsne ) {}
try { currNode.getRight().setColor( "black" ); }
catch ( NoSuchNodeException nsne ) {}

if ( "red".equals( parent.getColor() )) {
grand.setColor( "red" );
if ( (theInt < grand.getKey()) != (theInt < parent.getKey()) ) {
try{ parent = rotate( theInt, grand ); }
catch( NoSuchNodeException nsne ) { }
}

try{ currNode = rotate( theInt, great ); }
catch( NoSuchNodeException nsne ) { }

currNode.setColor( "black" );
}
}

private RBNode rotate( int theInt, RBNode theNode ) throws
NoSuchNodeException {
RBNode grandchild = null;
RBNode child = null;
try {
child = ( theInt < theNode.getKey() ) ? theNode.getLeft() :
theNode.getRight();
}
catch ( NoSuchNodeException nsne ) {
throw new NoSuchNodeException( "child", this);
}

if( theInt < child.getKey() ) {
try { grandchild = child.getLeft(); }
catch ( NoSuchNodeException nsne ) {
throw new NoSuchNodeException( "grandchild", this );
}
try { child.setLeft( grandchild.getRight() ); }
catch ( NoSuchNodeException nsne ) {
child.setLeft( null );
}
grandchild.setRight( child );
}
else {
try { grandchild = child.getRight(); }
catch ( NoSuchNodeException nsne ) {
grandchild = null;
throw new NoSuchNodeException( "grandchild", this );
}
try { child.setRight( grandchild.getLeft() ); }
catch ( NoSuchNodeException nsne ) {
child.setRight( null );
}
grandchild.setLeft( child );
}

if( theInt < theNode.getKey() ) {
//retain our parent links
grandchild.setParent( theNode );

theNode.setLeft( grandchild );

}
else {
grandchild.setParent( theNode );
theNode.setRight( grandchild );
}
return grandchild;
}
}

 

All Rights Reserved
Copyright ©  2004 SYS-CON Media, Inc.
  E-mail: [email protected]

Java and Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. SYS-CON Publications, Inc. is independent of Sun Microsystems, Inc.