HomeDigital EditionSys-Con RadioSearch Java Cd
Advanced Java AWT Book Reviews/Excerpts Client Server Corba Editorials Embedded Java Enterprise Java IDE's Industry Watch Integration Interviews Java Applet Java & Databases Java & Web Services Java Fundamentals Java Native Interface Java Servlets Java Beans J2ME Libraries .NET Object Orientation Observations/IMHO Product Reviews Scalability & Performance Security Server Side Source Code Straight Talking Swing Threads Using Java with others Wireless XML
 

Summary
An iterator is a language mechanism that facilitates successive enumeration of all the elements of a collection in some definite order. Java provides an iterator-like interface called Enumeration. The implementation model imposed by enumerations is known as cursor objects. It is not a simple task, however, to develop enumerations for non-linear data structures or for those with complex control in terms of cursors. This is because the enumeration procedure for a cursor is confined to use only simple control structures.

In this article, a new iterator mechanism is proposed for Java in which enumeration procedures can be described as if they were ordinary traversal procedures. It is fine to use arbitrary control structures such as recursive calls or infinite loops when developing enumeration procedures. Iterators are defined to implement the Enumeration interface and, thus, can be used in place of ordinary enumerations. The proposed iterator mechanism was implemented experimentally using threads and a zero-sized buffer.

Introduction
An abstract data type should provide enough operations so that everything users need to do with its objects can be done without inspecting the implementation details. Since it is a common use of a collection to perform some action for its elements, we need a systematic way to access all elements. This method should be convenient to use and should not destruct the collection.

Some programming languages, such as CLU[1], Sather[2] and Icon[3], provide a mechanism called iterator (also known as enumerator) to handle this situation. In these languages, an iterator is invoked like a procedure but, instead of terminating with a result, it has many results which it produces one at a time. The produced items can be used in other parts that specify actions to be performed for each item.

Iterators are more important in object-oriented programming than they were in other paradigms because programmers routinely construct abstract data types for collections in this paradigm. The author of the data abstraction must provide an iterator because the representation of the objects of the type is not known to the user.

Java [4] provides an iterator-like interface called Enumeration (java.util.Enumeration). It is used by many classes in the standard Java libraries. In the case of the Vector class, the member that returns an enumeration is elements(). A typical loop using Enumeration to step through the elements of a Vector vec is as follows:
Enumeration elts = vec.elements();
while (elts.hasMoreElements())
doSomething(elts.nextElement());

The Vector class, its related iterator class and the client program are interrelated, as illustrated in Figure 1.

Figure 1
Figure 1:

The same technique can be applied to many other sorts of collections as well, if they provide members that return enumerations like the elements() member of the Vector class.

Problems with Java Enumeration
The model Java provides to define iterators is known as cursors [5]. A cursor is an object that points into a collection and may be used to retrieve successive elements. The interface for cursors includes a test for completion (hasMoreElements()) and increment (nextElement()). The attributes of a cursor maintain the current state of the related iterator.

It is argued that iterators can be developed for all the common data structures using only cursors [6]. However, it is by no means a simple task to develop cursors for non-linear data structures like trees or graphs [7].

An iterator class for enumerating elements of binary trees will make the points clear (see Listing 1). Since the current node does not provide sufficient information to determine what the next node in an inorder sequence is, a stack is introduced to supplement the missing information.

This may be viewed as a non-recursive inorder traversal procedure tailored to meet the Enumeration interface. Even though it is possible, in principle, to remove all the recursions in any algorithm, the burden on the programmer is overwhelming when the algorithm is intertwined with complex control and/or data structures. What's worse is that the resulting program is far less readable than the original recursive one because control information for the enumeration procedure is dispersed at various places of the program.

The iterator construct of CLU is quite different from the Java Enumeration. A procedure-like module called an iter represents iterators. Since an iter module can be implemented using arbitrary algorithms, there is no need to explicitly maintain the current state to determine the next element. As a result, it is relatively easy to convert a traversal procedure into an iter module.

Iterator as a Generalized Enumeration
We propose a new iterator mechanism for Java in which enumeration procedures can be described as if they were ordinary traversal procedures. The whole interface of Java iterators is encapsulated in a single Java class called Iterator. From the user's point of view, an iterator is an instance of a class that extends the Iterator class which, in turn, implements the Enumeration interface.

Iterator developers should derive an iterator class from the Iterator class. The members hasMoreElements() and nextElement() are implemented already in the Iterator class and iterate() is the only member that should be implemented further.

Iterate() corresponds to the iter module in CLU. It implements the enumeration procedure using arbitrary control structures including recursive calls and infinite loops. Any construct permissible within a plain member is also allowed. Additionally, it may contain many invocations of yieldElement() whose role is to enumerate an object and make it available to clients. Figure 2 illustrates the definition of the Iterator class.

Figure 2
Figure 2:

The effect of calling an iterator is as follows. The first time an iterator is called via hasMoreElements() and nextElement(), it commences to execute iterate(). It does this until it reaches an invocation of yieldElement(), when it suspends execution and makes the value of expression visible to the corresponding call. Upon subsequent calls, it resumes execution just after yieldElement(), suspending whenever it reaches another yieldElement() until it exits.

An implementation of an iterator class for the inorder traversal of a binary tree using the Iterator mechanism is shown in Listing 2.

Even if the main use of iterators is to implement enumerations of data types, they are also useful in their own right. This is indicated by the next example (see Listing 3). Sort is a class for iterators that enumerates elements of an array in ascending order using the quicksort algorithm [8]. Once again, the recursive nature of the original algorithm is well reflected by the enumeration procedure.

Our final example is a typical producer/consumer problem known as the Grune filter [9]. A filter accepts characters from an input stream and puts them out, replacing all occurrences of aa by b. Another filter replaces all occurrences of bb by c. Input is fed into the first filter, the output of which is fed into the second filer. What is actually observed is the output of the second filter. The filtering process can be programmed as in Listing 4.

It may seem quite simple at first, but programming an iterator class like Compact as a cursor is very difficult and error-prone. This is because it has many points of control that should be saved to determine what to enumerate next upon consecutive requests. In the Iterator model, there is no need to save control points explicitly. All that is needed is to invoke yieldElement() at various points.

Iterators are first-class objects and can be used in place of ordinary enumerations. This property exhibits maximum flexibility in programming with iterators [3]. Useful iterator-related programming idioms (such as multiple iterators per loop or collection, parameterization of iterators to procedures or other iterators, binding iterators to variables and pipeline programming) can be applied.

The Grune filter is a typical example of pipeline programming. A pipeline is established when an iterator, acting like the consumer, embeds another iterator as a producer. In the following example, iterator AAB embeds an iterator for the standard input stream. By the same token, BBC embeds AAB.

Compact AAB = new Compact('a', 'b',
new StreamIterator
(new FileInputStream(
FileDescriptor.in)));
Compact BBC = new Compact('b', 'c', AAB);

It would be fair to say that programming techniques of this kind could also be applied to plain Java enumerations, but they are more strongly supported when iterators of arbitrary control can be developed easily.

Implementation
An iterator (the enumeration procedure) and its clients, e.g. a procedure invoking hasMoreElements() and nextElement() on that iterator, can be viewed as communicating sequential processes [10] that communicate by means of an intermediate buffer. Whenever a client requires another object, it takes one from the buffer. If the buffer is empty, it waits for the enumeration procedure to generate a new object and place it into the buffer. The enumeration procedure works to fill up the buffer and when it is full, it goes to sleep until the buffer is available again for new objects [11].

The buffer size chosen for the implementation of Java iterators is zero; thus, the enumeration procedure is activated directly by the clients upon every request for a new object. The buffer mechanism is implemented as a class named Buffer.

The basic idea of the implementation is as follows. An iterator and its client are executed in two different threads communicating through a buffer. The request for a new object is ultimately interpreted as an invocation of the get() member, which suspends the thread for the client until new data arrives at the buffer. The enumeration procedure takes control at this point, generating a new object and delivering it to the buffer by making a request for yieldElement(). It then invokes put(), which wakes up the thread for the client. Since there is neither true parallelism nor preemption between these two threads, they behave as if they were co-routines [12]. Figure 3 depicts the overall structure of the proposed implementation.

Figure 3
Figure 3:

The thread for the iterator is responsible for starting the iterate() member that contains an enumeration procedure written by the iterator developer.

private class IteratorThread
extends Thread {
public void run()
{ iterate(); ...
}
}
private IteratorThread thread; ...

The constructor creates and starts a thread for the enumeration procedure. The thread is marked as a daemon thread to allow terminating the enumeration procedure when the client stops requests for further enumeration.

public Iterator() {
thread = new IteratorThread();
thread.setDaemon(true); thread.start();
}

HasMoreElements() should determine whether or not the enumeration procedure is still pending. This problem is harder to solve than it looks. It is insufficient to simply check the liveness of the thread using isAlive() because there can be a race-condition between the two threads. The solution adopted here is pre-fetching an item delivered by the enumeration procedure. Pre-fetching is achieved by the get() request to the shared buffer.

private Buffer data = new Buffer();

The thread issuing that request is not woken up until some other thread (running the enumeration procedure) passes back the control. If the value delivered to the buffer is a meaningful item, it is a clue that the thread for the enumeration procedure is actually running and that the item stored in the buffer is one written by a yieldElement() call.

The value null was chosen for the representation of a meaningless item. The iterator thread delivers this value when there is a request for a new item even after the enumeration procedure is finished.

private class IteratorThread
extends Thread {
public void run() {
iterate(); data.put(null); }
}

Lookahead is a temporary store for the pre-fetched item, which is shared between hasMoreElements() and nextElement().

private Object lookahead = null;

It should be a trivial task, by now, to determine whether the enumeration procedure is still pending. All that needs to be done is to check that the pre-fetched value is not null.

public final boolean hasMoreElements() {
lookahead = data.get();
return (lookahead != null);
}

NextElement() has only to return the value pre-fetched by the corresponding hasMoreElements() call.
public final Object nextElement() {
return lookahead;
}

YieldElement() records the value to be enumerated next to the buffer which, in effect, passes the control back to the client.

protected final void yieldElement(Object x) {
data.put(x);
}

The implementation has a shortcoming in that every invocation of nextElement() should be preceded by exactly one corresponding invocation of hasMoreElements().

There is also a small problem: the iterator thread tends to stay alive longer than it actually needs to when the iterator is used within a context that requires premature termination. This problem can be resolved if the thread could be terminated (stopped in the Java sense) explicitly by the programmer. A new member stop() is introduced to allow the required intervention.

abstract class Iterator implements Enumeration
{
...
public final void stop() { ... }
}

The implementation of the stop() member is accomplished by terminating the iterator thread if it is still alive.

public final void stop() {
if (thread.isAlive()) thread.stop();
}

Iterator threads that are neither finished nor stopped are maintained until the end of the whole program and then discarded when the main thread exits. Marking the iterator thread as a daemon already ensured this. The complete source code for both the Iterator and the Buffer is shown in Listing 5.

Performance
The implementation has been tested using JDK 1.1.5[13] for SunOS 5.5.1. The results of the performance of cursors on the same workstation are shown and compared with the results of the proposed implementation. The benchmarks used are Tree, Sort and Primes. Tree is a sorting program based on binary search trees. Sort is also a sorting program based on the quicksort algorithm. The Primes program enumerates prime numbers until the given count is reached.

Table 1 provides timings (in milliseconds) for example programs that created up to 500,000 random elements and consumed the same number of elements entirely. The "?" mark represents an OutOfMemoryError exception.

Table 1

Cursors performed better than iterators in every test case. The overhead incurred during the synchronization of buffer operations was the dominant factor of the overall execution time when the enumeration procedures were very simple. When a large amount of computation was required for the enumeration of a single data item, however, iterators could run nearly as fast as cursors, which can be observed near the lower right corner of the table. Even if cursors are desirable for efficiency reasons, it would still be a useful programming strategy to use iterators during the initial development stages and then translate them into cursors later.

Resources

  1. Liskov, B. et al., CLU Reference Manual, Lecture Notes in Computer Science 114, Springer-Verlag, 1981.
  2. Murer S., Omohundro, S. and Szyperski, C., 'Sather Iters: Object-Oriented Iteration Abstraction', Technical Report TR-93-045, ICSI, Berkeley, 1993.
  3. Griswold, R.E. and Griswold, M.T., The Icon Programming Language, 2nd ed., Prentice-Hall, 1990.
  4. Gosling J., Joy, B. and Steele, G., The Java Language Specification, Addison-Wesley, 1996.
  5. Coplien, J., Advanced C++ Programming Styles and Idioms, Addison-Wesley, 1992.
  6. Budd, T., Multiparadigm Programming in Leda, Addison-Wesley, 1995.
  7. Horowitz, E., Sahni, S. and Mehta, D., Fundamentals of Data Structures in C++, Computer Science Press, 1995.
  8. Hoare, C.A.R., 'Quicksort', Comput. J., 5, (1), 10-15 (1962).
  9. Grune, D., 'A View of Coroutines', SIGPLAN Notices, 12, (7), 75-81 (1977).
  10. Hoare, C.A.R., Communicating Sequential Processes, Prentice-Hall International, 1985.
  11. Baker, H.G., 'Iterators: Signs of Weakness in Object-Oriented Languages', OOPS Messenger, 4, (3), 18-25 (1993).
  12. Marlin, C.D., Coroutines: A Programming Methodology, a Language Design, and an Implementation, Springer-Verlag, 1980.
  13. JDK 1.1.5, available from http://java.sun.com/, JavaSoft, Sun Microsystems, Inc.
Conclusion
A new mechanism for the Java programming language is presented in this article that aids programmers in developing complex iterators. In the proposed mechanism, iterator developers can implement enumeration procedures as if they were ordinary traversal procedures. Arbitrary control structures, such as recursive calls or infinite loops, can also be used inside enumeration procedures. This level of expressive power is essential when writing iterators for non-linear data structures or for those with complex control.

Iterators developed using the proposed mechanism are first-class objects and can be used in whichever context an Enumeration makes sense. This property exhibits maximum flexibility in programming with iterators. The implementation is still a prototype but it faithfully realizes all the essential features of the proposed mechanism.

About the Author
Myung Ho Kim is an associate professor of Dong-A University, Pusan, Rep. of Korea. He has been teaching programming paradigms to undergraduate and graduate students for more than 10 years. Currently he is leading a project for developing a full-fledged interpreter and transformation system of a functional language in Java. He can be reached at: [email protected]

	

Listing 1.
  
class Tree {  
   class TreeNode { ... }  
   public Enumeration elements() {  
      return new InorderIterator(rootNode);  
   }  
   class InorderIterator implements Enumeration {  
      public InorderIterator(TreeNode node) {  
         currentNode = node; walk = new Stack();  
      }  
      public boolean hasMoreElements() {  
         while (currentNode != null) {  
            walk.push(currentNode);  
            currentNode = currentNode.leftChild;  
         }  
         return walk.empty();  
      }  
      public Object nextElement() {  
         currentNode = (TreeNode)walk.pop();  
         Object value = currentNode.data;  
         currentNode = currentNode.rightChild;  
         return value;  
      }  
      private TreeNode currentNode;  
      private Stack walk;  
   }  
   private TreeNode rootNode;  
}  

Listing 2.
  
class InorderIterator extends Iterator {     
   public InorderIterator(TreeNode node) {  
      rootNode = node;  
   }  
   protected void iterate() { inorder(rootNode); }  
   private void inorder(TreeNode currentNode) {  
      if (currentNode != null) {  
         inorder(currentNode.leftChild);  
         yieldElement(currentNode.data);  
         inorder(currentNode.rightChild);  
      }  
   }  
   private TreeNode rootNode;  
}  

Listing 3.
  
class Sort extends Iterator {     
   public Sort(double[] v, int low, int high) {  
      data = v; left = low; right = high + 1;  
   }  
   protected void iterate() { quick(left, right); }  
   private void quick(int left, int right) {  
      if (left <= right) {  
         int pos = partition(left, right + 1);  
         quick(left, pos - 1);  
         yieldElement(new Integer(data[pos]));  
         quick(pos + 1, right);  
      }  
   }  
   private int partition(int low, int high) { ... }  
   private double[] data;  
   private int left, right;  
}  

Listing 4.
  
class Compact extends Iterator {  
   public Compact(char c1, char c2, Enumeration enum) {  
      s1 = new Character(c1); s2 = new Character(c2);  
      theEnum = enum;  
   }  
   protected void iterate() {  
      while (theEnum.hasMoreElements()) {  
         Object s = theEnum.nextElement();  
         if (s.equals(s1)) {  
            if (theEnum.hasMoreElements()) {  
               s = theEnum.nextElement();  
               if (s.equals(s1))  
                  s = s2;  
               else  
                  yieldElement(s1);  
            }  
            else {  
               yieldElement(s1);  
               break;  
            }  
         }  
         yieldElement(s);  
      }  
   }  
   private Enumeration theEnum;  
   private Character s1, s2;  
}  

Listing 5: Complete Source Codes for Iterator API.
  
// class Iteartor  
abstract class Iterator implements java.util.Enumeration {  
   public Iterator() {  
      thread = new IteratorThread();   
      thread.setDaemon(true); thread.start();  
   }  
   public final boolean hasMoreElements() {  
      lookahead = data.get();  
      return (lookahead != null);  
   }  
   public final Object nextElement() {  
      return lookahead;  
   }  
   protected final void yieldElement(Object x) {  
      data.put(x);  
   }  
   public final void stop() {  
       if (thread.isAlive()) thread.stop();  
   }  
   abstract protected void iterate();  
   private class IteratorThread extends Thread {  
      public void run() {  
         iterate(); data.put(null);  
      }  
   }  

   private Token data = new Buffer();  
   private IteratorThread thread;  
   private Object lookahead = null;  
}  

// class Buffer  
final class Buffer {  
   public synchronized Object get() {  
      requestIssued = true;  
      notify();  
      while (! dataAvailable)  
         try { wait(); }  
         catch (InterruptedException ex) {}  
      dataAvailable = false;  
      return (data);  
   }  
   public synchronized void put(Object item) {  
      while (! requestIssued)  
         try { wait(); }   
         catch (InterruptedException ex) {}  
      requestIssued = false;  
      data = item;  
      dataAvailable = true;  
      notify();  
   }  

   private Object data;  
   private boolean dataAvailable = false;  
   private boolean requestIssued = false;  
}
  
      
 

All Rights Reserved
Copyright ©  2004 SYS-CON Media, Inc.
  E-mail: [email protected]

Java and Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. SYS-CON Publications, Inc. is independent of Sun Microsystems, Inc.