HomeDigital EditionSys-Con RadioSearch Java Cd
Advanced Java AWT Book Reviews/Excerpts Client Server Corba Editorials Embedded Java Enterprise Java IDE's Industry Watch Integration Interviews Java Applet Java & Databases Java & Web Services Java Fundamentals Java Native Interface Java Servlets Java Beans J2ME Libraries .NET Object Orientation Observations/IMHO Product Reviews Scalability & Performance Security Server Side Source Code Straight Talking Swing Threads Using Java with others Wireless XML
 

An Introduction To Genetic Algotrithms In Java, by Michael Lacy

Over the past decade the Internet has evolved from a research project living in the realms of academia and government to a global infrastructure for electronic commerce and digital communication that has sent the stock market on a roller-coaster ride to new highs (and lows).

It's a digital world in which a Darwinian survival of the fittest is taking place right before our eyes as web sites and web applications compete for the right to live another day. Whether it's another site offering a better service/product or the latest computer virus, a web application faces many competitors that threaten its very existence. As in the biological world, only the fittest will survive. And the fittest are the ones capable of adapting to their environment faster than their competitors can.

As developers of these digital organisms, our job is to ensure that we equip our web applications with the means for survival. Usually this entails an exhaustive list of if-then statements or extensive exception handling, all of which we must conceptualize in order to code. As web applications become increasingly complex, the task of identifying all possible scenarios to encode becomes daunting and our job as developers becomes reactionary as we code new features, bug fixes, and virus patches to respond to the increased competition for survival in the digital world. What if we could provide software with the ability to adapt to its environment? Why not use the biological model of natural selection to do it?

In the January issue of JDJ (Vol. 6, issue 1) I introduced a technique born in the AI community that uses concepts from biological natural selection to solve complex and often highly nonlinear optimization problems encountered in computer science - the genetic algorithm. I examined the building blocks of genetic algorithms and why java is well suited to their implementation. This month I'll discuss the details of a simple genetic algorithm implementation in the hopes that your curiosity will be sparked to pursue further investigation.

The nature of software development is evolving at a brisk pace. Web applications are increasingly complex and compete daily for survival with other sites, viruses, bugs, and server crashes. As I said before, only the fittest will survive.

Now that I've grabbed your attention, we need to set some expectations before embarking on our journey. First and foremost, genetic algorithms still exist primarily in academia due to their (sometimes) imposing computational requirements and learning curve. This means their commercial application is still relatively unproven.

The scenario described above in which web applications have the ability to adapt to their environment is no more than that - a scenario. The realization of this scenario depends on us, the software developers of the web application world, to discover suitable applications for genetic algorithms.

Second, a few pages covering the implementation details for a genetic algorithm is undeniably insufficient coverage for the development of a thorough understanding of the technique. We're merely skimming the surface of possibilities. With that said, let's jump right in.

Review: Genetic Algorithm
A genetic algorithm is a model for machine learning in which a population of randomly created individuals goes through a simulated process of evolution - a digital survival of the fittest in which each individual represents a point in a problem's solution search space. Reviewing the terminology discussed in Part 1, an individual is referred to as a chromosome. A chromosome consists of genes, which represent the parameters of the problem being optimized. A collection of chromosomes on which a genetic algorithm operates is labeled a population.


Figure 1

Through the employment of a fitness function, chromosomes are evaluated and ranked according to their relative strength/weakness within the population. The fitter are chosen to reproduce while the less fit generally don't survive into succeeding generations. After an arbitrary number of generations, the algorithm should converge on an optimal solution or, more specifically, on the chromosome representing an optimal solution. The remainder of this article will illustrate how this is done using, as an example, a classic computer science problem: the Traveling Salesman Problem (TSP).

Traveling Salesman Problem
The TSP is a conceptually simple problem: a salesman wishes to visit a number of cities, starting and ending at the same city, while visiting all cities only once and minimizing the distance traveled. However, the solution to this problem when the number of cities increases is not trivial. For the case of three cities, six permutations representing viable paths are possible. Easy enough. Now let's consider the case of 20 cities in which there are over 2.4 billion billion permutations. If we exhaustively searched through all the possible path traversals at a rate of 500 million per second, it would take over 150 years to examine them all! There is obviously a need for a method to comb through this massive search space more efficiently. Otherwise my great-great-great grandchildren will be the only ones around when the exhaustive search concludes. Meet the genetic algorithm.

Selection of Parameters
As with any problem we attempt to code a solution for, the first step entails clearly defining the problem to be solved, the required parameters for obtaining a solution, and the encoding to be used to represent a solution to the problem. Using a genetic algorithm to obtain an optimal solution for the TSP requires that we identify the relevant parameters and their encoding in terms of chromosomes, genes, and populations. A quick inspection of the TSP reveals the pertinent parameters for encoding a representative solution to the problem: a list of cities to be traversed in order and the distance between two arbitrary cities. Now let's put it into genetic algorithm terminology.

Parameter Representation
As discussed earlier, a chromosome represents a single solution in the search space of the problem being optimized. With reference to the TSP, a chromosome is nothing more than an ordered list of cities in which each city is represented only once, and their order determines the total distance traveled. With chromosomes being a collection of genes, a gene for the TSP is simply an object representing a city (its name and x and y coordinates). For simplicity I'm using the longitude and latitude measurements as Cartesian x and y coordinates, respectively.

Listing 1 is the source for TSPGene.java, figure 1 is the UML class diagram of the source code provided at the end of this article, and figure 2 is a graphical depiction of how the TSP parameters translate into genes, chromosomes, and a population. Upon examination of the source code, you'll notice that genes are initialized via the retrieval of data from a properties file.


Figure 2

The focus of this article isn't how to build an extendable genetic algorithm framework. However, as you glance through the code you'll notice that through the use of interfaces and externally defined parameters, extendability of the genetic algorithm can be easily achieved to solve, or rather optimize, problems with similar representations. For example, these representations can include permutation problems with unique genes (such as the TSP), permutation problems without unique genes (meaning a gene may be included zero to many times in a single chromosome), and real-parameter problems in which the genes represent real numbers and their bounds. I leave it to you to develop such a framework, a framework in which all you have to do is set the genetic algorithm parameters and gene data externally and run the genetic algorithm without any source code modification. Solving the other types of problems described above then becomes a matter of appropriate problem definition rather than sophisticated programming.


Fitness Evaluation Function
Now that we have the TSP parameters represented as genetic algorithm objects (population, chromosomes, genes, etc.), we can begin to describe the fundamental operations performed on the objects once the algorithm is set in motion. The first one we'll look at is the fitness, or cost, function. It provides the method by which chromosomes are compared to one another in order to determine relative survival fitness. For the TSP the fitness of a chromosome is calculated by traversing the genes in order, summing the distances between adjacent cities. The chromosome with the lowest fitness score is the fittest as it represents the shortest path between the cities. I can't emphasize enough the importance of the cost function as it determines which chromosomes are the fittest within a population. Without an appropriate definition of the fitness evaluation, you might as well rummage around the solution search space at random. Listing 2 provides the source code in TSPChromosome.java in which the fitness is calculated.

Initial Population
With the parameters and cost function defined, it's time to set the genetic algorithm in motion. Recalling the genetic algorithm workflow defined last month, individuals are selected from a population (combined in a process called crossover) to create a set of children, and the children are randomly mutated to create a new set of chromosomes to be reinserted into the population. Once enough children chromosomes have been created to replace a population, a generation is said to have passed. With each generation, all the chromosomes are evaluated according to the cost function. Only the fittest survive into the next generation where the selection, crossover, and mutate process begins anew. After a number of generations have elapsed, the best chromosome is selected from the population and represents the optimal solution to the problem being solved. Figure 3 depicts a flowchart of the algorithm.

Upon examination of the properties file, you'll notice two properties concerning the population size. One is the initial population size and the other is the "regular" population size. The way I've encoded the genetic algorithm for the TSP is that an initial population of 500 random chromosomes is created and sorted, and the 100 fittest chromosomes are the ones selected for the genetic algorithm starting point.

The advantage of this approach is that by creating a large initial population, you can cover a greater amount of the solution search space initially and then pare down the population to the fittest to focus the search using relatively strong chromosomes. Otherwise, the computational power required to perform a genetic algorithm on a population of 500 chromosomes is far greater than that required for 100 chromosomes.

Selection
Continuing with the genetic algorithm workflow, it's now time to select parents for reproduction. The goal of selection is to choose parents that are capable of creating fitter children while maintaining genetic diversity. What this means is that we want the genetic algorithm to pick not only the fittest to reproduce, but also occasionally the less fit chromosomes to maintain variation in the population. If a few relatively strong chromosomes are continually selected for reproduction, it's likely that the algorithm will converge on these "super" chromosomes since it's their genetic material that's most likely to pass on to the next generation. And it may be the case that the chromosomes represent a local minimum in the cost function rather than the global minimum we're searching for.

The technique I've used in the selectParents() method of TSPPopulation.java (see Listing 3) is referred to as tournament selection. A small group of chromosomes is randomly selected, in this case six, and the two fittest are selected to reproduce. Keeping the tournament size small results in a smaller selection pressure, thus increasing genetic diversity.

Crossover
Once parents have been selected, the crossover operator combines the two parents randomly to create two children chromosomes. A crossover probability (in our example 90%) is used to determine whether crossover will occur for two given parents. If crossover doesn't occur, the children will be exact replicas of the parents. Crossover is an essential operator for a genetic algorithm as it is the mechanism by which the algorithm "moves" through the solution search space toward convergence.

Continuing with the TSP example, a typical crossover operation includes selecting a chunk of a chromosome and iterating through the other parent chromosome to extract the genes that weren't selected from the first parent in order to combine them in a new child. As discussed earlier, the TSP is basically a permutation problem in which gene uniqueness must be maintained. For other problems, such as permutation without the uniqueness constraint or real-valued problems, crossover takes on a completely different appearance. But for now, we'll use the TSP crossover described above. Listing 4 displays the crossover() method from TSPChromosome.java, which illustrates in code the concepts described above.

Mutation
Mutation is similar to crossover, except that only a single chromosome is involved. As with crossover, there's a probability associated with mutations, albeit a low one (5% in our example). The mutate() method (see Listing 5) defines a very simple mutation method: two genes are selected at random from the chromosome and swapped. A number of mutation techniques are available depending on the problem encoding (permutation, real-valued, etc.), and I recommend further investigation to understand them. Returning to the comparision with crossover, mutation also ensures that a genetic algorithm "moves" through the solution search space, although not always in the direction of increasing fitness. Remember that these operators are blind, that they're determined by probability and chance. The cost function ensures that the population as a whole moves toward an optimal solution.

Insertion
Parents have been selected and children chromosomes created via crossover and an occasional mutation. What next? It's time to insert the children into the population and begin the selection, crossover, and mutation process anew. To preserve the strongest chromosomes within the population, a concept called elitism that protects the n (n = 10 in our TSP example) most elite chromosomes is introduced and replaces the rest of the population with the newly created children.

Convergence
The goal of employing a genetic algorithm for the TSP is to converge on an optimal path (i.e., chromosome) through the cities in which the distance traveled is minimized. The convergence criterion is thus the minimum distance, but since we don't know it a priori, we can't use it to test that the algorithm has converged. Convergence is a key area of research in genetic algorithms, and the approach commonly used today is to provide large populations and a high number of generations to give the algorithm the best chance of convergence. One of the reasons genetic algorithms still exist primarily in academia is that, based on all the parameters that can be set for a genetic algorithm and the different encodings it can assume, there isn't consensus on convergence criteria, meaning that it's largely a process of trial and error for the domain represented.

Conclusion
I've briefly examined the key components of a genetic algorithm and provided examples using the Traveling Salesman Problem. Much of this affirms that use of a genetic algorithm is by no means an exact science and is largely predicated on trial and error in determining an appropriate encoding of the problem being optimized as well as the selection of the parameters controlling the genetic algorithm.

My hope is that I have presented an interesting programming technique and sparked your interest in wanting to know more about genetic algorithms and the power they hold for optimizing complex problems we encounter every day as software developers.

In my opinion we're approaching a point in software development where our jobs are becoming more and more reactionary to the competitive landscape that the internet provides. To shift back into the proactive role, we need to develop software that relieves us of the burden of coding for every possible scenario. We won't get there overnight, but we have to start somewhere, and the genetic algorithm is a logical starting point as it provides a framework for the evolution of digital applications to solve problems as they arise. I encourage you to dabble with the code provided with this article and think seriously about the future of software development. As Bob Dylan so eloquently stated back in 1964, "the times, they are a-changin'."

Author Bio
Michael Lacy is an engineer for the platform development group at Shutterfly, an online photo service, where he develops Web-based solutions for digital image printing, enhancing, and sharing. He's also a certified Java programmer and developer.
[email protected]

	


Listing 1: TSPGene.java

package com.lacy.genetic.tsp;
import java.util.Properties;
import com.lacy.genetic.*;
public class TSPGene implements IGene
{
    private String m_description;
    private double m_x;
    private double m_y;


// Constructors
    public TSPGene() {}
    public TSPGene(Properties i_properties, int i_index) { 
        init(i_properties, i_index); }
    public TSPGene(String i_description, double i_x, double i_y)
    {
     this.m_description = i_description;
     this.m_x = i_x;
     this.m_y = i_y;
    }


// Method to initialize a gene from a properties file
    public void init(Properties i_props, int i_index)
    {
    setDescription(i_props.getProperty("gene." + i_index + 
        ".description"));
           setX((Double.valueOf(i_props.getProperty("gene." + i_index +".x")).doubleValue()));
    setY((Double.valueOf(i_props.getProperty("gene." + i_index + 
        ".y")).doubleValue()));
                                }


// Accessor/Settor methods
    public String getDescription() { return m_description; }
    public void setDescription(String i_description)
                                                { m_description = i_description; }
    public double getX() { return m_x; }
    public void setX(double i_x) { m_x = i_x; }
    public double getY() { return m_y; }
    public void setY(double i_y) { m_y = i_y; }


// Utility methods
    public IGene copy() { return new TSPGene(getDescription(),getX(), getY()); }
    public boolean equals(IGene i_gene)
    {
    return (i_gene.getDescription().equals(m_description));
    }
    public String toString() { return m_description; }
                                }



Listing 2: TSP Fitness Evaluation

// Returns the total distance traveled for the order of cities  //              
specified by this chromosome.


    public double getFitness()
    {
        if (m_fitness > 0.0)
            return m_fitness;


        double i_fitness = 0.0;


        for (int i = 0; i < m_genes.size(); i++)
        {
            TSPGene g1 = (TSPGene) m_genes.at(i);
            TSPGene g2 = null;
            if (i == (m_genes.size() - 1))
                        g2 = (TSPGene) m_genes.at(0);
            else
                        g2 = (TSPGene) m_genes.at(i+1);


            i_fitness += calcDistance(g1.getX(), g1.getY(), 
        g2.getX(), g2.getY());
        }


        m_fitness = i_fitness;


// Calculates the cartesian distance between two points specified
//              by points (x1, y1) and (x2, y2) using the Pythagorean Theorem.


    private static double calcDistance(double x1, double y1,
                                double x2, double y2)
    {
        double deltaXSquared = (x2 - x1) * (x2 - x1);
        double deltaYSquared = (y2 - y1) * (y2 - y1);


        return Math.pow(deltaXSquared + deltaYSquared , 0.5);
    }



Listing 3: Parent Tournament Selection

// Returns 2 IChromosome objects using tournament selection
    protected IChromosome[] selectParents(int i_tournamentSize)
    {
        IChromosome[] parents = new IChromosome[2];
        Array gladiators = new Array(i_tournamentSize);


// pick a random spot within the population
        int startIndex = GeneticAlgorithmUtils.getRandomInt(0, 
        m_population.size() - i_tournamentSize);


// get as many chromosomes as specified by tournament size
        for (int i = startIndex; i < (startIndex + 
        i_tournamentSize); i++)
        {
          gladiators.put(i - startIndex, m_population.at(i));
        }


// sort the sub-population by fitness
        sort(gladiators);


// return the best two
        parents[0] = (IChromosome) gladiators.at(0);
        parents[1] = (IChromosome) gladiators.at(1);


        return parents;
                                    }



Listing 4: Crossover

/** Performs a crossover with this chromosome and the one specified in the argument. 
Returns two children chromosomes while preserving the uniqueness of genes in the chromosomes, 
meaning that all genes will be represented once and only once in the chromosome. */


public IChromosome[] crossover(IChromosome i_chromosome, double i_crossoverRate


   {
   int size = this.getSize();
   IChromosome mom = this.copy();
   IChromosome dad = i_chromosome.copy();
   IChromosome[] children = new IChromosome[2];
   double random=GeneticAlgorithmUtils.getRandomDouble(1.0);


   if (random < i_crossoverRate)
   {
   int crossoverPoint=GeneticAlgorithmUtils.getRandomInt(1,size-1);


   IGene[] child1begin=mom.getGenes(0, crossoverPoint-1);
   IGene[] child1end=getEndGenes(child1begin,
        dad.getGenes(0,size-1));
   IGene[] child2begin = dad.getGenes(0, crossoverPoint - 1);
   IGene[] child2end = getEndGenes(child2begin, mom.getGenes(0, 
                        size -1));
   children[0] = new TSPChromosome(child1begin, child1end);
   children[1] = new TSPChromosome(child2begin, child2end);


   return children;
        }


// if no crossover, children are same as parents
        children[0] = mom;
        children[1] = dad;


        return children;
    }



Listing 5: Mutation

/** Mutates this chromosome by randomly selecting two genes and swapping them 
Uses the mutation rate specified by i_mutationRate */
    public void mutate(double i_mutationRate)
    {
    double random = GeneticAlgorithmUtils.getRandomDouble(1.0);


    if (random < i_mutationRate)
     {
     int size = this.getSize();


     int r1 = GeneticAlgorithmUtils.getRandomInt(0, size - 1);
     int r2 = GeneticAlgorithmUtils.getRandomInt(0, size - 1);


     IGene g1 = (IGene) this.getGene(r1);
     IGene g2 = (IGene) this.getGene(r2);


        m_genes.put(r2, g1);
     m_genes.put(r1, g2);
        }
    }



Listing 6: Insertion

/** Method to insert children into the population, preserving the m_elitism best from the previous generation */


protected void insert(IChromosome[] i_children, int i_elitism)
    {
        sort(m_population);


        for (int i = i_elitism; i < m_population.size(); i++)
            m_population.put(i, i_children[i - i_elitism]);


    }


  
 

Download Assoicated Source Files (Zip format ~ 9.19 KB)

 

All Rights Reserved
Copyright ©  2004 SYS-CON Media, Inc.
  E-mail: [email protected]

Java and Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. SYS-CON Publications, Inc. is independent of Sun Microsystems, Inc.