This is based on ``Genetic Algorithms in Search, Optimization, and Machine Learning'' by David E. Goldberg, Addison-Wesley, 1989.
At this point, one might wonder: why not work with more than one solution at a time? Perhaps we could blanket the feasible region and search many areas at once. This idea is exploited in genetic algorithms.
As we look around us, we see many wonders in nature. Creatures can repair themselves, guide themselves, and reproduce themselves with a robustness that is truly amazing. There are two possible explanations for this: divine intervention or evolutionary progress. The former corresponds to ``non-deterministic polynomial'' algorithms (simply guess the correct solution and then confirm your answer), which is not the most practical way for us to solve problems. The latter, however, suggests many interesting possibilities for solving our own, relatively minor, problems.
Based on what we know about biological evolution, we will explore a class of algorithms known as genetic algorithms. The genetic possibilities of a single creature are rather limited; evolution works with whole populations. Similarly, in order to solve discrete problems, we will work with a population of solutions. These solutions must be encoded in some manner. Later, we will discuss what features an encoding should have; for now, let us encode our solutions as 0-1 strings.
Genetic algorithms require very little information about a problem. Let us assume that we have a black box that can take a solution string and return a function value. Let us work with a particular black box, whose workings can be a mystery for the moment.
One nice feature of genetic algorithms is that they explore a large number of points simultaneously. This permits us to escape poor local optima quickly: rather than getting stuck (as in local search) or making a laborious stochastic climb, we already have other points available. This sample of points is called the population. Let us begin with a random population of size 4. Suppose the members are
11101 01000 11000 00011
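To make this concrete, here is a minimal Python sketch (the names are illustrative, not from the text) that builds a random population of four 5-bit strings like the one above.

    import random

    STRING_LENGTH = 5
    POPULATION_SIZE = 4

    def random_member(length=STRING_LENGTH):
        """Generate one random 0-1 string, e.g. '11010'."""
        return ''.join(random.choice('01') for _ in range(length))

    # Four random 5-bit strings; the text's example uses
    # 11101, 01000, 11000, 00011.
    population = [random_member() for _ in range(POPULATION_SIZE)]
    print(population)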
The simplest genetic algorithm is composed of three operators: reproduction (selection of a mating pool), crossover, and mutation.
We begin with the selection of the mating pool. In this stage, we copy strings according to their ``fitness to survive.'' Here, the fitness to survive is simply the function value the black box returns for each string. Suppose the function values are as follows:
11101 -- 169
01000 -- 576
11000 -- 64
00011 -- 361
The total fitness of the population is 1170, so the fractions of the total fitness are 14.4%, 49.2%, 5.5%, and 30.9%, respectively. We are now ready to kill off unworthy parents and copy the better ones. We generate four members of the mating pool, choosing randomly with probabilities given by these fractions. Suppose we draw one copy of the first string, two of the second, none of the third, and one of the fourth. Our population becomes:
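This fitness-proportionate copying is often implemented as ``roulette wheel'' selection. The following sketch (illustrative Python; the function name is our own) reproduces the fractions above and draws a mating pool of four.

    import random

    def select_mating_pool(population, fitnesses, pool_size):
        """Fitness-proportionate ('roulette wheel') selection: each member
        is copied into the pool with probability proportional to its share
        of the total fitness."""
        total = sum(fitnesses)
        shares = [f / total for f in fitnesses]
        # random.choices draws with replacement using the given weights
        return random.choices(population, weights=shares, k=pool_size)

    population = ['11101', '01000', '11000', '00011']
    fitnesses  = [169, 576, 64, 361]          # values returned by the black box
    total = sum(fitnesses)                    # 1170
    print([f'{f / total:.1%}' for f in fitnesses])   # 14.4%, 49.2%, 5.5%, 30.9%
    print(select_mating_pool(population, fitnesses, pool_size=4))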
11101 01000 01000 00011

The total fitness is now 1682 (up from 1170), so the average fitness of the population has increased. At this point, however, we have not improved the best solution (which, of course, is our goal). Now that we have an improved population, we are ready to mate our strings. We do this by pairing off some subset of the strings (let us say all of them for now) and creating new children. The method we use is crossover: we take the first part of one string and match it with the second part of the other, and vice versa. For instance, suppose strings 1 and 2 were paired off. We might (randomly) decide to take the first 4 digits of string 1 with the last digit of string 2, and likewise the first 4 digits of string 2 with the last digit of string 1. This gives the following two strings:
11100 01001

Similarly, we might exchange after the second digit of strings 3 and 4 to get a population (and function values) of:
11100 -- 144
01001 -- 625
01011 -- 729
00000 -- 256

Now we have increased the fitness of the best individual (729, up from 576). And the individual with value 144, which used to look reasonably good, is now a very weak member of the population.
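The mating operation used above is single-point crossover. A short sketch (illustrative Python) reproduces both matings from the example; the cut points 4 and 2 are the ones chosen in the text.

    import random

    def crossover(parent1, parent2, point=None):
        """Single-point crossover: swap the tails of two strings
        after a chosen (or random) cut point."""
        if point is None:
            point = random.randint(1, len(parent1) - 1)
        child1 = parent1[:point] + parent2[point:]
        child2 = parent2[:point] + parent1[point:]
        return child1, child2

    # The two matings from the example above:
    print(crossover('11101', '01000', point=4))  # ('11100', '01001')
    print(crossover('01000', '00011', point=2))  # ('01011', '00000')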
Our final step is mutation. Sometimes, due to the luck of the draw, a gene is lost from the population. For instance, it might be that the possibility of having a 1 in the last position is lost. Mutation randomly changes bits, with a very low probability, to ensure that no gene is lost forever. At this point, suppose no mutation takes place.
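A bitwise mutation operator might look like the following sketch (illustrative Python; the rate of 0.01 is an arbitrary choice, not taken from the text).

    import random

    def mutate(member, rate=0.01):
        """Flip each bit independently with a small probability,
        so that no gene value can be lost from the population forever."""
        return ''.join(
            ('1' if bit == '0' else '0') if random.random() < rate else bit
            for bit in member
        )

    print(mutate('01011'))  # usually unchanged; occasionally one bit flips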
We are now ready to repeat the process. We again determine who goes into the mating pool. Suppose two copies of string 2 and two copies of string 3 go into the pool. Notice that the possibility of having a 1 in the third position is now lost until it is restored by mutation. In fact, until that 1 reappears (allowing 01111, which has function value 961 and is the optimal solution here), 01011 is the best solution that can be produced.
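The text never opens the black box, but one function consistent with every value quoted in this walkthrough (169, 576, 64, 361, 144, 625, 729, 256, and 961 for 01111) is to complement the first bit of the string and square the resulting binary integer. This is purely an inference from the numbers; the sketch below uses it only so the example can be reproduced.

    def black_box(member):
        """A fitness function consistent with the example's values (an
        inference, not stated in the text): complement the first bit,
        read the string as a binary integer, and square it."""
        flipped = ('1' if member[0] == '0' else '0') + member[1:]
        return int(flipped, 2) ** 2

    for s in ['11101', '01000', '11000', '00011', '01111']:
        print(s, black_box(s))   # 169, 576, 64, 361, 961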
In essence, we have now defined genetic algorithms. Given a problem, you must decide on an encoding of the problem, a method for choosing the mating pool, a method for mating pairs (or more) of solutions, and a method for mutation. The skill lies in getting all of these pieces to work together.
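Putting these pieces together, a minimal generational loop might look like the following sketch (illustrative Python; the parameter values and operator details are our own assumptions, not prescriptions from the text).

    import random

    def run_ga(fitness, pop_size=4, length=5, generations=20, mutation_rate=0.01):
        """A minimal generational GA: fitness-proportionate selection,
        single-point crossover of consecutive pairs, bitwise mutation."""
        population = [''.join(random.choice('01') for _ in range(length))
                      for _ in range(pop_size)]
        for _ in range(generations):
            scores = [fitness(m) for m in population]
            if sum(scores) == 0:            # avoid a zero-weight draw in this tiny example
                scores = [1] * pop_size
            pool = random.choices(population, weights=scores, k=pop_size)
            next_population = []
            for p1, p2 in zip(pool[0::2], pool[1::2]):
                cut = random.randint(1, length - 1)
                for child in (p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]):
                    child = ''.join(
                        ('1' if b == '0' else '0') if random.random() < mutation_rate else b
                        for b in child)
                    next_population.append(child)
            population = next_population
        return max(population, key=fitness)  # best member of the final population

    # Using the black-box function inferred above (itself an assumption):
    black_box = lambda s: int(('1' if s[0] == '0' else '0') + s[1:], 2) ** 2
    print(run_ga(black_box))   # results vary run to run with such a tiny population

With a population of only four strings the outcome varies from run to run; practical applications use much larger populations.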
Genetic algorithms have been successful in various fields, including pattern recognition. Theoretical work on nonlinear functions suggests some possibilities. Unfortunately, genetic algorithms have not proved very successful in combinatorial optimization. This might be due to the amount of work already done on local search and related algorithms. The difficulty of determining appropriate mating operators makes this an interesting, but possibly frustrating, area.