artificial life
Chapter 6: Genetic Programming
6.1 Introduction to Genetic Programming
A limitation of Genetic Algorithms is that all the solutions are of fixed size. It is impossible to add more data to a solution, or to make a solution more elegant since all individuals must have the same size in order to be evaluated in the fitness function. GA's emulate the natural structure of the genetic process, in which bit-vectors are analogous to the series of base pairs that make up DNA. Bit-vectors are not the most powerful structure for their intended purpose since their size is not inherently dynamic. It is certainly possible to allow an individual to only use part of the bit-vector to be more space efficient, but in general the size and complexity of a GA genotype is fairly static. Also, genetic algorithms only provide a solution to a single instance of a problem, rather than being applicable to other problems of the same nature.
These limitations led University of Michigan graduate student John Koza to develop Genetic Programming. Genetic Programming is a natural extension of the genetic algorithm. In a genetic algorithm a population of solutions is given and the algorithm evaluates and breeds generations of better solutions. Genetic programming takes a population of computer programs, runs them on some sample data to see how fit they are and then combines the code using techniques similar to the genetic algorithm to produce a new program [Levy, 1992].
At first glance, Genetic Programming seems to fly in the face of reason. There is such a large number of possible programs, especially flawed ones, that it seems unlikely that anything close to a solution can be found. The key to a successful Genetic Programming scheme, like a good GA, is finding an excellent fitness function to determine how well a particular individual solves the problem, or to be able to quantify just how badly a particular program fails.
Ideally, a population of Genetic Programs has a normal distribution over the range of fitness ratings. There are a handful of relatively excellent specimen, a handful of individuals that fail spectacularly, and a proportionally large amount of genotypes that fall between the two. If this is the case then the handful of unsuccessful genotypes probably are not chosen to populate the following generation and the very best individuals are able to improve using the good pieces of the large population of mediocre individuals.
Also, since the number of possible genetic programs is literally infinite it is important to choose the possible functions or operators wisely. There is no need to include a function in a solution.
A good Genetic Programming scheme is the system Danny Hillis's used to evolve sorting networks. Hillis was interested in generating sorting networks that would sort an array of 16 numbers in as few exchanges as possible. In order to enable the Genetic Programs, which he called Ramps (because the early versions would climb hills to local maxima), to accomplish these sorts he only allowed them to compare numbers and to swap numbers. He only gave the programs the operators necessary to accomplish the task, and didn't add unnecessary complexity. Even so, Hillis used populations of 64K reproducing thousands of generations to conduct his experiment [Levy, 1992].
Danny Hillis used a population of 64K for his experiment because his target platform was a massively parallel computer with 64K processors [Levy, 1992]. John Koza's current endeavor, Genetic Programming, Inc has a system of 1,000 Pentium processors that evaluates solutions in parallel. This massively parallel computer is capable of ingenuity to match humans. Koza once set the system the task of designing a complex electrical circuit. After analyzing the results it was found that 21 of the solutions infringed upon patents, while other solutions succeeded in solving the problem in completely novel ways [Willihnganz, 1999].