Genetic Programming

Genetic programming (GP) automatically create programs. It uses the principles of Darwinian natural selection, such as survival of the fitess, to evolve populations of programs. The genetic material of individuals in the population are the programs. Each generation the fitter programs are selected to be parents and mated to give them children. There are two common types of genetic operation, sexual and asexual.

The Genetic Programming Cycle. From GP+DS.

Asexual Reproduction

A sexual reproduction create children from a single parant. This is often known as mutation. The genetic material in the child comes from just one parent in the previous generation. In GP this means the program in the next generation is a copy of a program in the previous generation. It is common to make one or more random changes, known as mutations, that make the child program slightly different from its parent.

Sexual Mating

With sex, (parts of) two programs are combined to yeild the child program. Commonly two points are chosen at random. One in the mother program and one in the father. The code below the cutpoint in the father is copied. The code below the cutpoint in the mother programme is deleted and then replaced by that cut out of the father. The newly formed program is their child. Commonly these operations are performed on copies. The idea is that the mother and father will tend to be better than average (since they got selected to have children) and therefore they will contain good fragments of code. Therefore the child will also tend to contain good code. Sometimes a child will be especially fortunate and will be created by putting together good but different code fragments (and they will work together well in the child).

program 2*x - x
Mum, fitness .64286, 2x-x (red subtree will be removed)

program x\(x\(x-x**3))-x
Dad, fitness .70588, x\(x\(x-x**3))-x (blue subtree copied)

program x+x**2-x
Correct Program, fitness 1.0, x+x**2-x (blue subtree copied)

The newly formed child may also be mutated. There are seveal types of crossover operator available. Some use more than one crossover point in each parent.


Some means is needed to decide which programs are to be parents. Conventionally this is done by a fitness function which gives each individal in the population a number (its fitness). Conventionally this is a scalar number but one of the advantages of eveoltionary approaches is the multiobjectives can be readily supported.

In genetic programming fitness is often calculated by running the program (individual) on one or more inputs (test cases). The answer returned by the program is then compared with the desired answer. This is equaivelent to supervised learning. The goal is to find a function (program) which describes the test data. That is we are fitting a function to some data. By analogy with linear regression (which fits a line to data), this is known as "symbolic regression".

C code for the above example.

Genetic Programming Contacts

See., 3 April 2001