Paper 33: Fitness causes Bloat, Discussion

Bill Langdon (W.B.Langdon@cs.bham.ac.uk)
Mon, 30 Jun 1997 19:59:20 +0100

Next message: Ivan de Falco: "paper n.33: continued"
Previous message: Bill Langdon: "paper n.33: question"

Dear Mario,
Thank you for your two questions on my "Fitness Causes
Bloat" WSC2 paper.

In principle it is possible to count the number of representations of
a given length which have a particular fitness by exhaustively
generating all programs of that size and measuring their fitness. This
procedure is then repeated for the next size and so on until the
desired range of lengths has been covered. However, as you know, this
is not practicable even for modest sized programs as the number of
possible programs of a given length is too vast. A possible
alternative would be to randomly generate programs of the given length
and see what proportion have the specified fitness level. This is only
feasible where the fitness level is low so that there is a reasonable
chance of randomly generating a program which has this fitness.

These two methods can be viewed as (statistically) exact but
impracticable. However a biased estimate could be made by randomly
generating programs from one which already has the required fitness
and looking at the distribution of lengths and fitness. We would
expect a skewed distribution with programs having the same length
being on average longer. Such as estimate is biased as we are choosing
a particular starting point which may not be representative of all the
programs.

Something akin to the above process is going on in an evolving GP
population but if we were to use it to give us a length V fitness plot
we would introduce multiple biases, as we are using multiple starting
points and are means of generating new programs also potentially
introduce biases. However it has the advantage of being feasible.
After a while a GP population may converge to the extent that only
individuals with the best fitness are selected to reproduce, thus the
next generation is composed of children of these. If the number of
programs with the same fitness and same length was small and the
population was large then we would expect the next generation to
contain many duplicates. Ie many identical programs would be
generated. Conversely if there are many ways to encode programs with
the same fitness, the chance of hitting on the same one would be
small. In our work we certainly see the number of identical programs
produced by both crossover and mutation falling as the average program
length increases, indicating the number of programs with the same
fitness increases with length. However as I have stressed this is a
biased estimate and a more careful analysis (like that suggested at
the beginning of this paragraph) is needed to answer your question
definitively.

Considering your second question, at present I cannot bring to mind
any examples of the explicit study of function set arity and bloat. I
have certainly observed that adding 3 or 4 arity functions to a
function set (as you suggest) greatly increases the size of randomly
generated programs. You may find
http://www.cs.bham.ac.uk/~wbl/biblio/gp-bibliography.html useful.

Unfortunately I was unable to view your www home page today.
Thank you taking the time to raise these points.

Bill

W. B. Langdon, Phone +44 121-414-4791
School of Computer Science, Fax +44 121-414-4281
University of Birmingham,
Birmingham. B15 2TT United Kingdom
http://www.cs.bham.ac.uk/~wbl/

Next message: Ivan de Falco: "paper n.33: continued"
Previous message: Bill Langdon: "paper n.33: question"