In nine of ten independent runs the population bloats. (In run 102, at generation 7, GP finds a high-scoring program of one function and two terminals; it is unable to escape from this program, and the population converges towards it. After generation 35, approximately 90% of the population are copies of this local optimum. Similar trapping is also reported in [Langdon1998b].) All populations complete at least 400 generations; however, three runs stop before 600 generations when they reach the size limit (one million).
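One simple way to quantify the kind of convergence described for run 102 is to count exact duplicates of the most common program. The sketch below is hypothetical: it assumes each program can be written as a canonical prefix string, so identical trees compare equal, and the text does not state how the 90% figure was actually measured.

```python
from collections import Counter
from typing import Sequence, Tuple


def dominant_fraction(population: Sequence[str]) -> Tuple[str, float]:
    """Return the most common program and the fraction of the population it occupies.

    Hypothetical representation: each program is a canonical prefix string,
    so two identical trees produce identical strings.
    """
    if not population:
        raise ValueError("empty population")
    program, count = Counter(population).most_common(1)[0]
    return program, count / len(population)


# Synthetic population (not data from the runs): 9 copies of one small program.
pop = ["(+ x x)"] * 9 + ["(* x (+ x x))"]
print(dominant_fraction(pop))  # ('(+ x x)', 0.9)
```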
As expected, in all runs most new generations do not find programs with better fitness than any found before, i.e. the changes in size and shape are due to bloat. Figure 3 shows that, although there is variation between the remaining runs, on average each population evolves to lie close to the ridge and then moves along it, as predicted.
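The claim that most generations bring no fitness improvement can be checked from a per-generation record of the best fitness. This is a minimal sketch assuming higher fitness is better and that `best_fitness[g]` (a hypothetical name) holds the best score in generation `g`; a generation counts as improving only if it strictly beats every earlier one.

```python
from typing import List, Sequence


def improving_generations(best_fitness: Sequence[float]) -> List[int]:
    """Indices of generations whose best fitness beats every earlier generation.

    Assumes higher fitness is better. Generation 0 is the baseline and is
    never counted as an improvement.
    """
    improved = []
    best_so_far = float("-inf")
    for gen, fit in enumerate(best_fitness):
        if fit > best_so_far:
            if gen > 0:
                improved.append(gen)
            best_so_far = fit
    return improved


history = [3, 5, 5, 4, 7, 7, 7]  # synthetic scores, not from the runs
gens = improving_generations(history)
print(gens)  # [1, 4]
print(f"{len(gens)} of {len(history) - 1} new generations improved")  # 2 of 6 new generations improved
```

Under this definition, bloat shows up as continued growth in size during the long stretches where the returned list has no entries.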
Figure 4 shows that the average depth of the population varies widely between runs, and in several runs the mean depth does not increase monotonically at a constant rate. However, the mean over all ten runs is better behaved and increases at about 2.4 levels per generation.
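A growth rate such as "levels per generation" can be estimated by averaging the per-generation mean depth across runs and taking the least-squares slope against generation number. The helper names below are hypothetical, and ordinary least squares is an assumption; the text does not say how the 2.4 figure was obtained.

```python
from statistics import fmean
from typing import List, Sequence


def mean_over_runs(depths_by_run: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-generation mean depth across runs (all runs equally long)."""
    return [fmean(gen) for gen in zip(*depths_by_run)]


def depth_growth_rate(mean_depths: Sequence[float], start: int, end: int) -> float:
    """Least-squares slope of mean depth against generation, in levels per generation."""
    gens = range(start, end + 1)
    ys = [mean_depths[g] for g in gens]
    mean_x = fmean(gens)
    mean_y = fmean(ys)
    sxy = sum((g - mean_x) * (y - mean_y) for g, y in zip(gens, ys))
    sxx = sum((g - mean_x) ** 2 for g in gens)
    return sxy / sxx


# Two synthetic runs (not data from the experiments) with different slopes.
run_a = [2.0 * g + 1.0 for g in range(101)]
run_b = [3.0 * g - 1.0 for g in range(101)]
avg = mean_over_runs([run_a, run_b])
print(round(depth_growth_rate(avg, 0, 100), 3))  # 2.5
```

Averaging first and fitting second mirrors the observation that the ten-run mean is better behaved than any individual run.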
Figure 5 shows the exponent obtained by fitting a power law to the mean program size from generation 12 up to each later generation. Again there is wide variation between runs, but on average the exponent starts near 1.0 (generations 12-50) and rises steadily to 1.9 (generations 12-400), that is, towards the predicted quadratic limiting relationship between size and time.
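A power law size ≈ a·t^b becomes a straight line in log-log coordinates, log size = log a + b·log t, so its exponent b can be estimated as the least-squares slope of log(mean size) against log(generation). The sketch below assumes exactly that fitting procedure and a hypothetical input list `mean_sizes` indexed by generation; the original fitting method may differ in detail. Quadratic growth should give an exponent of 2.

```python
import math
from typing import Sequence


def power_law_exponent(mean_sizes: Sequence[float], start: int, end: int) -> float:
    """Estimate b in size ~ a * t**b over generations start..end inclusive.

    mean_sizes[g] is the mean program size at generation g. Fits a straight
    line to (log g, log size) by ordinary least squares and returns its slope.
    """
    if start < 1 or end <= start:
        raise ValueError("need 1 <= start < end (log of generation 0 is undefined)")
    xs = [math.log(g) for g in range(start, end + 1)]
    ys = [math.log(mean_sizes[g]) for g in range(start, end + 1)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic, exactly quadratic growth (not data from the runs).
sizes = [0.0] + [3.0 * g**2 for g in range(1, 401)]
print(round(power_law_exponent(sizes, 12, 400), 3))  # 2.0
```

Calling the function with `start=12` and increasing values of `end` produces a curve of exponents of the kind plotted in Figure 5.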