[Tackett1994] and [Altenberg1994] both suggest that Andy Singleton first proposed what is now often called the intron explanation for bloat: programs become longer not because longer programs are better but because they harbour more ineffective code (introns) than shorter ones. Since crossover chooses a point in the parent at random, a higher ratio of ineffective to effective code means there is more chance of crossover altering the ineffective code, in which case the children behave exactly like their parents. In a conventional GP, this means children of longer parents have a higher chance of having the same fitness as their parents than children of shorter programs do. Once bloat is underway, almost no improvements in fitness are made, so the struggle for survival goes not to the children which are better than their parents but to those which are the same as their parents. That is, children of long programs have more chance of themselves being selected to be parents than children of shorter programs. (This implicitly assumes that long programs tend to produce long children.) Since the same preference for children of long programs occurs in each generation, on average program size increases with time, as observed.
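The counting argument at the heart of the intron explanation can be sketched in a few lines. The function name and the program sizes below are illustrative examples, not taken from the papers cited:

```python
def p_neutral_child(effective, introns):
    """Chance that a uniformly chosen crossover point lands in
    ineffective code, so the child behaves exactly like its parent."""
    return introns / (effective + introns)

# Two programs with the same amount of effective code: the longer,
# intron-heavy one is far more likely to produce neutral children.
p_short = p_neutral_child(effective=20, introns=20)    # 0.5
p_long = p_neutral_child(effective=20, introns=180)    # 0.9
```

Once selection favours neutral children over (mostly worse) changed ones, this bias towards longer programs repeats in every generation.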
This is attractive but, as Soule points out, it is not the full story [Soule and Foster1997,Soule and Foster1998,Soule1998,Langdon et al.1999]. He suggests there is at least one additional mechanism involved. Contrary to speculation in [Altenberg1994], subtree crossover on average introduces no size change. However, there is a ``removal bias'': on average, children which are smaller than their parents are less fit than children which are bigger. Soule also investigated the evolution of tree shape.
Note that these mechanistic explanations tell us nothing about the rate of bloat, nor do they predict the ultimate shape of programs. Indeed the iterative nature of the intron explanation has led to false predictions of exponential growth.
[Rosca1997] suggests GP evolves programs from the root down. He and Soule point out that (in side-effect free programs) introns can exist only at the edges of trees, i.e. all the code beneath an intron must also be ineffective. This stresses the importance, in the GP process, of the code near the root. Poli also shows that we can nullify the advantage of a high ratio of ineffective code by using multiple operations (in fact mutations). With a constant mutation rate per tree node, the chance of altering effective code depends upon the size of the effective code and is independent of the amount of ineffective code.
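Poli's observation can be made concrete with a standard independent-events calculation. This is a sketch; the mutation rate and node counts are made-up examples:

```python
def p_effective_code_hit(m, effective_nodes):
    """Probability that point mutation, applied independently with
    rate m at every node, alters at least one effective node.  The
    amount of ineffective code does not appear in the formula."""
    return 1 - (1 - m) ** effective_nodes

# Adding introns changes the tree size but not this probability,
# so ineffective code no longer shields the effective code.
p = p_effective_code_hit(m=0.01, effective_nodes=50)   # ~0.395
```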
GP analysis often assumes the population is made of full binary trees. This is obviously wrong but simple. Section 2 points out that GP populations tend to become like random trees. While apparently more complex than full binary trees, random trees have nice mathematical properties that may make analysis of the evolution of populations of them tractable. In GP we often use function sets containing functions of more than one arity, which considerably complicates analysis. Fortunately this does not appear to change the properties of random trees or of GP dramatically, and it may be possible to continue to approximate the behaviour of real GP populations by assuming only binary functions, or even by taking the apparently extreme position of assuming totally random arities. The mathematics of such random trees has also been analysed.
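The tractability of random trees can be checked empirically. The sketch below samples binary trees uniformly at random via the exact Catalan-number decomposition and measures their mean depth; the tree size, sample count, and seed are arbitrary choices for illustration:

```python
import random
from functools import lru_cache

@lru_cache(maxsize=None)
def catalan(n):
    """Number of binary trees with n internal nodes."""
    return 1 if n == 0 else catalan(n - 1) * 2 * (2 * n - 1) // (n + 1)

def random_tree_depth(n, rng):
    """Depth of a uniformly random binary tree with n internal nodes,
    sampled by choosing the left-subtree size i with the exact
    probability catalan(i)*catalan(n-1-i)/catalan(n), then recursing."""
    if n == 0:
        return 0
    r = rng.randrange(catalan(n))
    i, acc = 0, catalan(0) * catalan(n - 1)
    while r >= acc:
        i += 1
        acc += catalan(i) * catalan(n - 1 - i)
    return 1 + max(random_tree_depth(i, rng),
                   random_tree_depth(n - 1 - i, rng))

rng = random.Random(0)
depths = [random_tree_depth(100, rng) for _ in range(200)]
mean_depth = sum(depths) / len(depths)
```

For n = 100 internal nodes the sampled mean depth comes out near the Flajolet-Odlyzko limiting value of 2&radic;(100&pi;) &asymp; 35, far deeper than the log2 n &asymp; 7 of a full binary tree of similar size.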
The theory in Section 2 does not deal with particular mechanisms and so is simple. It can be applied to non-GP search techniques and to other types of search operator, and it gives a simple approximate prediction of the average evolution of program shape. By assuming a linear increase in depth, Section 2 gives a domain independent prediction of sub-quadratic bloat which can be used to make quantitative predictions on a given problem.
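Under these assumptions the shape of the prediction can be written out directly. The depth-growth constants below are placeholders that would be fitted to a particular problem, and the depth-size relation used is the random-binary-tree limit (mean depth roughly 2&radic;(&pi;&middot;size)):

```python
from math import pi

def predicted_mean_size(gen, d0=5.0, c=0.5):
    """Sketch: if mean depth grows linearly, depth(g) = d0 + c*g,
    and the population resembles random binary trees, for which
    depth ~ 2*sqrt(pi*size), then mean size is bounded by a
    quadratic in the generation number g."""
    depth = d0 + c * gen
    return depth ** 2 / (4 * pi)  # invert depth = 2*sqrt(pi*size)
```

The quadratic is the limiting case; populations that have not yet reached the random-tree limit grow more slowly, consistent with the sub-quadratic bloat predicted above.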
This is not to say the other explanations are wrong; far from it (we and others have measured the growth of introns). Section 2 simply gives another way of thinking about the evolution of trees.