next up previous
Next: Solution of the Up:

Why Ants are Previous: The Artificial Ant


Size of Program and Solution Space

 

The number of different programs of a specific length is given by the size of the terminal set and the numbers of different functions in the function set of each arity (branching factor). To create a tree of a specific length a corresponding number of functions of each branching factor and number of leafs must be used. Where there are more than one branching factor available in the function set, there may be multiple combinations of function arity which give rise to a tree of the required size. In general there are multiple ways of arranging the branches and leafs. Each way gives rise to a distinct tree shape. Finally, where there are more than one terminal or more than one function of a given arity, there are multiple programs of the same shape. The number of different programs in the ant problem is plotted against their lengths in Figure 1 (and is tabulated in the ``Total'' row at the bottom of Table 2). As expected the number of programs grows approximately exponentially with the length of the programs. (The program used to calculate the number of programs is available via anonymous ftp from ftp.cs.bham.ac.uk in pub/authors/W.B.Langdon/gp-code/ntrees.cc).

For the shorter programs it is feasible to explore the program space exhaustively. Table 2 summarises the programs space up to programs of length 14. Table 2 shows the program space is highly asymmetric with almost all programs having very low scores and the proportion with higher scores falling rapidly (but not monotonically) to a low point near 72. Above 72 it rises slightly. The modal score is zero, the median is one and the mean rises with length from 1 to 2.7 while the standard deviation remains near 4 (cf. Figure 7). There is some dependence upon program length and, as expected, programs must be above a minimum size to reach modest scores. However above the minimum size the number of programs with a given score rises rapidly, being a roughly constant proportion of the total number of programs. There are an unexpectedly high number of solutions (albeit a tiny fraction of the total) and their number similarly grows with program size.

For longer programs exhaustive search is not feasible and instead we sampled the program space randomly in a series of Monte Carlo trials for a number of program sizes. For each such size between 10,000,000 random programs where generated and tested. The random programs where chosen uniformly from the set of possible programs of the specific length using the bijective random tree creation algorithm described in [Alonso and Schott1995, Chapter 4,].

Where there are multiple different combinations of function arity which give rise to trees of the required size, one was chosen at random in proportion to the number of trees it contains. Each tree was converted to a program by labelling each of its nodes with a function or terminal of the correct arity chosen uniformly at random from those that match in the function or terminal set. In this way we ensure every program of the specified length has the same chance of being chosen.

 
Figure 1:   Number of programs of a specific length (note log log scale)

Number of Trees and Distribution of Fitness

Length
Score 1 2 3 4 5 6 7 8 9 10 11 12 13 14
0 2 0 12 16 136 423 2262 10452 49088 252803 1227679 6443754 32595908 171997308
1 0 0 2 4 18 150 449 3058 13806 77269 380979 2070276 10954364 58002558
2 0 0 0 1 3 25 112 909 3429 21902 123174 646831 3587432 19987018
3 1 0 2 6 31 96 530 2779 11736 63996 318817 1656409 8501211 45339974
4 0 0 0 0 3 14 72 527 2526 16250 86999 501521 2820383 15927690
5 0 0 0 0 2 10 58 417 1844 13047 67895 390963 2213475 12466189
6 0 0 0 0 1 8 25 177 1155 6826 33174 216479 1248818 6766377
7 0 0 0 0 2 13 35 266 1601 8076 39428 240187 1324912 6872615
8 0 0 0 0 3 10 68 412 1818 10785 56857 303276 1580134 8846059
9 0 0 0 0 0 0 2 28 183 1392 8218 57485 348331 2053040
10 0 0 0 0 1 2 18 76 461 2758 12465 75079 406998 2276683
11 0 0 2 0 16 51 297 907 5876 27403 120960 659392 3245735 16642082
12 0 0 0 0 0 0 3 52 190 1326 8296 45293 258390 1525769
13 0 0 0 0 0 0 4 40 154 1203 8011 44681 266859 1548594
14 0 0 0 0 0 0 3 24 105 770 5437 27113 169161 1041738
15 0 0 0 0 0 0 9 75 150 1313 11513 41711 226363 1528861
16 0 0 0 0 0 0 0 8 34 350 2584 15053 104018 654943
17 0 0 0 0 0 2 4 35 167 1108 4962 33175 197400 1078896
18 0 0 0 0 0 0 0 2 13 190 1764 11192 83119 570147
19 0 0 0 0 0 0 0 10 66 593 3028 18180 101133 660977
20 0 0 0 0 0 2 4 20 105 763 3985 25601 179522 938185
21 0 0 0 0 0 0 1 13 56 448 2501 16089 107227 581413
22 0 0 0 0 0 0 0 16 71 639 2825 21205 129354 692285
23 0 0 0 0 0 0 0 20 47 489 2372 15321 83764 509240
24 0 0 0 0 0 4 12 47 293 1723 6688 39676 194484 1048249
25 0 0 0 0 0 0 0 1 36 315 1586 10698 65746 359170
26 0 0 0 0 0 0 0 2 23 240 1785 11356 76602 379975
27 0 0 0 0 0 0 0 7 16 222 1262 8820 54491 306671
28 0 0 0 0 0 0 0 2 18 153 1049 7208 51024 258757
29 0 0 0 0 0 0 0 0 11 118 605 4705 30333 184386
30 0 0 0 0 0 0 0 3 23 203 922 6483 40544 215169
31 0 0 0 0 0 0 0 0 12 120 624 5167 37145 191740
32 0 0 0 0 0 0 0 0 2 48 387 2859 21765 129411
33 0 0 0 0 0 0 0 5 13 132 898 5315 31898 158423
34 0 0 0 0 0 0 0 2 12 99 519 3648 23694 125950
35 0 0 0 0 0 0 0 1 5 50 298 2490 20548 105190
36 0 0 0 0 0 0 0 5 12 106 349 2905 16828 98260
37 0 0 0 0 0 0 1 0 18 104 595 3189 20800 99885
38 0 0 0 0 0 0 0 2 21 183 637 4054 17473 106100
39 0 0 0 0 0 0 0 0 6 79 226 1782 8274 54753
40 0 0 0 0 0 0 0 0 8 79 267 1888 9477 56608
41 0 0 0 0 0 0 0 0 6 57 245 1682 8812 49873
42 0 0 0 0 0 0 0 5 11 92 263 1821 9083 51056
43 0 0 0 0 0 0 0 6 8 78 190 1688 7681 47354
44 0 0 0 0 0 0 0 0 0 11 109 808 4988 27119
45 0 0 0 0 0 0 0 0 0 11 84 756 3929 24649
46 0 0 0 0 0 0 0 0 6 53 184 1165 5234 30441
47 0 0 0 0 0 0 0 1 5 42 133 1068 6063 30738
48 0 0 0 0 0 0 0 0 1 14 62 532 2978 16510
49 0 0 0 0 0 0 0 0 0 4 30 293 1917 11133
50 0 0 0 0 0 0 0 0 4 41 142 970 3467 20849
51 0 0 0 0 0 0 0 0 5 38 122 748 2731 16387
52 0 0 0 0 0 0 0 0 0 2 10 160 774 6193
53 0 0 0 0 0 0 0 0 0 13 22 307 1287 8526
54 0 0 0 0 0 0 0 0 0 0 1 26 128 1568
55 0 0 0 0 0 0 0 0 0 3 6 105 340 3437
56 0 0 0 0 0 0 0 0 0 0 10 73 289 2082
57 0 0 0 0 0 0 0 0 0 0 6 58 617 2212
58 0 0 0 0 0 0 0 0 0 0 3 24 146 949
59 0 0 0 0 0 0 0 0 0 0 0 0 20 316
60 0 0 0 0 0 0 0 0 0 0 8 46 229 1790
61 0 0 0 0 0 0 0 0 0 0 4 30 223 1435
62 0 0 0 0 0 0 0 0 0 0 0 2 29 1113
63 0 0 0 0 0 0 0 0 0 0 5 85 285 4538
64 0 0 0 0 0 0 0 0 0 0 0 1 23 337
65 0 0 0 0 0 0 0 0 0 0 0 0 53 1610
66 0 0 0 0 0 0 0 0 0 0 0 0 3 66
67 0 0 0 0 0 0 0 0 0 0 0 15 31 435
68 0 0 0 0 0 0 0 0 0 0 0 0 3 90
69 0 0 0 0 0 0 0 0 0 0 0 0 17 2394
70 0 0 0 0 0 0 0 0 0 0 0 0 0 1063
71 0 0 0 0 0 0 0 0 0 0 0 0 3 2344
72 0 0 0 0 0 0 0 0 0 0 0 0 1 18
73 0 0 0 0 0 0 0 0 0 0 0 0 7 525
74 0 0 0 0 0 0 0 0 0 0 6 42 119 868
75 0 0 0 0 0 0 0 0 0 0 0 0 0 146
76 0 0 0 0 0 0 0 0 0 0 0 0 14 174
77 0 0 0 0 0 0 0 0 0 0 4 26 113 733
78 0 0 0 0 0 0 0 0 0 0 6 34 158 991
79 0 0 0 0 0 0 0 0 0 0 4 16 137 755
80 0 0 0 0 0 0 0 0 0 0 12 104 499 3530
81 0 0 0 0 0 0 0 0 0 0 3 64 157 2126
82 0 0 0 0 0 0 0 0 0 0 2 10 60 363
83 0 0 0 0 0 0 0 0 0 0 0 60 76 1367
84 0 0 0 0 0 0 0 0 0 0 21 188 747 5559
85 0 0 0 0 0 0 0 0 0 0 3 223 459 5734
86 0 0 0 0 0 0 0 0 0 0 0 110 173 3103
87 0 0 0 0 0 0 0 0 0 0 27 194 563 3420
88 0 0 0 0 0 0 0 0 0 0 57 399 1188 6951
89 0 0 0 0 0 0 0 0 0 0 12 48 470 2676
Total 3 0 18 27 216 810 3969 20412 95256 516132 2554416 13712490 71521461 382794984
Mean 1 0 1.7 0.9 1.7 1.9 2.0 2.06903 2.24659 2.42444 2.44969 2.59006 2.70357 2.76907
SD 3.7 0 4.4 3.8 4.4 4.4 4.4 4.45867 4.57353 4.79021 4.76974 4.94338 4.98586 5.04057
Max 3 0 11 3 11 24 37 47 51 55 89 89 89 89

 
Figure 2:   Proportion of programs of a given length by their fitness. Values for lengths 15 and above are based on Monte Carlo sampling.

 
Figure 3:   Proportion of programs of a given length which are solutions. Error bars indicate standard error on Monte Carlo estimates.

Figure 2 shows that the proportion of programs with a given score is approximately constant for a wide range of program lengths. Since the total number of programs rises rapidly, this means the number of programs with a given score also rises rapidly with length. This confirms assumptions in [Langdon and Poli1997a].

With any Monte Carlo technique there will be some stochastic error in the estimates. In the case of rare events (such as finding a solution to the ant problem) this could be large. The stochastic error was kept reasonable by using a large number of trials so a modest number of solutions were found at each length (between 9 and 101 and on average 39).

An estimate of the stochastic error is plotted in Figure 3 using error bars.



next up previous
Next: Solution of the Up:

Why Ants are Previous: The Artificial Ant




William B Langdon
Tue Feb 3 19:53:29 GMT 1998