Appendix D (GP and Data Structures)

Implementation

GP-QUICK

For reasons of convenience of use, support and speed, our experiments were primarily coded in C++. The GP-QUICK package written by Andy Singleton [1994] was chosen as a basis because:

It was written in C++.
It was freely available, for non-commercial use.
It was stable.
There was some limited bug fixing support.
There were early problems with the alternative C++ GP package, GPCPlus.
It implements (via the FASTEVAL macro) linear prefix jump tables ([Keith and Martin,1994, page 300] describes these as the best of the options they considered).
GP-QUICK stores GP individuals in fixed length arrays. This is believed to considerably simplify the dynamic memory allocation/deallocation problem and avoid fragmenting the heap; however it means wasting some memory where individuals are less than the maximum length.

Some disadvantages to GP-QUICK have been encountered. For example, GP-QUICK accesses the population at random. This can cause severe performance problems if part of the population is swapped out of main memory.

Coding Changes to GP-QUICK-2.1

Over the course of this work numerous changes have been made to GP-QUICK. The following list contains some of the more important:

Support for multiple trees within one individual.
Support for Automatically Defined Functions (ADFs).
Reduction in space to store population to one byte per primitive.
Run parameters set by control file and/or command line.
Separate file of test cases.
Check pointing (so long runs can be restarted after a workstation reboots).

Default Parameters

The default values for GP-QUICK parameters used are given in Table D.1. They are the same as those used by [Koza,1994, page 655] except, the following GP-QUICK parameters have also been used:

GP-QUICK implements a steady state, rather than a generational, population.
The GP-QUICK default tournament size is 4, rather than 7. Additionally selecting individuals for removal from the population is performed using tournament selection (again with a tournament size of 4).
A maximum total program length limit is used rather than a maximum program depth limit of 17. In addition to the total length restriction, some experiments impose a limit, when the initial population is created, of 50 nodes on individual trees within the program.
A single child is produced by each crossover. (If the potential offspring exceeds the maximum program size then the roles of the two parents are swapped, given both parents are of legal length this second offspring cannot be too big).

Table D.1: Default GP-QUICK Parameters

pPopSize	:	10000
pGenerateLimit	:	1000000
pPopSeed	:	0
pTestSeed	:	1
pTrace	:	0
pMaxExpr	:	250
pInitExpr	:	6
pMuteRate	:	0
pCrossSelf	:	1
pUnRestrictWt	:	70
pCrossWt	:	90

pMuteWt	:	0
pAnnealMuteWt	:	0
pAnnealCrossWt	:	0
pCopyWt	:	10
pSelectMethod	:	4
pTournSize	:	4
pGaussRegion	:	0
pRepeatEval	:	0
pKillTourn	:	4
pMaxAge	:	0
pParsimony	:	0

Network Running

All the experiments were carried out on the UCL Computer Science department's heterogeneous network of SUN workstations. The runs being independent of each other. Using the queue problem as an example, with a population of 10,000, each job occupies about 13 Megabytes of RAM within the workstation. However the elapse time of each job varies considerably depending upon the load on the workstation (most runs were done out of hours), the speed of the workstation and the details of the GP run. To perform a million fitness evaluations (each of which requires the program under test to be run 320 times) takes all day on the faster workstations and cannot be completed in a day on the slower ones.

Reusing Ancestors Fitness Information

This section briefly describes an implementation which has achieved reductions in run time by a factor of two. It can be applied to multi-tree programs whose primitives have side effects and is implemented using a conventional representation for the population. (Section 2.4.1 describes DAG based caches which can be used where there are no side-effects).

When tests are independent of each other, unless a particular test causes code which is different in a child from its parent to be run, the child's result on that test must be the same as that of its parent on the same test. Since the only difference between the child's code and the parent's is produced by crossover (or mutation), it can be readily determined if the child's result on a particular test could be different from its parent. Only those tests where the result could be different need be run. For the others, the result can be taken from the parent. The complete fitness of the child is assembled from the individual test results. NB result values can be inherited indefinitely, i.e. not only directly from the child's mother but also from its grandmother, great grandmother etc.

Our implementation considers changes only at the tree level. If crossover changes a tree which is not executed in a particular test then the new program's behaviour on that test must be identical to its parent's. I.e. its score must be identical and therefore the child's score on that test is inherited, and that test is not executed. The savings produced are dependent upon the test sequences, GP parameters and other time saving techniques (cf. Section D.6), nonetheless run time was halved in some cases. Since results for each test in the test case must be stored, as well as the overall fitness value, the technique increases execution speed at the expense of greater use of memory.

The technique could be extended to cover other cases where a child's score must be identical to that of one of its parents. For example children produced by crossovers which occur within introns, must behave identically to their parent and so have identical fitness.

Caches

Handley's [1994] combination of caches and directed acyclic graphs, requires all the primitives to be free of side effects. read, write and other primitives do have side effects, nevertheless caching can be used when parts of the program (such as trees or ADFs) are known to be free of side effects. In the case of some runs of the list problem, cache hit ratios of between 70% and 97% were obtained with caches on End (1000 words), First (200), Next (1000) and Previous (200). This reduced run time by about 50%.

Compressing the Check Point File

As mentioned in Section D.4, GP runs may take a long time. Under these circumstances a check point file can be used to restart a run from the last check point, rather than from the beginning. The check point file can also be used to extend particularly interesting runs.

The bulk of the check point file is occupied by the GP population. If written as plain text the population consumes too much disk space, particularly if more than one GP run is active at the same time. Therefore the population is written in a coded form, which occupies one byte per program primitive. As the GP run continues and the population converges, considerable reductions in disk space can be achieved by using gzip to compress the check point file. Compression ratios as high as 25 fold can occur but compression to about one bit per program primitive is common. (The high compression achieved indicates low diversity within the GP population). A balance needs to be struck between the time taken to process the check point file and the potential time saved should the check point file be used. Typically it takes about a minute to compress the check point file. Therefore check points are written about once an hour.

Benchmarks

A number of CPU and memory usage measurements were made using the MAX problem [Langdon,1997] running on a 400 MHz Digital Alphastation 500/400 with a 2MB cache and a SUN SPARCstation 2 fitted with an 80MHz CPU Weitek PowerUP CPU and floating point processor giving a performance of about 1.6 times that of a standard SPARCstation 2.

Memory

In GP-QUICK the memory occupied by the GP process is normally dominated by the space required to store the population. When run as a steady state GA there is a single population but when run as a generational GA the new population is created separately from the current one and so (in this implementation) twice as much memory is required. (With a little code re-organisation the double memory overhead could be removed).

GP-QUICK uses fixed length objects to store the population. The memory required for each GP individual is one byte times the maximum allowed program length plus a fixed overhead. The overhead is 182 bytes on the SUN and 304 bytes on the Alpha. To estimate the total memory, the number of bytes per GP individual is multiplied by the size of the population (and doubled if using a generational GA). With small populations, the memory occupied by the code can also be important.

Speed

A number of trials on both machines were made using the MAX problem. In this problem the population will be dominated by + and 0.5 primitives, as both of these require only trivial floating point calculations we expect the run time to be dominated by the time to interpret the GP programs and so Table D.2 gives an indication of the performance of GP-QUICK on the two machines. The top two rows of figures are obtained by dividing the total time by the number of individuals or primitives evaluated.

GP-QUICK supports a FASTEVAL pre-processing step which is aimed as increasing performance when GP individuals are evaluated multiple times. In the MAX problem individuals are only evaluated once so a modest saving might have been made by removing the FASTEVAL step.

An estimate of program evaluation time was made by evaluating individuals multiple times. The bottom two rows of Table D.2 were obtained by dividing the increase in total time by the increase in number of individuals and primitives evaluated. The lower two rows can be used to estimate the extra runtime caused by multiple evaluations, e.g. due to multiple fitness cases.

Due to the cache structures on the two machines individuals can be re-evaluated very quickly, leading to very fast times in the lower part of Table D.2. On problems with large numbers of (or complex) functions or terminals the cache may not perform as well and longer run times are expected.

**Table D.2:** GP-QUICK MAX Problem $\mu$ S CPU
Total time	SPARC		Alpha
per	individual	680		92
per	primitive	6	.3	0	.8
Evaluation only
	individual	37		4	.8
	primitive	0	.3	0	.042

Code

Most of the source code used in the experiments described in this book is available on an as is unsupported basis, for research and educational purposes, via anonymous ftp node ftp.cs.bham.ac.uk in subdirectories of pub/authors/W.B.Langdon/

The C++ code used in the experiments described in this book (except propriety code for the South Wales problem, cf. Appendix C) is available via ftp from sub-directory gp-code
The test cases used in Sections 7.1.4, 7.2.4 and 7.3.4 are available via ftp sub-directiory gp-code in file file GPdata_gp96.test.tar.Z
The simple GP system used in the example in Section 2.3.3 is available via ftp sub-directory gp-code/simple in file simple-gp.c
The program used to calculate the number of different trees of a given size with given terminal and function sets [Alonso and Schott,1995] used on page 16 is also available via ftp sub-directory gp-code in file ntrees.cc
C code for the demonstration four node problem and C++ code for QGAME (Appendix C) are available via ftp from sub-directory ga-code
Finally an interactive on line demonstration of the four node problem written in Java can be found on the Internet at ../../four_node.demo.html

Bill LANGDON 2001-08-19