W. B. Langdon
Genetic Programming and Data Structures, Kluwer, 1998. W.B.Langdon@cs.ucl.ac.uk http://www.cs.ucl.ac.uk/staff/W.Langdon
3.7, 3.8 Come back to as needed
Chapter 8, Price's and Fishers' theorem. GP deception.
Glossary (B),
Scheduling preventive maintenance (C),
Implementation,
caching and speedups,
bench marks,
ftp addresses (D)
Algorithms + Data Structures  =  Programs 
Genetic Algorithms + Data Structures  =  Evolution Programs 
Genetic Programming + Data Structures  =  Automatic Programming! 
``To build [or evolve] a successful program, appropriate data structures should be used together with appropriate algorithms (these correspond to genetic operators used for transforming individual chromosomes [or programs])'',
[Michalewicz1994, page xi].
One memory cell M0, function SETM0 and terminal M0 (reads M0) [Koza1992]
What to do if M0 is not set?
32 bit integers. Strongly Typed Genetic Programming (STGP)? [Montana1995]
and better in at least one dimension then it dominates
As but ,
individuals are preferred if there are few others that dominate them.
(user to define which?)
Generated at random (but don't pop empty stack)
different proportion of operations in each sequence.
memory reset before each sequence
Test Sequence  makenull  top  pop  push  empty  Totals 
1  2  2  14  14  8  40 
2  14  3  2  13  8  40 
3  6  2  7  12  13  40 
4  3  18  5  8  6  40 
Totals  25  25  28  47  35  160 
Stack length  makenull  top  pop  push  empty  Totals 
undefined  4  4  
0  11  27  15  53  
1  5  6  14  15  9  49 
2  5  7  9  3  5  29 
3  6  3  2  2  13  
4  6  2  4  12  
510  0  
Totals  25  25  28  47  35  160 
Seq  Values pushed  No.  
1  658  544  923  508  709  560  816  810  149  179  328  1  490  451  14 
2 
23  365  814  464  885  702  123  248  284  828  177  635  588  13  
3 
557  113  942  918  233  616  223  95  238  634  262  590  12  
4 
217  539  496  377  848  239  233  331  8 
Pseudo Code Definition of the Five Queue Operations  
[Aho et al.1987]  
Operation  Code  Comment  
addone (i)  addone  :=  (i + 1) % maxlength;  cyclic increment  
makenull  head  :=  0;  initialise queue  
tail  :=  maxlength  1;  
empty  empty  :=  (addone (tail) = head);  is queue empty?  
front  front  :=  queue[ head ];  front of queue  
enqueue( x)  tail  :=  addone (tail);  add x to queue  
queue[ tail ]  :=  x;  
dequeue  dequeue  :=  queue[ head ];  return front  
head  :=  addone (head);  and remove it 
Dequeue Read from head cell. Then increase head by one.
(provide default value, zero)
The six trees fall into categories:
Using categories primitives restricted to particular trees:
Memory usage penalty above 12 cells.
Queue length  makenull  front  dequeue  enqueue  empty  Totals 
undefined  5  5  
0  10  27  16  53  
1  4  9  14  18  7  52 
2  4  12  11  8  7  42 
3  9  6  7  5  27  
4  4  6  11  21  
5  4  11  9  3  27  
6  8  9  11  5  33  
7  10  11  9  3  33  
8  3  9  4  16  
9  5  4  2  11  
Totals  23  64  81  104  48  320 
enqueue arguments  
<  10  9  8  7  6  5  4  3  2  1  0  1  2  3  4  5  6  7  8  9  10  >  
No.  9  1  3  1  2  2  1  1  4  3  18  5  8  8  5  6  3  2  2  20  
Total  104 
Objective  To evolve a firstin firstout queue  
Architecture  Five separate trees, plus single ADF  
Primitives 


Fitness Case  4 test sequences, of 40 tests and one of 160 (slide )  
No program aborts  
Fitness Scaling  Each operation scored independently using Pareto comparison (1 per test passed), Memory usage above minimum (12 cells) penalized  
Selection  Pareto tournament of 4  
Hits  Test passed  
Wrapper 


Parameters  Population = 10,000, G=100, program size , deme  
Success Predicate  320 hits, i.e. all tests passed 
Correct but requires infinite memory.
Very high ``effort''.
``easy'' 5 success in 11 (18) runs.
6 in 57 runs
3 nongeneral overfit test cases.
Definitions of the Ten List Operations  
Makenull  Make the list an empty list and return position given by End. 
Retrieve( p) 
Return the element at position p. 
Insert( x, p) 
Insert x at position p, moving elements at p and following positions to the next higher position. 
Delete( p) 
Delete the element at position p, moving elements at p+1 and following positions to the previous lower position. 
End 
Return the position following the last element of the list. 
First 
Return the position of the first position. If the list is empty, return the position given by End. 
Next( p) 
Return the position following position p. 
Previous( p) 
Return the position before position p. 
Locate( x) 
Return the position of the first element containing value x. If x is not in the list at all then return the position given by End. 
Printlist 
Print the elements in their order of occurrence. 
Retrieve, Next, Previous, Adf1, Ins_adf, Del_adf, Loc_adf and Prt_adf must contain arg1 terminal and Adf1 must contain FUNC (cf. slide )
(reduces runtime but encourage premature convergence?)
Assume later errors due to other code
(details given in [Langdon1995]).
Objective  To evolve a list  
Architecture  Ten separate trees, plus five ADFs  
Primitives 
 
All trees may contain +, , 0, 1 and max 
Fitness Case  538 trees run in 21 sequences. 167 consistency tests. Tangent test data distribution (F = 15). 
Fitness Scaling  Each tree scored independently using Pareto comparison, memory usage above minimum (12 cells) and CPU usage above 120 per test run are Pareto fitness penalties. 
Selection 
Elitist Pareto Tournament group 4, Niche population sample size 81. 
Hits  Number of consistency checks passed 
Wrapper  Insert, Delete and Printlist result ignored, otherwise no wrapper. 
Parameters  Population = 10,000, G=100, program size <501, Max initial tree size 50, 90% directed crossover. 
Success Predicate  167 hits, i.e. all tests passed 
Primitive  Purpose 
max  constant 10 ( max list size). 
PROG2( t, u)  evaluate t; return u 
arg1  argument of current operation or ADF, but: 
ARG1, ARG2  arguments of Insert, Delete, Locate or Printlist. 
aux1  an auxiliary variable (i.e. in addition to indexed memory). 
Set_Aux1( x)  aux1 = x; return aux1 
forwhile( s, e, l)  for i0 = s; i0 e; i0++ 
if timeout (32) exit loop  
if l returns zero exit loop  
return i0  
FUNC  call private ADF of operation which called Adf1. 
print( d)  if room in print buffer copy d into it; return number items in it 
else evaluate d; return 0  
read( x)  if  x return store[ x] 
else return 0  
write( x, d)  if  x store[ x] = d; return original contents of store[ x] 
else evaluate d; return 0  
swap( x, y)  if  x and  y exchange contents of store[ x] and store[ y] 
if  x > l and  y store[ y] = 0  
if  x and  y > l store[ x] = 0  
return 1 
Treat as ADF  Returns value  Argu ments  Passbyreference  Directly testable  Sufficient testing  
Makenull  
Retrieve  1  
Insert  2  
Delete  1  
End  
First  
Next  1  
Previous  1  
Locate  1  
Printlist  
Adf1  1  
Ins_adf  1  
Del_adf  1  
Loc_adf  1  
Prt_adf  1 
Chapter 8 discusses why primitives may become extinct.
GP to evolve a recogniser for a Dyck language.
(, ), [, ], {, }, `, '.
Evolve to operate like pop, push and top?
All can be used by main tree
adf2 (top?) can be used by adf0.
ARG1 (integer representing current bracket), ifopen, ifmatch
aux1, Set_Aux1
Objective  Find a program that classifies sequences of four types of bracket ( ( (represented as 5), ) (71), [ (13), ] (103), { (31), } (137), ` (43) and ' (167) ) as being correctly nested or not.  
Primitives  Common  Stack Given  Index Memory 
All trees:  ADD, SUB, PROG2, IFLTE, Ifeq, 0, 1, max, aux1  Makenull, Empty, Top, Pop, Push  read, write, inc_aux1, dec_aux1 
rpb: as all plus  ifopen, ifmatch, ARG1, Set_Aux1  adf1, adf2, adf3  
adf1: as all plus  adf3  
adf2: as all plus  arg1, arg2  
Max prog size  Initial tree limit 50  50  4 x 50 = 200 
Fitness Case  286 fixed test examples, cf. slide  
Fitness Scaling  Number of correct answers returned.  
Selection  Tournament size 4 (After first solution CPU penalty used giving a two dimensional fitness value, fitness niching used with a sample of up to 81 (9 x 9) nearest neighbours).  
Hits  Number test symbols correctly classified.  
Wrapper  Zero represents True (i.e. in language) and all other values False.  
Parameters  Pop = 10,000, G = 50, Pareto, 3 x 3 demes, CPU penalty only after first solution found, Abort on first error in sentence.  
Success predicate  Hits 1756, i.e. all answers correct. 
counting no. correctly classifies as correctly balanced or not.
Len  Positive  Negative  After Removing Duplicates  
gth  Balanced  Rand  Positive  Balanced  Rand  Score  
1  all 8  0  
2  all  4  all 60  9  18  
3  16  10  30  
4  all  32  all  24  16  27  16  172  
5  16  16  80  
6  rand  32  rand  32  32  32  32  32  576 
7  16  16  112  
8  rand  32  rand  32  32  32  32  32  768 
Totals  91  112  83  1756 
Some of the more promising runs were extended beyond 50 generations up to 140 generations without finding a solution.
Objective  Find a program that evaluates integer Reverse Polish (postfix) arithmetic expressions.  
Primitives  Common  Stack Given  Index Memory 
trees:  ADD, SUB, MUL, DIV, PROG2, 0, 1, aux1, Set_Aux1  Top, Pop, Push  read, write, inc_aux1, dec_aux1, adf1, adf2 
num: as ops plus  arg1  
adf1: as ops but  no adfs  
adf2: as ops but  no adfs and add arg1  
adf3: as ops but  no adfs and add arg1, arg2  
Max prog size  Initial tree limit 50  5 x 50 = 250  7 x 50 = 350 
Fitness Case  127 fixed test expressions, cf. slides , and .  
Fitness Scaling  Number of correct answers returned.  
Selection  Pareto tournament size 4, CPU penalty (initial threshold 50 per operation), fitness niching used with a sample of up to 81 other members of the population.  
Hits  Number of correct answers returned.  
Wrapper  Value on num ignored. No wrapper on other trees.  
Parameters  Pop = 10,000, G = 100, Pareto, no demes, CPU penalty (increased after 1^{st} solution found), abort on first wrong answer given in expression.  
Success predicate  Fitness 194, i.e. a program passes all tests. 
Example: 1+2=3 correct Increment score of num and plus
length  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  Total 
No. of cases  10  3  55  27  44  2  36  1  5  8  3  194 
Operation  No.  Max Score 
num  550  163 
plus  67  58 
minus  103  85 
times  85  64 
divide  156  127 
420  
Totals  970  497 
depth  1  2  3  4  5  6  Total 
No. of cases  387  390  149  31  12  1  970 
Actions Performed by Terminals and Functions  
Primitive  Purpose 
DIV( x, y)  if y 0 return x/ y 
else return 1  
SUBR( x, y) DIVR( x, y)  As SUB and DIV except yield y x and y/ x, i.e. operands reversed. 
max  constant 10 ( max input size). 
PROG2( t, u)  evaluate t; return u 
ARG1, arg1, arg2  arguments of current operation or ADF 
aux1  an auxiliary variable 
Set_Aux1( x)  aux1 = x; return aux1 
ifopen( x, t1, t2)  if x = 5, 13, 31 or 43 return t1 //i.e. opening symbol 
else return t2  
ifmatch( x, y, t1, t2) 
if x = 5, 13, 31 or 43 evaluate y //i.e. opening symbol 
if ( x, y) = (5,71), (13,103), (31,137) or (43,167) return t1  
else return t2 // x and y don't match  
else return t2 
Makenull  clear stack; return 0 
Empty  if stack is empty return 0; else return 1 
Top  if stack is empty return 0; else return top of stack 
Pop  if stack is empty return 0; else pop stack and return popped value 
Push( x)  Evaluate x; 
if < 99 items on stack push x; return x  
else return 0  
read( x)  if  x return store[ x] 
else return 0  
write( x, d) 
if  x store[ x] = d 
return original contents of store[ x]  
else evaluate d  
return 0 
(PADO [Teller and Veloso1995,Teller1996] does not use problem specific data structures. Better than random performance on classification problems with no obvious structure).
Advice in [Kinnear, Jr.1994] and [Koza1992] remains sound, however:
However a high variety does not indicate all is well. Phenotypic (behaviour) variation may also be useful.
The population can be compressed, e.g. using gzip. Compression to <1 bit per primitive
The key to successful human produced software is using abstraction to control the complexity of each task in hand.
While GP work to date has concentrated on functional abstraction (ADFs etc.), GP must also to take advantage of data abstraction