next up previous
Next: Bibliography

W. B. Langdon

Genetic Programming + Data Structures = Automatic Programming!
bookcover
W. B. Langdon


University College, London



Genetic Programming and Data Structures, Kluwer, 1998. W.B.Langdon@cs.ucl.ac.uk http://www.cs.ucl.ac.uk/staff/W.Langdon

Background

Algorithms + Data Structures = Programs
 
Genetic Algorithms + Data Structures = Evolution Programs
 
Genetic Programming + Data Structures = Automatic Programming!
 

``To build [or evolve] a successful program, appropriate data structures should be used together with appropriate algorithms (these correspond to genetic operators used for transforming individual chromosomes [or programs])'',

[Michalewicz1994, page xi].

Aims

Scalar Memory

One memory cell M0, function SETM0 and terminal M0 (reads M0) [Koza1992]

What to do if M0 is not set?

Indexed Memory
[Teller1993,Teller1994]

Fixed number of cells

Indexed Memory

Data Objects

Multi-tree Programs

One Individual - One Program: Five Operations - Five Trees

Multi-tree Crossover

Crossover in One Tree at a time

Pareto Optimality
 

How Pareto Works

Pareto Fitness Sharing

Effect of Comparison Set on Pareto Front
Number of different non dominated fitness values in a list population (Chapter 6) with and without a comparison set (no niche sharing, not elitist, pop=10,000, no demes)

Fitness Sharing Pareto Tournament Selection

Pareto Elitism
Evolving a Stack, Chapter 4

Choice of Stack Primitives

Stack Fitness Function

Number of each the five stack operations
 
Test Sequence makenull top pop push empty Totals
1 2 2 14 14 8 40
2 14 3 2 13 8 40
3 6 2 7 12 13 40
4 3 18 5 8 6 40
Totals 25 25 28 47 35 160

Number of each the five stack operations - by depth
 
Stack length makenull top pop push empty Totals
undefined 4         4
0 11     27 15 53
1 5 6 14 15 9 49
2 5 7 9 3 5 29
3   6 3 2 2 13
4   6 2   4 12
5-10           0
Totals 25 25 28 47 35 160

Integer Values Pushed onto the Stack
 
Seq Values pushed No.
1 658 544 923 -508 709 560 816 810 149 -179 -328 1 490 -451 14

2

-23 -365 814 -464 -885 -702 123 -248 -284 828 177 635 -588   13

3

557 113 942 -918 -233 616 223 -95 238 -634 -262 590     12

4

217 539 496 -377 -848 -239 -233 331             8
47 integer values values of arg1 chosen randomly uniformly from the range -1000...999.

Tableau for Evolving a Stack
             
Objective To evolve a pushdown stack
Architecture Five separate trees
Primitives +, -, 0, 1, max, arg1, aux, inc_aux, dec_aux, read, write, write_Aux
Fitness case 4 test sequences, each of 40 tests (see slides [*] to [*])
Fitness score 1.0 for each test passed
Selection Scalar tournament of 4
Hits n/a
Wrapper
makenull result ignored
top no wrapper
pop no wrapper
push result ignored
empty result $>0 \Rightarrow$ TRUE, else FALSE
Parameters Population = 1000, Max generations = 101, program size <=250
Success Fitness >=160.0

Stack Results

Evolved Stack 1

Simplified Stack 1

Evolving a Queue, Chapter 5
Pseudo Code Definition of the Five Queue Operations 
[Aho et al.1987] 
OperationCodeComment
addone (i)addone :=(i + 1) % maxlength;cyclic increment
makenullhead :=0;initialise queue
 tail :=maxlength - 1; 
emptyempty :=(addone (tail) = head);is queue empty?
frontfront :=queue[ head ];front of queue
enqueue( x)tail :=addone (tail);add x to queue
 queue[ tail ] := x; 
dequeuedequeue :=queue[ head ];return front
 head :=addone (head);and remove it
Note the use of the function ``addone''

Queue

Circular Implementation of queues
Enqueue Increase tail by one (and wrapping round if needbe). Write into the new tail cell.

Dequeue Read from head cell. Then increase head by one.

Queue Primitives

Queue Memory
As stack but:

Queue Syntax

Good Engineering Practice
  Measures taken to ensure the ADF is ``sensible''.    

Pass by Reference
  

Queue Parameters

Queue Fitness Function

Number of each the five queue operations
 
Queue length makenull front dequeue enqueue empty Totals
undefined 5         5
0 10     27 16 53
1 4 9 14 18 7 52
2 4 12 11 8 7 42
3   9 6 7 5 27
4   4 6 11   21
5   4 11 9 3 27
6   8 9 11 5 33
7   10 11 9 3 33
8   3 9 4   16
9   5 4   2 11
Totals 23 64 81 104 48 320

Integers enqueued

 

  enqueue arguments
  < -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 >
No. 9 1 3 1   2 2 1 1 4 3 18 5 8 8 5 6 3   2 2   20
Total   104  

Tableau for Evolving a Queue
         
Objective To evolve a first-in first-out queue
Architecture Five separate trees, plus single ADF
Primitives
makenull +, -, 0, 1, max, mod, PROG2, QROG2, aux1, aux2, read, write, Set_Aux1, Set_Aux2
front +, -, 0, 1, max, mod, PROG2, QROG2, aux1, aux2, read
dequeue +, -, 0, 1, max, mod, PROG2, QROG2, aux1, aux2, read, write, Adf1, Front
enqueue +, -, 0, 1, max, mod, PROG2, QROG2, aux1, aux2, read, write, Adf1, arg1
empty +, -, 0, 1, max, mod, PROG2, QROG2, aux1, aux2, read
adf1 +, -, 0, 1, max, mod, PROG2, QROG2, arg1
Fitness Case 4 test sequences, of 40 tests and one of 160 (slide [*])
  No program aborts
Fitness Scaling Each operation scored independently using Pareto comparison (1 per test passed), Memory usage above minimum (12 cells) penalized
Selection Pareto tournament of 4
Hits Test passed
Wrapper
makenull result ignored
front no wrapper
dequeue no wrapper
enqueue result ignored
empty result $=0 \Rightarrow$ TRUE, otherwise FALSE
adf1 n/a
Parameters Population = 10,000, G=100, program size $ \le 250$, deme $= 3 \times 3$
Success Predicate 320 hits, i.e. all tests passed

Execution of ``caterpillar'' program
Execution of ``caterpillar'' program. Labels in bold indicate current values, dotted show previous values. Shaded cells hold queue. The heavy arrow indicates the general movement of the caterpillar as data items are added to the queue. As items are removed from the head of the queue it moves to the right, i.e. it acts like the tail of a caterpillar.

Evolved Queue Results

Execution of Queue program 4.
Circular queue implementation. Adf1 increases it's argument by two and arranges on overflow to use the cells it previously skipped (numbers on the arrows indicate order cells are used in). Cell zero is only used once, but other cells are re-used.  

Evolving a List, Chapter 6
Definitions of the Ten List Operations
Makenull Make the list an empty list and return position given by End.

Retrieve( p)

Return the element at position p.

Insert( x, p)

Insert x at position p, moving elements at p and following positions to the next higher position.

Delete( p)

Delete the element at position p, moving elements at p+1 and following positions to the previous lower position.

End

Return the position following the last element of the list.

First

Return the position of the first position. If the list is empty, return the position given by End.

Next( p)

Return the position following position p.

Previous( p)

Return the position before position p.

Locate( x)

Return the position of the first element containing value x. If x is not in the list at all then return the position given by End.

Printlist

Print the elements in their order of occurrence.

Iteration
 

List ADF Hierarchy

 

List Syntax Restrictions
 

List Fitness Function 1

List Fitness Function 2

Locating Errors with Fitness Testing
       

Directed Crossover
  

Objective To evolve a list
Architecture Ten separate trees, plus five ADFs
Primitives
Makenull PROG2, write, Set_Aux1, End
Retrieve arg1, read
Insert PROG2, aux1, adf1, Next, ARG1, ARG2, write
Delete PROG2, aux1, adf1, Next, ARG1, Prev
End aux1
First aux1
Next arg1
Previous arg1
Locate adf1, First, ARG1
Printlist adf1, First

Adf1

arg1, aux1, forwhile, i0, FUNC, End
Ins_adf arg1, swap
Del_adf arg1, swap, ARG1, Next
Loc_adf arg1, ARG1, read
Prt_adf arg1, read, print
  All trees may contain +, -, 0, 1 and max

Fitness Case 538 trees run in 21 sequences. 167 consistency tests. Tangent test data distribution (F = 15).
Fitness Scaling Each tree scored independently using Pareto comparison, memory usage above minimum (12 cells) and CPU usage above 120 per test run are Pareto fitness penalties.

Selection

Elitist Pareto Tournament group 4, Niche population sample size 81.
Hits Number of consistency checks passed
Wrapper Insert, Delete and Printlist result ignored, otherwise no wrapper.
Parameters Population = 10,000, G=100, program size <501, Max initial tree size 50, 90% directed crossover.
Success Predicate 167 hits, i.e. all tests passed

 
 
PrimitivePurpose
max constant 10 ($\ge$ max list size).
PROG2( t, u)evaluate t; return u
arg1argument of current operation or ADF, but:
ARG1, ARG2arguments of Insert, Delete, Locate or Printlist.
aux1an auxiliary variable (i.e. in addition to indexed memory).
Set_Aux1( x) aux1 = x; return aux1
forwhile( s, e, l) for i0 = s; i0 $\le$ e; i0++
  if timeout (32) exit loop
  if l returns zero exit loop
  return i0

FUNC

call private ADF of operation which called Adf1.

print( d)

if room in print buffer copy d into it; return number items in it
  else evaluate d; return 0
 
read( x) if | x$\vert \le l$ return store[ x]
  else return 0

write( x, d)

if | x$\vert \le l$ store[ x] = d; return original contents of store[ x]
  else evaluate d; return 0

swap( x, y)

if | x$\vert \le l$ and | y$\vert \le l$ exchange contents of store[ x] and store[ y]
  if | x| > l and | y$\vert \le l$ store[ y] = 0
  if | x$\vert \le l$ and | y| > l store[ x] = 0
  return 1

Summary of the Properties of
List Operations and ADFs
  Treat as ADF Returns value Argu- ments Pass-by-reference Directly testable Sufficient testing
Makenull $\times$ $\surd$        
Retrieve $\times$ $\surd$ 1   $\surd$ $\surd$
Insert $\times$ $\times$ 2      
Delete $\times$ $\times$ 1      
End $\surd$ $\surd$       $\surd$
First $\surd$ $\surd$       $\surd$
Next $\surd$ $\surd$ 1 $\surd$   $\surd$
Previous $\surd$ $\surd$ 1 $\surd$   $\surd$
Locate $\times$ $\surd$ 1   $\surd$  
Printlist $\times$ $\times$     $\surd$ $\surd$
Adf1 $\surd$ $\surd$ 1 $\times$  
Ins_adf $\surd$ $\surd$ 1 $\times$  
Del_adf $\surd$ $\surd$ 1 $\times$  
Loc_adf $\surd$ $\surd$ 1 $\times$  
Prt_adf $\surd$ $\surd$ 1 $\times$  

List Results

First Evolved Solution to List Problem

Evolution of the frequency of rare primitives

The nine primitives which are lost completely from the population are shown as solid lines

Chapter 8 discusses why primitives may become extinct.

Software Maintenance
  Model for maintaining evolved code:
1.
Start with the original fitness function and the population that contained the solution to the original problem
2.
Write additional fitness tests for the new functionality,
3.
Expand the existing individuals with random code for the new functionality,
4.
Evolve the expanded population with both the original and new fitness tests.

Testing Software Maintenance Model on List

Software Maintenance - Results

Problems Solved Using Data Structures

Dyck Language

GP to evolve a recogniser for a Dyck language.

What is a Dyck Language

Architecture
 

Dyck Terminals and Functions

                   
Objective Find a program that classifies sequences of four types of bracket ( ( (represented as 5), ) (71), [ (13), ] (103), { (31), } (137), ` (43) and ' (167) ) as being correctly nested or not.
Primitives Common Stack Given Index Memory
All trees: ADD, SUB, PROG2, IFLTE, Ifeq, 0, 1, max, aux1 Makenull, Empty, Top, Pop, Push read, write, inc_aux1, dec_aux1
rpb: as all plus ifopen, ifmatch, ARG1, Set_Aux1   adf1, adf2, adf3
adf1: as all plus     adf3
adf2: as all plus     arg1, arg2
Max prog size Initial tree limit 50 50 4 x 50 = 200
Fitness Case 286 fixed test examples, cf. slide [*]
Fitness Scaling Number of correct answers returned.
Selection Tournament size 4 (After first solution CPU penalty used giving a two dimensional fitness value, fitness niching used with a sample of up to 81 (9 x 9) nearest neighbours).
Hits Number test symbols correctly classified.
Wrapper Zero represents True (i.e. in language) and all other values False.
Parameters Pop = 10,000, G = 50, Pareto, 3 x 3 demes, CPU penalty only after first solution found, Abort on first error in sentence.
Success predicate Hits $\ge$ 1756, i.e. all answers correct.
 

Fitness Function

Dyck Test Sentences
 
Len- Positive Negative After Removing Duplicates
gth     Balanced Rand Positive Balanced Rand Score
1     all 8       0
2 all 4 all 60     9 18
3         16     10 30
4 all 32 all 24 16 27 16   172
5         16     16 80
6 rand 32 rand 32 32 32 32 32 576
7         16     16 112
8 rand 32 rand 32 32 32 32 32 768
Totals 91 112 83 1756

Dyck Results

Evolving a 4 Function Integer Calculator

Objective Find a program that evaluates integer Reverse Polish (postfix) arithmetic expressions.
Primitives Common Stack Given Index Memory
$+ - \times /$ trees: ADD, SUB, MUL, DIV, PROG2, 0, 1, aux1, Set_Aux1 Top, Pop, Push read, write, inc_aux1, dec_aux1, adf1, adf2
num: as ops plus arg1    
adf1: as ops but     no adfs
adf2: as ops but     no adfs and add arg1
adf3: as ops but     no adfs and add arg1, arg2
Max prog size Initial tree limit 50 5 x 50 = 250 7 x 50 = 350
Fitness Case 127 fixed test expressions, cf. slides [*], [*] and [*].
Fitness Scaling Number of correct answers returned.
Selection Pareto tournament size 4, CPU penalty (initial threshold 50 per operation), fitness niching used with a sample of up to 81 other members of the population.
Hits Number of correct answers returned.
Wrapper Value on num ignored. No wrapper on other trees.
Parameters Pop = 10,000, G = 100, Pareto, no demes, CPU penalty (increased after 1st solution found), abort on first wrong answer given in expression.
Success predicate Fitness $\ge$ 194, i.e. a program passes all tests.

Calculator Fitness Function

Length of reverse polish expressions
 
length 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Total
No. of cases     10 3 55 27 44 2 36 1 5   8   3 194

Calculator Test Case
  Number of times each tree occurs in reverse polish expression (RPN) test case and the score it has when the whole test case is passed.
Operation No. Max Score
num 550 163
plus 67 58
minus 103 85
times 85 64
divide 156 127
  420  
Totals 970 497

Calculator Test Case
 Number of symbols (i.e. operators or numbers) used in the RPN test case for each level of expression nesting. (Depth of nesting calculated after the symbol has been processed).
depth 1 2 3 4 5 6 Total
No. of cases 387 390 149 31 12 1 970

Calculator Results

Actions Performed by Terminals and Functions
Primitive Purpose
DIV( x, y) if y $\ne$ 0 return x/ y
  else return 1
SUBR( x, y) DIVR( x, y) As SUB and DIV except yield y- x and y/ x, i.e. operands reversed.
max constant 10 ($\ge$ max input size).
PROG2( t, u) evaluate t; return u
ARG1, arg1, arg2 arguments of current operation or ADF
aux1 an auxiliary variable
Set_Aux1( x) aux1 = x; return aux1
ifopen( x, t1, t2) if x = 5, 13, 31 or 43 return t1 //i.e. opening symbol
  else return t2

ifmatch( x, y, t1, t2)

if x = 5, 13, 31 or 43 evaluate y //i.e. opening symbol
  if ( x, y) = (5,71), (13,103), (31,137) or (43,167) return t1
  else return t2 // x and y don't match
  else return t2

Actions Performed by
Terminals and Functions 2 (continued)
 
Makenull clear stack; return 0
Empty if stack is empty return 0; else return 1
Top if stack is empty return 0; else return top of stack
Pop if stack is empty return 0; else pop stack and return popped value
Push( x) Evaluate x;
  if < 99 items on stack push x; return x
  else return 0
read( x) if | x$\vert \le l$ return store[ x]
  else return 0

write( x, d)

if | x$\vert \le l$ store[ x] = d
  return original contents of store[ x]
  else evaluate d
  return 0

Chapter 7 shows GP can

Recommendations

Advice in [Kinnear, Jr.1994] and [Koza1992] remains sound, however:

1.
GP populations should be closely studied as they evolve:
(a)
Frequency of primitives. Recognise when a primitive becomes extinct
(b)
Population variety.

However a high variety does not indicate all is well. Phenotypic (behaviour) variation may also be useful.

2.
Potential ways to encourage population diversity:
(a)
Removal of the reproduction operator.
(b)
Addition of one or more mutation operators.  
(c)
Smaller tournament sizes???
(d)
Splitting large populations into semi-isolated demes.  
(e)
Using fitness sharing to encourage many fitness niches.  
3.
Fitness caches  

4.
Where GP run time is long, periodically save the current state of the run. Should the system crash; the run can be restarted.

The population can be compressed, e.g. using gzip. Compression to <1 bit per primitive

Convergence of phenotype

Value returned by the ``best'' program in the population. First of 50 GP runs of the sextic polynomial problem [Langdon et al.1999, Figure 8.5].

Conclusions

The key to successful human produced software is using abstraction to control the complexity of each task in hand.

While GP work to date has concentrated on functional abstraction (ADFs etc.), GP must also to take advantage of data abstraction

 

 
next up previous
Next: Bibliography
Bill Langdon
2000-08-04