This directory contains papers that have been announced on the GP mailing list but are not available at the GP ftp site at ftp.io.com. Please look at the ftp.io.com mirror directory to find any papers that you do not find here.

Adil Qureshi
Aug '95

icga85.txt
==========
Cramer, Nichael Lynn: "A Representation for the Adaptive Generation of Simple Sequential Programs", Proceedings, International Conference on Genetic Algorithms and their Applications, July 1985 [CMU], pp 183-187.

ABSTRACT
A system for the adaptive generation of sequential computer functions is described. A small operator set is defined and encoded in JB, a fixed length representation. After briefly examining the limitations encountered in this approach we explore TB, a tree-like representation of the code. These representations share the feature that they can be used to represent well-formed, useful computer programs while still being amenable to suitably defined genetic operators. As a working example, the system is used to produce two-input, single-output multiplication functions that are concise and well-defined. Future work, dealing with extensions to more complicated functions and generalizations of the techniques, is also discussed.

94-006.ps.gz
============
Dynamic Training Subset Selection for Supervised Learning in Genetic Programming
Chris Gathercole, Peter Ross
Department of Artificial Intelligence, University of Edinburgh, 80 South Bridge, Edinburgh EH1 1HN, U.K.
chrisg,peterg@aisb.ed.ac.uk
February 25, 1994

Abstract
When using the Genetic Programming (GP) Algorithm on a difficult problem with a large set of training data, a large population size is needed and a very large number of function-tree evaluations must be carried out. This paper describes some efforts made to reduce the number of such evaluations by concentrating on selecting a small subset of the training data set on which to actually carry out the GP algorithm. Three subset selection methods described in the paper are: Dynamic Subset Selection (DSS), using the current GP run to select `difficult' and/or previously unselected cases; Historical Subset Selection (HSS), selecting `difficult' cases from previous GP runs; and Random Subset Selection (RSS), selecting a subset of cases at random for each generation. Various runs have shown that GP+DSS can produce better results in less than 20% of the time taken by GP. GP+HSS can nearly match the results of GP, and, perhaps surprisingly, GP+RSS can occasionally approach the results of GP. GP+DSS produced better, more general results than those reported in a paper for a variety of Neural Networks when used on a substantial problem, known as the Thyroid problem.
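The subset-selection idea summarised above lends itself to a short sketch. The Python fragment below is only an illustration, not the authors' code: it assumes a per-case `difficulty` counter (how often the case is missed) and an `age` counter (generations since last selected) and draws a weighted subset each generation; the additive weighting and all names are illustrative assumptions.

import random

def select_subset(n_cases, subset_size, difficulty, age, rng=random.Random(0)):
    """Pick a training subset biased towards cases that are often missed
    (high difficulty) or have not been selected recently (high age).
    The additive weighting is an illustrative assumption, not the exact
    scheme of the paper."""
    weights = [1 + difficulty[i] + age[i] for i in range(n_cases)]
    chosen = set()
    while len(chosen) < subset_size:
        chosen.add(rng.choices(range(n_cases), weights=weights, k=1)[0])
    return sorted(chosen)

def update_counters(chosen, errors, difficulty, age):
    """After evaluating the population on the chosen cases, bump the
    difficulty of cases that are still got wrong (errors maps case index
    to error count) and reset the age of selected cases; unselected cases
    grow older."""
    for i in range(len(age)):
        age[i] = 0 if i in chosen else age[i] + 1
    for i in chosen:
        difficulty[i] += errors.get(i, 0)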
94-007.ps.gz
============
Applications of genetic algorithms
Peter Ross and Dave Corne
Department of AI, University of Edinburgh, 80 South Bridge, Edinburgh EH1 1HN
peter,dave@aisb.ed.ac.uk

Abstract
Genetic algorithms have been applied to a very wide range of practical problems, often with valuable results. This paper surveys just a few examples, to illustrate the diversity of approaches and to point to some of the considerations that have proved important in making applications successful. Because GAs provide a fairly comprehensible way to address a wide range of difficult engineering and optimisation problems, producing good if not optimal results, it seems that the technology is finding its way into real-world use much more easily than, say, expert systems did.

University of Edinburgh, Department of Artificial Intelligence, Genetic Algorithms Research Group. Paper No. 94-007. Submitted for a special issue of the AISB Quarterly on Evolutionary Computation, edited by Terry Fogarty.

CS-TR-95-1542.ps.gz
===================
John Koza's report on parallelising GP

LearnModels.ps.gz
=================
Learning Mental Models
Astro Teller
Dept of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
astro@cs.cmu.edu

MentalModels.ps.gz
==================
Chapter 9 of AIGP

PADO-Tech-Report.ps.gz
======================
PADO: Learning Tree Structured Algorithms for Orchestration into an Object Recognition System
Astro Teller and Manuela Veloso
February 10, 1995
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213

Abstract
Most artificial intelligence systems work on simple problems and artificial domains because they rely on the accurate sensing of the task world. Object recognition is a crucial part of the sensing challenge and machine learning stands in a position to catapult object recognition into real world domains. Given that, to date, machine learning has not delivered general object recognition, we propose a different point of attack: the learning architectures themselves. We have developed a method for directly learning and combining algorithms in a new way that imposes little burden on or bias from the humans involved. This learning architecture, PADO, and the new results it brings to the problem of natural image object recognition are the focus of this report.

PADO.ps.gz
==========
PADO: A New Learning Architecture for Object Recognition
Astro Teller and Manuela Veloso
Carnegie Mellon University

Abstract
Most artificial intelligence systems today work on simple problems and artificial domains because they rely on the accurate sensing of the task world. Object recognition is a crucial part of the sensing challenge and machine learning stands in a position to catapult object recognition into real world domains. Given that, to date, machine learning has not delivered general object recognition, we propose a different point of attack: the learning architectures themselves. We have developed a method for directly learning and combining algorithms in a new way that imposes little burden on or bias from the humans involved. This learning architecture, PADO, and the new results it brings to the problem of natural image object recognition are the focus of this chapter.

Turing.ps.gz
============
Turing Completeness in the Language of Genetic Programming with Indexed Memory
Astro Teller
Carnegie Mellon University

aisb.ps.gz
==========
Building New Spatial Interaction Models Using Genetic Programming
S. Openshaw and I. Turton, School of Geography, University of Leeds, Leeds, UK
email: stan@geog.leeds.ac.uk, ian@geog.leeds.ac.uk

Abstract
Building computer models of human geographic systems is difficult because of the poverty of applicable theory that can be used to specify well formed and soundly based mathematical models. As a result there has been little progress in this area since the late 1960s, when mathematical model building procedures based on entropy-maximising methods were first applied to spatial interaction data. In the mid-1980s, attention was focused on the use of genetic algorithms to explore the universe of different models that could be built from the available symbolic pieces; that is, there is a very, very large number of permutations of the model building blocks, viz. a mix of unary and binary operators, unknown parameters and observed variables, and rules of syntax for well formed equations.
The so-called Automated Modeling System (AMS) worked but never fulfilled its expectations. The paper revisits the problem but re-casts the AMS approach into a genetic programming framework. Some preliminary results based on Cray-YMP runs are reported.

cook94a.ps.gz
=============
Journal of Artificial Intelligence Research 1 (1994) 231-255. Submitted 12/93; published 2/94.
Substructure Discovery Using Minimum Description Length and Background Knowledge
Diane J. Cook cook@cse.uta.edu
Lawrence B. Holder holder@cse.uta.edu
Department of Computer Science Engineering, Box 19015, University of Texas at Arlington, Arlington, TX 76019 USA

Abstract
The ability to identify interesting and repetitive substructures is an essential component to discovering knowledge in structural data. We describe a new version of our Subdue substructure discovery system based on the minimum description length principle. The Subdue system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously discovered substructures in the data, multiple passes of Subdue produce a hierarchical description of the structural regularities in the data. Subdue uses a computationally-bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints. In addition to the minimum description length principle, other background knowledge can be used by Subdue to guide the search towards more appropriate substructures. Experiments in a variety of domains demonstrate Subdue's ability to find substructures capable of compressing the original data and to discover structural concepts important to the domain.

dret_postscript.tar.gz
======================
Artificial-Life Simulators and Their Applications
Howard Gutowitz
Ecole Supérieure de Physique et de Chimie Industrielles, Laboratoire d'Electronique, 10 rue Vauquelin, 75005 Paris, France
and The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501
hag@santafe.edu
HTML: http://alife.santafe.edu/alife/topics/simulators/dret/dret.html

Abstract: Artificial Life (Alife) is a rapidly growing field of scientific research linking biology and computer science. It seeks to understand how life-like processes can be embodied in computer programs. Advances in this area promise to illuminate fundamental questions both in biology ("What is life?") and in Computer Science ("How to make robust and adaptable computer programs?"). Much of the work in artificial life is directed toward building computer simulations of artificial creatures and the artificial worlds in which they live. This report surveys major efforts in this area, with attention to developments likely to lead to practical applications in the short to middle term.

ep.ps.gz
========
Genetic Programming of Music
Jeffrey B. Putnam
New Mexico Institute of Mining and Technology
Aug 30th '94

model-of-landscapes.ps.gz
=========================
Terry Jones
February 1, 1994

Abstract
The use of the term ``landscape'' is increasing rapidly in the field of evolutionary computation, yet in many cases it remains poorly, if at all, defined. This situation has perhaps developed because everyone grasps the imagery immediately, and the questions that would be asked of a less evocative term do not get asked.
This paper presents a model of landscapes that is general enough to encompass most of what computer scientists would call search, though the model is not restricted to either the field or the viewpoint. It is particularly relevant to algorithms that employ some form of crossover, and hence to genetic algorithms and other members of the evolutionary computing family. An overview of the consequences and properties of the model establishes a connection with more traditional search algorithms from artificial intelligence, introduces the notion of a crossover landscape, and argues the importance of viewing search as navigation and structure.

Santa Fe Institute, 1660 Old Pecos Trail, Suite A, Santa Fe, NM 87505, USA and Department of Computer Science, University of New Mexico, Albuquerque NM 87131, USA. Email: terry@santafe.edu

ppsn-94.ps.gz
=============
Program Search with a Hierarchical Variable Length Representation: Genetic Programming, Simulated Annealing and Hill Climbing
Una-May O'Reilly and Franz Oppacher
Santa Fe Institute, Santa Fe, NM, 87505, USA
School of Computer Science, Carleton University, Ottawa, CANADA

Abstract. This paper presents a comparison of Genetic Programming (GP) with Simulated Annealing (SA) and Stochastic Iterated Hill Climbing (SIHC) based on a suite of program discovery problems which have been previously tackled only with GP. All three search algorithms employ the hierarchical variable length representation for programs brought into recent prominence with the GP paradigm [8]. We feel it is not intuitively obvious that mutation-based adaptive search can handle program discovery, yet, to date, for each GP problem we have tried, SA or SIHC also work.

ppsn92.ps.gz
============
An Experimental Perspective on Genetic Programming
Una-May O'Reilly and Franz Oppacher
Intelligent Systems Research Group, School of Computer Science, Carleton University, Ottawa, Ontario, K1S 5B6, CANADA

Abstract
Genetic Programming (GP) has recently been introduced by John R. Koza [5] as a method for genetically breeding populations of computer programs to solve problems. We believe GP to constitute a significant extension of the Genetic Algorithm (GA) research paradigm primarily because it generalizes the genetic search techniques: instead of looking for a solution to a specific instance of a problem, GP attempts to evolve a program capable of computing the solutions for any instance of the problem. We have implemented a genetic programming environment, GP*, that is capable of duplicating Koza's experiments. In this paper we describe a specific GP experiment on the evolution of programs to sort vectors, and discuss the issues that must be addressed in any application of GP: the design of fitness functions and test suites, and the selection of program terminals and functions. Our observations point to several previously unnoticed shortcomings of the GP approach. We hypothesize that these shortcomings are due to the fact that GP only uses a hierarchical representation but does not construct its solutions in an explicitly hierarchical manner.

oreilly
=======
Una-May O'Reilly PhD thesis (An Analysis of Genetic Programming)

rosca
=====
Directory containing GP papers by J. P. Rosca ("Learning by Adapting Representations in GP" etc.)

terry
=====
Papers by Terry Jones, including PhD thesis. (Model of Landscapes etc.)

GPdata_icga-95.ps
=================
Evolving Data Structures with Genetic Programming
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
Computer Science Dept.
University College London

It is established good software engineering practice to ensure that programs use memory via abstract data structures such as stacks, queues and lists. These provide an interface between the program and memory, freeing the program of memory management details which are left to the data structures to implement. The main result presented herein is that GP can automatically generate stacks and queues. Typically abstract data structures support multiple operations, such as put and get. We show that GP can simultaneously evolve all the operations of a data structure by implementing each such operation with its own independent program tree. That is, the chromosome consists of a fixed number of independent program trees. Moreover, crossover only mixes genetic material of program trees that implement the same operation. Program trees interact with each other only via shared memory and shared ``Automatically Defined Functions'' (ADFs). ADFs, ``pass by reference'' when calling them, Pareto selection, ``good software engineering practice'' and partitioning the genetic population into ``demes'' were also investigated whilst evolving the queue in order to improve the GP solutions. Code is available from this site.
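To make the multi-tree chromosome idea above concrete, here is a minimal Python sketch, not the paper's implementation: an individual holds one tree per data-structure operation (the operation names, the nested-list tree encoding and the example primitives are illustrative assumptions), and crossover swaps subtrees only between the two parents' trees for the same operation.

import copy
import random

# Illustrative operation names for an evolved stack; not the paper's exact set.
OPERATIONS = ("makenull", "push", "pop", "top", "empty")

def all_paths(tree, path=()):
    """Return the path (tuple of child indices) of every node in a nested-list tree."""
    paths = [path]
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):  # tree[0] is the function symbol
            paths.extend(all_paths(child, path + (i,)))
    return paths

def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def set_subtree(tree, path, new):
    if not path:
        return new
    parent = get_subtree(tree, path[:-1])
    parent[path[-1]] = new
    return tree

def same_operation_crossover(mum, dad, rng=random.Random(0)):
    """Crossover restricted to one operation: both crossover points come from
    the parents' trees for the *same* operation, so genetic material never
    moves between, say, the 'push' tree and the 'pop' tree."""
    op = rng.choice(OPERATIONS)
    child = {o: copy.deepcopy(t) for o, t in mum.items()}
    cut_m = rng.choice(all_paths(child[op]))
    cut_d = rng.choice(all_paths(dad[op]))
    donor = copy.deepcopy(get_subtree(dad[op], cut_d))
    child[op] = set_subtree(child[op], cut_m, donor)
    return child

# Example with trivial hand-written trees (purely illustrative primitives).
mum = {op: ["prog2", "arg0", ["write", "aux", "arg0"]] for op in OPERATIONS}
dad = {op: ["read", "aux"] for op in OPERATIONS}
kid = same_operation_crossover(mum, dad)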
WBL_aaai-pppGP.ps
=================
Pareto Optimality, Population Partitioning, Price's theorem and GP
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
Computer Science Dept.
University College London

A description of a use of Pareto optimality in genetic programming is given and an analogy with Genetic Algorithm fitness niches is drawn. Techniques to either spread the population across many Pareto optimal fitness values or to reduce the spread are described. It is speculated that a wide spread may not aid Genetic Programming. It is suggested that this might give useful insight into many GPs whose fitness is composed of several sub-objectives. The successful use of demic populations in GP leads to speculation that smaller evolutionary steps might aid GP in the long run. An example is given where Price's covariance theorem helped when designing a GP fitness function.

directed_crossover.ps
=====================
Summary of directing crossover locations in a multi-tree GP
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
Computer Science Dept.
University College London

GPlist_aigp2.ps
===============
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
Computer Science Dept.
University College London

Abstract for Advances in Genetic Programming 2
Much published work on genetic programming evolves functions without ``side-effects'' to learn patterns in test data; once produced they can be used to make predictions regarding new data. In contrast, human written programs often make extensive and explicit use of memory. Indeed memory in some form is required for a programming system to be Turing Complete. In both normal and genetic programming considerable benefits have been found in adopting a ``structured approach''. For example, Koza has shown the introduction of evolvable code modules (ADFs) can greatly reduce the effort required for GP to reach a solution. Teller has shown that the introduction of indexed memory into genetic programming makes it Turing Complete. This work builds on Teller's and considers the introduction of data structures within GP. The expectation is that the use of data structures will prove as useful to GP as code structuring has already been shown to be. So far the important result is that GP can evolve simple data structures, such as stacks and queues. This abstract reports the successful evolution of a list data structure, compares the use of Pareto selection and demes and describes the directed choice of crossover points.

WBL.gpgrid_gp96.ps
==================
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
Computer Science Dept.
University College London

Abstract for: Scheduling Maintenance of Electrical Power Transmission Networks Using Genetic Programming
The National Grid Company Plc is responsible for the maintenance of the high voltage electricity transmission network in England and Wales. It must plan maintenance so as to minimize costs taking into account: (1) location and size of demand, (2) generator capacities and availabilities, (3) electricity carrying capacity of the remainder of the network, that part not undergoing maintenance. Previous work showed the combination of a Genetic Algorithm using an order or permutation chromosome combined with hand coded ``Greedy'' Optimizers can readily produce an optimal schedule for a four node test problem. Following this, the same GA has been used to find low cost schedules for the South Wales region of the UK high voltage power network. This paper describes the evolution of the best known schedule for the base South Wales problem using Genetic Programming starting from the hand coded heuristics used with the GA. Submitted to GP96 late breaking papers.

surveyRN76.ps
=============
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
A. Qureshi A.Qureshi@cs.ucl.ac.uk
Computer Science Dept.
University College London
Genetic Programming -- Computers using "Natural Selection" to generate programs
RN/95/76

Abstract
Computers that ``program themselves''; science fact or fiction? Genetic Programming uses novel optimisation techniques to ``evolve'' simple programs; mimicking the way humans construct programs by progressively re-writing them. Trial programs are repeatedly modified in the search for ``better/fitter'' solutions. The underlying basis is Genetic Algorithms (GAs). Genetic Algorithms, pioneered by Holland, Goldberg and others, are evolutionary search techniques inspired by natural selection (i.e. survival of the fittest). GAs work with a ``population'' of trial solutions to a problem, frequently encoded as strings, and repeatedly select the ``fitter'' solutions, attempting to evolve better ones. The power of GAs is being demonstrated for an increasing range of applications; financial, imaging, VLSI circuit layout, gas pipeline control and production scheduling. But one of the most intriguing uses of GAs - driven by Koza - is automatic program generation. Genetic Programming applies GAs to a ``population'' of programs - typically encoded as tree-structures. Trial programs are evaluated against a ``fitness function'' and the best solutions selected for modification and re-evaluation. This modification-evaluation cycle is repeated until a ``correct'' program is produced. GP has demonstrated its potential by evolving simple programs for medical signal filters, classifying news stories, performing optical character recognition, and for target identification. This paper surveys the exciting field of Genetic Programming. As a basis it reviews Genetic Algorithms and automatic program generation. Next it introduces Genetic Programming, describing its history and describing the technique via a worked example in C. Then, using a taxonomy that divides GP research into theory/techniques and applications, it surveys recent work from both of these perspectives. Extensive bibliographies, glossaries and a resource list are included as appendices.
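The modification-evaluation cycle summarised in the survey abstract above can be sketched in a few lines. The Python fragment below is only an illustrative outline of a generic tree-GP loop; the primitive set, fitness function and parameter values are placeholders, not those of the survey's worked example in C.

import copy
import random

FUNCTIONS = {"+": 2, "*": 2}   # placeholder function set (name -> arity)
TERMINALS = ["x", 1.0]         # placeholder terminal set

def random_tree(depth, rng):
    """Grow a random expression tree as nested lists: [op, child, child] or a terminal."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(FUNCTIONS))
    return [op] + [random_tree(depth - 1, rng) for _ in range(FUNCTIONS[op])]

def evaluate(tree, x):
    if tree == "x":
        return x
    if not isinstance(tree, list):
        return tree
    args = [evaluate(child, x) for child in tree[1:]]
    return args[0] + args[1] if tree[0] == "+" else args[0] * args[1]

def fitness(tree):
    """Placeholder symbolic-regression fitness: error against the target x*x + x."""
    return -sum(abs(evaluate(tree, x) - (x * x + x)) for x in range(-5, 6))

def crossover(mum, dad, rng):
    """Very crude subtree crossover: replace a random child of mum's root with dad."""
    child = copy.deepcopy(mum)
    if isinstance(child, list):
        child[rng.randrange(1, len(child))] = copy.deepcopy(dad)
    return child

def gp_loop(pop_size=50, generations=20, rng=random.Random(1)):
    pop = [random_tree(3, rng) for _ in range(pop_size)]
    for _ in range(generations):                       # modification-evaluation cycle
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]              # select the fitter trial programs
        pop = parents + [crossover(rng.choice(parents), rng.choice(parents), rng)
                         for _ in range(pop_size - len(parents))]
    return max(pop, key=fitness)

best = gp_loop()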
WBL.ecj.price.125.ps.gz
=======================
Evolution of Genetic Programming Populations
W.B.Langdon@cs.bham.ac.uk
RN/96/125

We investigate in detail what happens as genetic programming (GP) populations evolve. Since we shall use the populations which showed GP can evolve stack data structures as examples, we start in Section 1 by briefly describing the stack experiment \cite{Langdon:1995:GPdata}. In Section 2 we show Price's Covariance and Selection Theorem can be applied to Genetic Algorithms (GAs) and GP to predict changes in gene frequencies. We follow the proof of the theorem with experimental justification using the GP runs from the stack problem. Section 3 briefly describes Fisher's Fundamental Theorem of Natural Selection and shows that in its normal interpretation it does not apply to practical GAs. An analysis of the stack populations, in Section 4, explains that the difficulty of the stack problem is due to the presence of ``deceptive'' high scoring partial solutions in the population. These cause a negative correlation between necessary primitives and fitness. As Price's Theorem predicts, the frequency of necessary primitives falls, eventually leading to their extinction and so to the impossibility of finding solutions like those that are evolved in successful runs. Section 4 also investigates the evolution of variety in GP populations. Detailed measurements of the evolution of variety in stack populations reveal loss of diversity causing crossover to produce offspring which are copies of their parents. Section 5 concludes with measurements that show in the stack population crossover readily produces improvements in performance initially, but later no improvements at all are made by crossover. Section 6 discusses the importance of these results to GP in general. 48 pages.
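Several of the reports listed here lean on Price's Covariance and Selection Theorem, so it may help to state its simple form. This is the standard textbook statement, not a quotation from the report above; in the GP setting, the gene of interest is usually a primitive of the function or terminal set.

\[
\Delta\bar{q} \;=\; \frac{\mathrm{Cov}(z_i,\,q_i)}{\bar{z}}
\]

where $z_i$ is the number of offspring contributed by parent $i$, $q_i$ is the frequency of the gene of interest in parent $i$, and $\bar{z}$ and $\bar{q}$ are the population means. A primitive therefore spreads only if its frequency in parents covaries positively with the number of offspring those parents leave, i.e. with fitness under fitness-based selection.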
bruce.thesis.ps.gz
==================
Wilker Shane Bruce: "The Application of Genetic Programming to the Automatic Generation of Object-Oriented Programs". PhD Thesis, Dec 1995.

langdon.ps.gz
=============
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
Computer Science Dept.
University College London

Abstract
This thesis investigates the evolution and use of abstract data types within Genetic Programming (GP). In genetic programming the principles of natural evolution (fitness based selection and recombination) act on program code to automatically generate computer programs. The research in this thesis is motivated by the observation from software engineering that data abstraction (e.g. via abstract data types) is essential in programs created by human programmers. We investigate whether abstract data types can be similarly beneficial to the automatic production of programs using GP. GP can automatically ``evolve'' programs which solve non-trivial problems, but few experiments have been reported where the evolved programs explicitly manipulate memory, and yet memory is an essential component of most computer programs. So far work on evolving programs that explicitly use memory has principally used either problem specific memory models or a simple indexed memory model consisting of a single global shared array. Whilst the latter is potentially sufficient to allow any computation to evolve, it is unstructured and allows complex interaction between parts of programs which weakens their modularity. In software engineering this is addressed by controlled use of memory using scoping rules and abstract data types, such as stacks, queues and files. This thesis makes five main contributions: (1) Proving that abstract data types (stacks, queues and lists) can be evolved using genetic programming. (2) Demonstrating GP can evolve general programs which recognise a Dyck context free language, evaluate Reverse Polish expressions, and that GP with an appropriate memory structure can solve the nested brackets problem which had previously been solved using a hybrid GP. (3) In these three cases (Dyck, expression evaluation and nested brackets) an appropriate data structure is proved to be beneficial compared to indexed memory. (4) Investigations of real world electrical network maintenance scheduling problems demonstrate that Genetic Algorithms can find low cost viable solutions to such problems. (5) A taxonomy of GP is presented, including a critical review of experiments with evolving memory. These contributions support our thesis that data abstraction can be beneficial to automatic program generation via artificial evolution.

CSRP-97-04.ps.gz
================
Price's Theorem and the MAX Problem
W. B. Langdon and R. Poli
W.B.Langdon@cs.bham.ac.uk, R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham

We present a detailed analysis of the evolution of GP populations using the problem of finding a program which returns the maximum possible value for a given terminal and function set and a depth limit on the program tree (known as the MAX problem). We confirm the basic message of \cite{Gathercole:1996:aicrtd} that crossover together with program size restrictions can be responsible for premature convergence to a sub-optimal solution. We show that this can happen even when the population retains a high level of variety, and show that in many cases evolution from the sub-optimal solution to the solution is possible if sufficient time is allowed. In both cases theoretical models are presented and compared with actual runs. Experimental evidence is presented that Price's Covariance and Selection Theorem can be applied to GP populations, and the practical effect of program size restrictions is noted. Finally we show that covariance between gene frequency and fitness in the first few generations can be used to predict the course of GP runs.

CSRP-97-09.ps.gz WBL.bloat_wsc2.ps.gz
================ ====================
Fitness Causes Bloat
W. B. Langdon and R. Poli
W.B.Langdon@cs.bham.ac.uk, R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham
WBL.bloat_wsc2.ps.gz is the newer version, which appears in WSC2.

The problem of evolving an artificial ant to follow the Santa Fe trail is used to demonstrate the well known genetic programming feature of growth in solution length, known variously as ``bloat'', ``redundancy'', ``introns'', ``fluff'' and ``Structural Complexity'', with antonyms ``parsimony'', ``Minimum Description Length'' (MDL) and ``Occam's razor''. Comparison of runs with and without fitness selection pressure shows the tendency for solutions to grow in size is caused by fitness based selection. We argue that such growth is inherent in using a fixed evaluation function with a discrete but variable length representation: with simple static evaluation, search converges to mainly finding trial solutions with the same fitness as existing trial solutions. In general variable length allows many more long representations of a given solution than short ones of the same solution. Thus with an unbiased random search we expect longer representations to occur more often and so representation length tends to increase, i.e. fitness based selection leads to bloat.
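The counting argument behind ``fitness causes bloat'' can be made slightly more concrete. As an illustrative aside (not taken from the reports above): the number of distinct shapes of a binary tree with n internal nodes is the Catalan number

\[
C_n \;=\; \frac{1}{n+1}\binom{2n}{n} \;\sim\; \frac{4^{\,n}}{n^{3/2}\sqrt{\pi}},
\]

so, for example, there are 42 shapes with 5 internal nodes but 16796 with 10, and once each shape is labelled with functions and terminals the count grows faster still. A behaviour expressible at some minimal size can therefore typically also be expressed by vastly many larger trees, so a search that samples representations without a length bias meets long programs of a given fitness far more often than short ones.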
WBL.max_gp97.ps
===============
An Analysis of the MAX Problem in Genetic Programming
W.B.Langdon@cs.bham.ac.uk, R.Poli@cs.bham.ac.uk

We present a detailed analysis of the evolution of GP populations using the problem of finding a program which returns the maximum possible value for a given terminal and function set and a depth limit on the program tree (known as the MAX problem). We confirm the basic message of \cite{Gathercole:1996:aicrtd} that crossover together with program size restrictions can be responsible for premature convergence to a sub-optimal solution. We show that this can happen even when the population retains a high level of variety, and show that in many cases evolution from the sub-optimal solution to the solution is possible if sufficient time is allowed. In both cases theoretical models are presented and compared with actual runs. Price's Covariance and Selection Theorem is experimentally tested on GP populations. It is shown to hold only in some cases; in others program size restrictions cause important deviation from its predictions. A considerable update of CSRP-97-04. To appear in GP-97.

CSRP-97-14.ps.gz
================
Fitness Causes Bloat in Variable Size Representations
W.B.Langdon@cs.bham.ac.uk

We argue, based upon the numbers of representations of a given length, that increase in representation length is inherent in using a fixed evaluation function with a discrete but variable length representation. Two examples of this are analysed, including the use of Price's Theorem. Both examples confirm the tendency for solutions to grow in size is caused by fitness based selection. Position paper at the "Workshop on Evolutionary Computation with Variable Size Representation" at ICGA-97, 20 July 1997, East Lansing, MI, USA.

CSRP-97-16.ps.gz
================
Fitness Causes Bloat: Mutation
W. B. Langdon and R. Poli
W.B.Langdon@cs.bham.ac.uk, R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham
In Late Breaking Papers at GP-97, J. Koza (editor)

The problem of evolving, using mutation, an artificial ant to follow the Santa Fe trail is used to study the well known genetic programming feature of growth in solution length. Known variously as ``bloat'', ``fluff'' and increasing ``structural complexity'', this is often described in terms of increasing ``redundancy'' in the code caused by ``introns''. Comparison between runs with and without fitness selection pressure, backed by Price's Theorem, shows the tendency for solutions to grow in size is caused by fitness based selection. We argue that such growth is inherent in using a fixed evaluation function with a discrete but variable length representation. With simple static evaluation, search converges to mainly finding trial solutions with the same fitness as existing trial solutions. In general variable length allows many more long representations of a given solution than short ones. Thus in search (without a length bias) we expect longer representations to occur more often and so representation length to tend to increase, i.e. fitness based selection leads to bloat.
WBL.euro98_bloatm.ps.gz
=======================
Fitness Causes Bloat: Mutation
W. B. Langdon and R. Poli
W.B.Langdon@cwi.nl, R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham
EuroGP'98

The problem of evolving, using mutation, an artificial ant to follow the Santa Fe trail is used to study the well known genetic programming feature of growth in solution length. Known variously as ``bloat'', ``fluff'' and increasing ``structural complexity'', this is often described in terms of increasing ``redundancy'' in the code caused by ``introns''. Comparison between runs with and without fitness selection pressure, backed by Price's Theorem, shows the tendency for solutions to grow in size is caused by fitness based selection. We argue that such growth is inherent in using a fixed evaluation function with a discrete but variable length representation. With simple static evaluation, search converges to mainly finding trial solutions with the same fitness as existing trial solutions. In general variable length allows many more long representations of a given solution than short ones. Thus in search (without a length bias) we expect longer representations to occur more often and so representation length to tend to increase, i.e. fitness based selection leads to bloat.

WBL.wcci98_bloat.ps.gz
======================
The Evolution of Size in Variable Length Representations
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
School of Computer Science
University of Birmingham
ICEC'98

In many cases program lengths increase (known as "bloat", "fluff" and increasing "structural complexity") during artificial evolution. We show bloat is not specific to genetic programming and suggest it is inherent in search techniques with discrete variable length representations using simple static evaluation functions. We investigate the bloating characteristics of three non-population and one population based search techniques using a novel mutation operator. An artificial ant following the Santa Fe trail problem is solved by simulated annealing, hill climbing, strict hill climbing and population based search using two variants of the new subtree based mutation operator. As predicted, bloat is observed when using unbiased mutation and is absent in simulated annealing and both hill climbers when using the length neutral mutation; however bloat occurs with both mutations when using a population. We conclude that there are two causes of bloat: 1) search operators with no length bias tend to sample bigger trees and 2) competition within populations favours longer programs as they can usually reproduce more accurately.

CSRP-97-22.ps.gz
================
Fitness Causes Bloat: Simulated Annealing, Hill Climbing and Populations
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
School of Computer Science
University of Birmingham

Much extended version of WBL.wcci98_bloat.ps.gz, with appendices on generating random trees and random walks in program space.

CSRP-97-26.ps.gz
================
Scheduling Planned Maintenance of Electrical Power Transmission Networks Using Genetic Algorithms
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
School of Computer Science
University of Birmingham

Summary of Genetic Algorithm and Genetic Programming work on evolving schedules for preventative maintenance of the South Wales region of the UK high voltage electricity network.
CSRP-97-29.ps.gz
================
Genetic Programming Bloat with Dynamic Fitness
W. B. Langdon and R. Poli
W.B.Langdon@cs.bham.ac.uk and R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham
Presented at EuroGP-98

In artificial evolution, individuals which perform as their parents did are usually rewarded identically to their parents. We note that Nature is more dynamic and there may be a penalty to pay for doing the same thing as your parents. We report two sets of experiments where static fitness functions are firstly augmented by a penalty for unchanged offspring and secondly the static fitness case is replaced by randomly generated dynamic test cases. We conclude genetic programming, when evolving artificial ant control programs, is surprisingly little affected by large penalties, and program growth is observed in all our experiments.

WBL.euro98_bloatd.ps.gz
=======================
Genetic Programming Bloat with Dynamic Fitness
W. B. Langdon and R. Poli
W.B.Langdon@cwi.nl and R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham
EuroGP-98

In artificial evolution, individuals which perform as their parents did are usually rewarded identically to their parents. We note that Nature is more dynamic and there may be a penalty to pay for doing the same thing as your parents. We report two sets of experiments where static fitness functions are firstly augmented by a penalty for unchanged offspring and secondly the static fitness case is replaced by randomly generated dynamic test cases. We conclude genetic programming, when evolving artificial ant control programs, is surprisingly little affected by large penalties, and program growth is observed in all our experiments.

CSRP-98-04.ps.gz
================
Why Ants are Hard
W. B. Langdon and R. Poli
W.B.Langdon@cs.bham.ac.uk and R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham
Accepted by GP-98

The problem of programming an artificial ant to follow the Santa Fe trail is used as an example program search space. Analysis of shorter solutions shows they have many of the characteristics often ascribed to manually coded programs. Enumeration of a small fraction of the total search space and random sampling characterise it as rugged with many multiple plateaus split by deep valleys and many local and global optima. This suggests it is difficult for hill climbing algorithms. Analysis of the program search space in terms of fixed length schema suggests it is highly deceptive and that for the simplest solutions large building blocks must be assembled before they have above average fitness. In some cases we show solutions cannot be assembled using a fixed representation from small building blocks of above average fitness. These suggest the Ant problem is difficult for Genetic Algorithms. Random sampling of the program search space suggests on average the density of global optima changes only slowly with program size, but the density of neutral networks linking points of the same fitness grows approximately linearly with program length. This is part of the cause of bloat. Previously reported genetic programming, simulated annealing and hill climbing performance is shown not to be much better than random search on the Ant problem.
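For readers unfamiliar with the benchmark analysed above, the following Python fragment sketches the usual artificial ant set-up: a program built from IF-FOOD-AHEAD, LEFT, RIGHT and MOVE is run repeatedly on a toroidal grid and scored by the food eaten within a time limit. The trail data, the 600-step limit and the nested-tuple encoding are standard choices in the literature but are stated here only as illustrative assumptions, not as details taken from the report.

# Minimal artificial-ant interpreter (illustrative; trail data not included).
# A program is a nested tuple: ("move",), ("left",), ("right",),
# ("if-food-ahead", then_branch, else_branch) or ("progn", a, b, ...).

def run_ant(program, trail, max_steps=600):
    """Return the amount of food eaten. `trail` is a set of (x, y) food cells
    on a toroidal grid; the grid size is inferred from the trail's extent."""
    width = max(x for x, _ in trail) + 1
    height = max(y for _, y in trail) + 1
    food = set(trail)
    state = {"x": 0, "y": 0, "dx": 1, "dy": 0, "steps": 0, "eaten": 0}

    def ahead():
        return ((state["x"] + state["dx"]) % width,
                (state["y"] + state["dy"]) % height)

    def step(node):
        if state["steps"] >= max_steps:
            return
        op = node[0]
        if op == "move":
            state["steps"] += 1
            state["x"], state["y"] = ahead()
            if (state["x"], state["y"]) in food:
                food.discard((state["x"], state["y"]))
                state["eaten"] += 1
        elif op == "left":
            state["steps"] += 1
            state["dx"], state["dy"] = state["dy"], -state["dx"]
        elif op == "right":
            state["steps"] += 1
            state["dx"], state["dy"] = -state["dy"], state["dx"]
        elif op == "if-food-ahead":
            step(node[1] if ahead() in food else node[2])
        else:  # "progn": execute children in order
            for child in node[1:]:
                step(child)

    while state["steps"] < max_steps:
        before = state["steps"]
        step(program)       # the whole program is re-run until time runs out
        if state["steps"] == before:
            break           # defensive: program consumed no time
    return state["eaten"]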
evogprep.ps
===========
Genetic Programming in Europe
Report of the EvoGP Working Group on Genetic Programming of the European Network of Excellence in Evolutionary Computing
W. B. Langdon W.B.Langdon@cs.bham.ac.uk and R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham

CSRP-98-08.ps.gz (Late breaking paper at 1st European workshop on GP)
================
Better Trained Ants
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
School of Computer Science
University of Birmingham

The problem of programming an artificial ant to follow the Santa Fe trail has been repeatedly used as a benchmark problem. Recently we have shown performance of several techniques is not much better than the best performance obtainable using uniform random search. We suggested that this could be because the program fitness landscape is difficult for hill climbers and the problem is also difficult for Genetic Algorithms as it contains multiple levels of deception. Here we redefine the problem so the ant is obliged to traverse the trail in approximately the correct order. A simple genetic programming system, with no size or depth restriction, is shown to perform approximately three times better with the improved training function.

CSRP-98-12.ps.gz
================
Better Trained Ants for Genetic Programming
W. B. Langdon and R. Poli
W.B.Langdon@cs.bham.ac.uk, R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham

Extends CSRP-98-08 by: requiring the ant to find food quickly, including the ant's speed in the fitness function (either as a linear addition or as a second objective in a multi-objective fitness function), and GP one point crossover.

WBL.antspace_gp98.ps.gz (GP-98)
=======================
Why Ants are Hard
W. B. Langdon and R. Poli
W.B.Langdon@cs.bham.ac.uk, R.Poli@cs.bham.ac.uk

Updated CSRP-98-04.ps.gz.

CSRP-98-16.ps.gz (Late breaking paper at GP-98)
================
Boolean Functions Fitness Spaces
W. B. Langdon and R. Poli
W.B.Langdon@cs.bham.ac.uk, R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham

We investigate the distribution of performance of the Boolean functions of 3 Boolean inputs (particularly that of the parity functions), and of the always-on-6 and even-6 parity functions. We use enumeration, uniform Monte-Carlo random sampling and sampling random full trees. As expected, XOR dramatically changes the fitness distributions. In all cases, once some minimum size threshold has been exceeded, the distribution of performance is approximately independent of program length. However the distribution of the performance of full trees is different from that of asymmetric trees and varies with tree depth. We consider but reject testing the No Free Lunch (NFL) theorems on these functions.

CSRP-98-17.ps.gz
================
Why "Building Blocks" Don't Work on Parity Problems
W. B. Langdon and R. Poli
W.B.Langdon@cs.bham.ac.uk, R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham

We investigate the distribution of performance of the parity and always-on Boolean functions given only the appropriate building block. These problems have a ``needle in a haystack'' fitness landscape and so are unsuitable for genetic programming or other progressive search techniques. Theoretical analysis shows that in the limit, as program size grows, the density of solutions is independent of size but falls exponentially with the number of inputs.

WBL.geccco99.fairxo.ps.gz
=========================
Size Fair and Homologous Tree Genetic Programming Crossovers
W. B. Langdon W.B.Langdon@cwi.nl
To be presented at GECCO-99: Proceedings of the Genetic and Evolutionary Computation Conference, W. Banzhaf, J. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela and R. E. Smith (editors)

Size fair and homologous crossover genetic operators for tree based genetic programming are described and tested. Both produce considerably reduced increases in program size and no detrimental effect on GP performance. GP search spaces are partitioned by the ridge in the number of programs versus their size and depth. A ramped uniform random initialisation is described which straddles the ridge. With subtree crossover, trees increase by about one level per generation, leading to sub-quadratic bloat in length.
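As a rough illustration of the size fair idea described above (a simplification, not the operator defined in the paper): after picking the crossover point in the first parent, the subtree donated by the second parent is chosen only among subtrees of comparable size, so the expected change in program length stays near zero. The nested-list tree encoding and the 2*s + 1 size cap below are illustrative assumptions.

import copy
import random

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node of a nested-list tree."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def size(tree):
    return 1 if not isinstance(tree, list) else 1 + sum(size(c) for c in tree[1:])

def replace(tree, path, new):
    if not path:
        return new
    node = tree
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = new
    return tree

def size_fair_crossover(mum, dad, rng=random.Random(0)):
    """Simplified size fair crossover: the donated subtree may contain at most
    2*s + 1 nodes, where s is the size of the subtree being removed.
    (The published operator uses a particular distribution over donor sizes;
    this hard cap is only an illustrative stand-in.)"""
    child = copy.deepcopy(mum)
    cut, removed = rng.choice(list(subtrees(child)))
    limit = 2 * size(removed) + 1
    candidates = [s for _, s in subtrees(dad) if size(s) <= limit]
    donor = copy.deepcopy(rng.choice(candidates))
    return replace(child, cut, donor)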
cwi_fair.ps.gz
==============
Size Fair and Homologous Tree Crossovers
W. B. Langdon W.B.Langdon@cwi.nl
CWI technical report expanding on WBL.geccco99.fairxo.ps.gz (23 pages)

c.harris
========
Directory containing Chris Harris' PhD thesis

tinayu
======
Directory containing Tina Yu's PhD thesis

WBL.fitnessspaces.ps.gz
=======================
W. B. Langdon
Scaling of Program Tree Fitness Spaces, in Evolutionary Computation vol 7, no 4