This directory contains papers that have been announced on the GP mailing list but are not available at the GP ftp site at ftp.io.com. Please look at the ftp.io.com mirror directory to find any papers that you do not find here.

Adil Qureshi
Aug '95

icga85.txt
==========
Cramer, Nichael Lynn: "A Representation for the Adaptive Generation of Simple Sequential Programs", Proceedings, International Conference on Genetic Algorithms and their Applications, July 1985 [CMU], pp 183-187.

ABSTRACT
A system for the adaptive generation of sequential computer functions is described. A small operator set is defined and encoded in JB, a fixed length representation. After briefly examining the limitations encountered in this approach we explore TB, a tree-like representation of the code. These representations share the feature that they can be used to represent well-formed, useful computer programs while still being amenable to suitably defined genetic operators. As a working example, the system is used to produce two-input, single-output multiplication functions that are concise and well-defined. Future work, dealing with extensions to more complicated functions and generalizations of the techniques, is also discussed.

94-006.ps.gz
============
Dynamic Training Subset Selection for Supervised Learning in Genetic Programming
Chris Gathercole, Peter Ross
Department of Artificial Intelligence, University of Edinburgh, 80 South Bridge, Edinburgh EH1 1HN, U.K.
chrisg,peterg@aisb.ed.ac.uk
February 25, 1994

Abstract
When using the Genetic Programming (GP) Algorithm on a difficult problem with a large set of training data, a large population size is needed and a very large number of function-tree evaluations must be carried out. This paper describes some efforts made to reduce the number of such evaluations by concentrating on selecting a small subset of the training data set on which to actually carry out the GP algorithm. Three subset selection methods described in the paper are: Dynamic Subset Selection (DSS), using the current GP run to select `difficult' and/or previously unselected cases; Historical Subset Selection (HSS), selecting `difficult' cases from previous GP runs; and Random Subset Selection (RSS), selecting a subset of cases at random for each generation. Various runs have shown that GP+DSS can produce better results in less than 20% of the time taken by GP. GP+HSS can nearly match the results of GP, and, perhaps surprisingly, GP+RSS can occasionally approach the results of GP. GP+DSS produced better, more general results than those reported in a paper for a variety of Neural Networks when used on a substantial problem, known as the Thyroid problem.
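The subset-selection idea summarised above lends itself to a short sketch. The Python fragment below is only an illustration, not the authors' code: it assumes a per-case `difficulty` counter (how often the case is missed) and an `age` counter (generations since last selected) and draws a weighted subset each generation; the additive weighting and all names are illustrative assumptions.

import random

def select_subset(n_cases, subset_size, difficulty, age, rng=random.Random(0)):
    """Pick a training subset biased towards cases that are often missed
    (high difficulty) or have not been selected recently (high age).
    The additive weighting is an illustrative assumption, not the exact
    scheme of the paper."""
    weights = [1 + difficulty[i] + age[i] for i in range(n_cases)]
    chosen = set()
    while len(chosen) < subset_size:
        chosen.add(rng.choices(range(n_cases), weights=weights, k=1)[0])
    return sorted(chosen)

def update_counters(chosen, errors, difficulty, age):
    """After evaluating the population on the chosen cases, bump the
    difficulty of cases that are still got wrong (errors maps case index
    to error count) and reset the age of selected cases; unselected cases
    grow older."""
    for i in range(len(age)):
        age[i] = 0 if i in chosen else age[i] + 1
    for i in chosen:
        difficulty[i] += errors.get(i, 0)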
94-007.ps.gz
============
Applications of genetic algorithms
Peter Ross and Dave Corne
Department of AI, University of Edinburgh, 80 South Bridge, Edinburgh EH1 1HN
peter,dave@aisb.ed.ac.uk

Abstract
Genetic algorithms have been applied to a very wide range of practical problems, often with valuable results. This paper surveys just a few examples, to illustrate the diversity of approaches and to point to some of the considerations that have proved important in making applications successful. Because GAs provide a fairly comprehensible way to address a wide range of difficult engineering and optimisation problems, producing good if not optimal results, it seems that the technology is finding its way into real-world use much more easily than, say, expert systems did.

University of Edinburgh, Department of Artificial Intelligence, Genetic Algorithms Research Group. Paper No. 94-007. Submitted for a special issue of the AISB Quarterly on Evolutionary Computation, edited by Terry Fogarty.

CS-TR-95-1542.ps.gz
===================
John Koza's report on parallelising GP

LearnModels.ps.gz
=================
Learning Mental Models
Astro Teller
Dept of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
astro@cs.cmu.edu

MentalModels.ps.gz
==================
Chapter 9 of AIGP

PADO-Tech-Report.ps.gz
======================
PADO: Learning Tree Structured Algorithms for Orchestration into an Object Recognition System
Astro Teller and Manuela Veloso
February 10, 1995
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213

Abstract
Most artificial intelligence systems work on simple problems and artificial domains because they rely on the accurate sensing of the task world. Object recognition is a crucial part of the sensing challenge and machine learning stands in a position to catapult object recognition into real world domains. Given that, to date, machine learning has not delivered general object recognition, we propose a different point of attack: the learning architectures themselves. We have developed a method for directly learning and combining algorithms in a new way that imposes little burden on or bias from the humans involved. This learning architecture, PADO, and the new results it brings to the problem of natural image object recognition are the focus of this report.

PADO.ps.gz
==========
PADO: A New Learning Architecture for Object Recognition
Astro Teller and Manuela Veloso
Carnegie Mellon University

Abstract
Most artificial intelligence systems today work on simple problems and artificial domains because they rely on the accurate sensing of the task world. Object recognition is a crucial part of the sensing challenge and machine learning stands in a position to catapult object recognition into real world domains. Given that, to date, machine learning has not delivered general object recognition, we propose a different point of attack: the learning architectures themselves. We have developed a method for directly learning and combining algorithms in a new way that imposes little burden on or bias from the humans involved. This learning architecture, PADO, and the new results it brings to the problem of natural image object recognition are the focus of this chapter.

Turing.ps.gz
============
Turing Completeness in the Language of Genetic Programming with Indexed Memory
Astro Teller
Carnegie Mellon University

aisb.ps.gz
==========
Building New Spatial Interaction Models Using Genetic Programming
S. Openshaw and I. Turton, School of Geography, University of Leeds, Leeds, UK
email: stan@geog.leeds.ac.uk, ian@geog.leeds.ac.uk

Abstract
Building computer models of human geographic systems is difficult because of the poverty of applicable theory that can be used to specify well formed and soundly based mathematical models. As a result there has been little progress in this area since the late 1960s, when mathematical model building procedures based on entropy-maximising methods were first applied to spatial interaction data. In the mid-1980s, attention was focused on the use of genetic algorithms to explore the universe of different models that could be built from the available symbolic pieces; that is, there is a very, very large number of permutations of the model building blocks, viz. a mix of unary and binary operators, unknown parameters and observed variables, and rules of syntax for well formed equations.
The so-called Automated Modeling System (AMS) worked but never fulfilled its expectations. The paper revisits the problem but re-casts the AMS approach into a genetic programming framework. Some preliminary results based on Cray-YMP runs are reported.

cook94a.ps.gz
=============
Journal of Artificial Intelligence Research 1 (1994) 231-255. Submitted 12/93; published 2/94.
Substructure Discovery Using Minimum Description Length and Background Knowledge
Diane J. Cook cook@cse.uta.edu
Lawrence B. Holder holder@cse.uta.edu
Department of Computer Science Engineering, Box 19015, University of Texas at Arlington, Arlington, TX 76019 USA

Abstract
The ability to identify interesting and repetitive substructures is an essential component to discovering knowledge in structural data. We describe a new version of our Subdue substructure discovery system based on the minimum description length principle. The Subdue system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously discovered substructures in the data, multiple passes of Subdue produce a hierarchical description of the structural regularities in the data. Subdue uses a computationally-bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints. In addition to the minimum description length principle, other background knowledge can be used by Subdue to guide the search towards more appropriate substructures. Experiments in a variety of domains demonstrate Subdue's ability to find substructures capable of compressing the original data and to discover structural concepts important to the domain.

dret_postscript.tar.gz
======================
Artificial-Life Simulators and Their Applications
Howard Gutowitz
Ecole Supérieure de Physique et de Chimie Industrielles, Laboratoire d'Electronique, 10 rue Vauquelin, 75005 Paris, France
and The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501
hag@santafe.edu
HTML: http://alife.santafe.edu/alife/topics/simulators/dret/dret.html

Abstract: Artificial Life (Alife) is a rapidly growing field of scientific research linking biology and computer science. It seeks to understand how life-like processes can be embodied in computer programs. Advances in this area promise to illuminate fundamental questions both in biology ("What is life?") and in Computer Science ("How to make robust and adaptable computer programs?"). Much of the work in artificial life is directed toward building computer simulations of artificial creatures and the artificial worlds in which they live. This report surveys major efforts in this area, with attention to developments likely to lead to practical applications in the short to middle term.

ep.ps.gz
========
Genetic Programming of Music
Jeffrey B. Putnam
New Mexico Institute of Mining and Technology
Aug 30th '94

model-of-landscapes.ps.gz
=========================
Terry Jones
February 1, 1994

Abstract
The use of the term ``landscape'' is increasing rapidly in the field of evolutionary computation, yet in many cases it remains poorly, if at all, defined. This situation has perhaps developed because everyone grasps the imagery immediately, and the questions that would be asked of a less evocative term do not get asked.
This paper presents a model of landscapes that is general enough to encompass most of what computer scientists would call search, though the model is not restricted to either the field or the viewpoint. It is particularly relevant to algorithms that employ some form of crossover, and hence to genetic algorithms and other members of the evolutionary computing family. An overview of the consequences and properties of the model establishes a connection with more traditional search algorithms from artificial intelligence, introduces the notion of a crossover landscape, and argues the importance of viewing search as navigation and structure.

Santa Fe Institute, 1660 Old Pecos Trail, Suite A, Santa Fe, NM 87505, USA and Department of Computer Science, University of New Mexico, Albuquerque NM 87131, USA. Email: terry@santafe.edu

ppsn-94.ps.gz
=============
Program Search with a Hierarchical Variable Length Representation: Genetic Programming, Simulated Annealing and Hill Climbing
Una-May O'Reilly and Franz Oppacher
Santa Fe Institute, Santa Fe, NM, 87505, USA
School of Computer Science, Carleton University, Ottawa, CANADA

Abstract. This paper presents a comparison of Genetic Programming (GP) with Simulated Annealing (SA) and Stochastic Iterated Hill Climbing (SIHC) based on a suite of program discovery problems which have been previously tackled only with GP. All three search algorithms employ the hierarchical variable length representation for programs brought into recent prominence with the GP paradigm [8]. We feel it is not intuitively obvious that mutation-based adaptive search can handle program discovery, yet, to date, for each GP problem we have tried, SA or SIHC also work.

ppsn92.ps.gz
============
An Experimental Perspective on Genetic Programming
Una-May O'Reilly and Franz Oppacher
Intelligent Systems Research Group, School of Computer Science, Carleton University, Ottawa, Ontario, K1S 5B6, CANADA

Abstract
Genetic Programming (GP) has recently been introduced by John R. Koza [5] as a method for genetically breeding populations of computer programs to solve problems. We believe GP to constitute a significant extension of the Genetic Algorithm (GA) research paradigm primarily because it generalizes the genetic search techniques: instead of looking for a solution to a specific instance of a problem, GP attempts to evolve a program capable of computing the solutions for any instance of the problem. We have implemented a genetic programming environment, GP*, that is capable of duplicating Koza's experiments. In this paper we describe a specific GP experiment on the evolution of programs to sort vectors, and discuss the issues that must be addressed in any application of GP: the design of fitness functions and test suites, and the selection of program terminals and functions. Our observations point to several previously unnoticed shortcomings of the GP approach. We hypothesize that these shortcomings are due to the fact that GP only uses a hierarchical representation but does not construct its solutions in an explicitly hierarchical manner.

oreilly
=======
Una-May O'Reilly PhD thesis (An Analysis of Genetic Programming)

rosca
=====
Directory containing GP papers by J. P. Rosca ("Learning by Adapting Representations in GP" etc.)

terry
=====
Papers by Terry Jones, including PhD thesis. (Model of Landscapes etc.)

GPdata_icga-95.ps
=================
Evolving Data Structures with Genetic Programming
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
Computer Science Dept.
University College London

It is established good software engineering practice to ensure that programs use memory via abstract data structures such as stacks, queues and lists. These provide an interface between the program and memory, freeing the program of memory management details which are left to the data structures to implement. The main result presented herein is that GP can automatically generate stacks and queues. Typically abstract data structures support multiple operations, such as put and get. We show that GP can simultaneously evolve all the operations of a data structure by implementing each such operation with its own independent program tree. That is, the chromosome consists of a fixed number of independent program trees. Moreover, crossover only mixes genetic material of program trees that implement the same operation. Program trees interact with each other only via shared memory and shared ``Automatically Defined Functions'' (ADFs). ADFs, ``pass by reference'' when calling them, Pareto selection, ``good software engineering practice'' and partitioning the genetic population into ``demes'' were also investigated whilst evolving the queue in order to improve the GP solutions. Code is available from this site.
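To make the multi-tree chromosome idea above concrete, here is a minimal Python sketch, not the paper's implementation: an individual holds one tree per data-structure operation (the operation names, the nested-list tree encoding and the example primitives are illustrative assumptions), and crossover swaps subtrees only between the two parents' trees for the same operation.

import copy
import random

# Illustrative operation names for an evolved stack; not the paper's exact set.
OPERATIONS = ("makenull", "push", "pop", "top", "empty")

def all_paths(tree, path=()):
    """Return the path (tuple of child indices) of every node in a nested-list tree."""
    paths = [path]
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):  # tree[0] is the function symbol
            paths.extend(all_paths(child, path + (i,)))
    return paths

def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def set_subtree(tree, path, new):
    if not path:
        return new
    parent = get_subtree(tree, path[:-1])
    parent[path[-1]] = new
    return tree

def same_operation_crossover(mum, dad, rng=random.Random(0)):
    """Crossover restricted to one operation: both crossover points come from
    the parents' trees for the *same* operation, so genetic material never
    moves between, say, the 'push' tree and the 'pop' tree."""
    op = rng.choice(OPERATIONS)
    child = {o: copy.deepcopy(t) for o, t in mum.items()}
    cut_m = rng.choice(all_paths(child[op]))
    cut_d = rng.choice(all_paths(dad[op]))
    donor = copy.deepcopy(get_subtree(dad[op], cut_d))
    child[op] = set_subtree(child[op], cut_m, donor)
    return child

# Example with trivial hand-written trees (purely illustrative primitives).
mum = {op: ["prog2", "arg0", ["write", "aux", "arg0"]] for op in OPERATIONS}
dad = {op: ["read", "aux"] for op in OPERATIONS}
kid = same_operation_crossover(mum, dad)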
WBL_aaai-pppGP.ps
=================
Pareto Optimality, Population Partitioning, Price's theorem and GP
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
Computer Science Dept.
University College London

A description of a use of Pareto optimality in genetic programming is given and an analogy with Genetic Algorithm fitness niches is drawn. Techniques to either spread the population across many Pareto optimal fitness values or to reduce the spread are described. It is speculated that a wide spread may not aid Genetic Programming. It is suggested that this might give useful insight into many GPs whose fitness is composed of several sub-objectives. The successful use of demic populations in GP leads to speculation that smaller evolutionary steps might aid GP in the long run. An example is given where Price's covariance theorem helped when designing a GP fitness function.

directed_crossover.ps
=====================
Summary of directing crossover locations in a multi-tree GP
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
Computer Science Dept.
University College London

GPlist_aigp2.ps
===============
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
Computer Science Dept.
University College London

Abstract for Advances in Genetic Programming 2
Much published work on genetic programming evolves functions without ``side-effects'' to learn patterns in test data; once produced they can be used to make predictions regarding new data. In contrast, human written programs often make extensive and explicit use of memory. Indeed memory in some form is required for a programming system to be Turing Complete. In both normal and genetic programming considerable benefits have been found in adopting a ``structured approach''. For example, Koza has shown the introduction of evolvable code modules (ADFs) can greatly reduce the effort required for GP to reach a solution. Teller has shown that the introduction of indexed memory into genetic programming makes it Turing Complete. This work builds on Teller's and considers the introduction of data structures within GP. The expectation is that the use of data structures will prove as useful to GP as code structuring has already been shown to be. So far the important result is that GP can evolve simple data structures, such as stacks and queues. This abstract reports the successful evolution of a list data structure, compares the use of Pareto selection and demes and describes the directed choice of crossover points.

WBL.gpgrid_gp96.ps
==================
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
Computer Science Dept.
University College London

Abstract for: Scheduling Maintenance of Electrical Power Transmission Networks Using Genetic Programming
The National Grid Company Plc is responsible for the maintenance of the high voltage electricity transmission network in England and Wales. It must plan maintenance so as to minimize costs taking into account: (1) location and size of demand, (2) generator capacities and availabilities, (3) electricity carrying capacity of the remainder of the network, that part not undergoing maintenance. Previous work showed the combination of a Genetic Algorithm using an order or permutation chromosome combined with hand coded ``Greedy'' Optimizers can readily produce an optimal schedule for a four node test problem. Following this, the same GA has been used to find low cost schedules for the South Wales region of the UK high voltage power network. This paper describes the evolution of the best known schedule for the base South Wales problem using Genetic Programming starting from the hand coded heuristics used with the GA. Submitted to GP96 late breaking papers.

surveyRN76.ps
=============
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
A. Qureshi A.Qureshi@cs.ucl.ac.uk
Computer Science Dept.
University College London
Genetic Programming -- Computers using "Natural Selection" to generate programs
RN/95/76

Abstract
Computers that ``program themselves''; science fact or fiction? Genetic Programming uses novel optimisation techniques to ``evolve'' simple programs; mimicking the way humans construct programs by progressively re-writing them. Trial programs are repeatedly modified in the search for ``better/fitter'' solutions. The underlying basis is Genetic Algorithms (GAs). Genetic Algorithms, pioneered by Holland, Goldberg and others, are evolutionary search techniques inspired by natural selection (i.e. survival of the fittest). GAs work with a ``population'' of trial solutions to a problem, frequently encoded as strings, and repeatedly select the ``fitter'' solutions, attempting to evolve better ones. The power of GAs is being demonstrated for an increasing range of applications; financial, imaging, VLSI circuit layout, gas pipeline control and production scheduling. But one of the most intriguing uses of GAs - driven by Koza - is automatic program generation. Genetic Programming applies GAs to a ``population'' of programs - typically encoded as tree-structures. Trial programs are evaluated against a ``fitness function'' and the best solutions selected for modification and re-evaluation. This modification-evaluation cycle is repeated until a ``correct'' program is produced. GP has demonstrated its potential by evolving simple programs for medical signal filters, classifying news stories, performing optical character recognition, and for target identification. This paper surveys the exciting field of Genetic Programming. As a basis it reviews Genetic Algorithms and automatic program generation. Next it introduces Genetic Programming, describing its history and describing the technique via a worked example in C. Then, using a taxonomy that divides GP research into theory/techniques and applications, it surveys recent work from both of these perspectives. Extensive bibliographies, glossaries and a resource list are included as appendices.
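The modification-evaluation cycle summarised in the survey abstract above can be sketched in a few lines. The Python fragment below is only an illustrative outline of a generic tree-GP loop; the primitive set, fitness function and parameter values are placeholders, not those of the survey's worked example in C.

import copy
import random

FUNCTIONS = {"+": 2, "*": 2}   # placeholder function set (name -> arity)
TERMINALS = ["x", 1.0]         # placeholder terminal set

def random_tree(depth, rng):
    """Grow a random expression tree as nested lists: [op, child, child] or a terminal."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(FUNCTIONS))
    return [op] + [random_tree(depth - 1, rng) for _ in range(FUNCTIONS[op])]

def evaluate(tree, x):
    if tree == "x":
        return x
    if not isinstance(tree, list):
        return tree
    args = [evaluate(child, x) for child in tree[1:]]
    return args[0] + args[1] if tree[0] == "+" else args[0] * args[1]

def fitness(tree):
    """Placeholder symbolic-regression fitness: error against the target x*x + x."""
    return -sum(abs(evaluate(tree, x) - (x * x + x)) for x in range(-5, 6))

def crossover(mum, dad, rng):
    """Very crude subtree crossover: replace a random child of mum's root with dad."""
    child = copy.deepcopy(mum)
    if isinstance(child, list):
        child[rng.randrange(1, len(child))] = copy.deepcopy(dad)
    return child

def gp_loop(pop_size=50, generations=20, rng=random.Random(1)):
    pop = [random_tree(3, rng) for _ in range(pop_size)]
    for _ in range(generations):                       # modification-evaluation cycle
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]              # select the fitter trial programs
        pop = parents + [crossover(rng.choice(parents), rng.choice(parents), rng)
                         for _ in range(pop_size - len(parents))]
    return max(pop, key=fitness)

best = gp_loop()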
WBL.ecj.price.125.ps.gz
=======================
Evolution of Genetic Programming Populations
W.B.Langdon@cs.bham.ac.uk
RN/96/125

We investigate in detail what happens as genetic programming (GP) populations evolve. Since we shall use the populations which showed GP can evolve stack data structures as examples, we start in Section 1 by briefly describing the stack experiment \cite{Langdon:1995:GPdata}. In Section 2 we show Price's Covariance and Selection Theorem can be applied to Genetic Algorithms (GAs) and GP to predict changes in gene frequencies. We follow the proof of the theorem with experimental justification using the GP runs from the stack problem. Section 3 briefly describes Fisher's Fundamental Theorem of Natural Selection and shows that in its normal interpretation it does not apply to practical GAs. An analysis of the stack populations, in Section 4, explains that the difficulty of the stack problem is due to the presence of ``deceptive'' high scoring partial solutions in the population. These cause a negative correlation between necessary primitives and fitness. As Price's Theorem predicts, the frequency of necessary primitives falls, eventually leading to their extinction and so to the impossibility of finding solutions like those that are evolved in successful runs. Section 4 also investigates the evolution of variety in GP populations. Detailed measurements of the evolution of variety in stack populations reveal loss of diversity causing crossover to produce offspring which are copies of their parents. Section 5 concludes with measurements that show in the stack population crossover readily produces improvements in performance initially, but later no improvements at all are made by crossover. Section 6 discusses the importance of these results to GP in general. 48 pages.
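Several of the reports listed here lean on Price's Covariance and Selection Theorem, so it may help to state its simple form. This is the standard textbook statement, not a quotation from the report above; in the GP setting, the gene of interest is usually a primitive of the function or terminal set.

\[
\Delta\bar{q} \;=\; \frac{\mathrm{Cov}(z_i,\,q_i)}{\bar{z}}
\]

where $z_i$ is the number of offspring contributed by parent $i$, $q_i$ is the frequency of the gene of interest in parent $i$, and $\bar{z}$ and $\bar{q}$ are the population means. A primitive therefore spreads only if its frequency in parents covaries positively with the number of offspring those parents leave, i.e. with fitness under fitness-based selection.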
bruce.thesis.ps.gz
==================
Wilker Shane Bruce: "The Application of Genetic Programming to the Automatic Generation of Object-Oriented Programs". PhD Thesis, Dec 1995.

langdon.ps.gz
=============
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
Computer Science Dept.
University College London

Abstract
This thesis investigates the evolution and use of abstract data types within Genetic Programming (GP). In genetic programming the principles of natural evolution (fitness based selection and recombination) act on program code to automatically generate computer programs. The research in this thesis is motivated by the observation from software engineering that data abstraction (e.g. via abstract data types) is essential in programs created by human programmers. We investigate whether abstract data types can be similarly beneficial to the automatic production of programs using GP. GP can automatically ``evolve'' programs which solve non-trivial problems, but few experiments have been reported where the evolved programs explicitly manipulate memory, and yet memory is an essential component of most computer programs. So far work on evolving programs that explicitly use memory has principally used either problem specific memory models or a simple indexed memory model consisting of a single global shared array. Whilst the latter is potentially sufficient to allow any computation to evolve, it is unstructured and allows complex interaction between parts of programs which weakens their modularity. In software engineering this is addressed by controlled use of memory using scoping rules and abstract data types, such as stacks, queues and files. This thesis makes five main contributions: (1) Proving that abstract data types (stacks, queues and lists) can be evolved using genetic programming. (2) Demonstrating GP can evolve general programs which recognise a Dyck context free language, evaluate Reverse Polish expressions, and that GP with an appropriate memory structure can solve the nested brackets problem which had previously been solved using a hybrid GP. (3) In these three cases (Dyck, expression evaluation and nested brackets) an appropriate data structure is proved to be beneficial compared to indexed memory. (4) Investigations of real world electrical network maintenance scheduling problems demonstrate that Genetic Algorithms can find low cost viable solutions to such problems. (5) A taxonomy of GP is presented, including a critical review of experiments with evolving memory. These contributions support our thesis that data abstraction can be beneficial to automatic program generation via artificial evolution.

CSRP-97-04.ps.gz
================
Price's Theorem and the MAX Problem
W. B. Langdon and R. Poli
W.B.Langdon@cs.bham.ac.uk, R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham

We present a detailed analysis of the evolution of GP populations using the problem of finding a program which returns the maximum possible value for a given terminal and function set and a depth limit on the program tree (known as the MAX problem). We confirm the basic message of \cite{Gathercole:1996:aicrtd} that crossover together with program size restrictions can be responsible for premature convergence to a sub-optimal solution. We show that this can happen even when the population retains a high level of variety, and show that in many cases evolution from the sub-optimal solution to the solution is possible if sufficient time is allowed. In both cases theoretical models are presented and compared with actual runs. Experimental evidence is presented that Price's Covariance and Selection Theorem can be applied to GP populations, and the practical effect of program size restrictions is noted. Finally we show that covariance between gene frequency and fitness in the first few generations can be used to predict the course of GP runs.

CSRP-97-09.ps.gz WBL.bloat_wsc2.ps.gz
================ ====================
Fitness Causes Bloat
W. B. Langdon and R. Poli
W.B.Langdon@cs.bham.ac.uk, R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham
WBL.bloat_wsc2.ps.gz is the newer version, which appears in WSC2.

The problem of evolving an artificial ant to follow the Santa Fe trail is used to demonstrate the well known genetic programming feature of growth in solution length, known variously as ``bloat'', ``redundancy'', ``introns'', ``fluff'' and ``Structural Complexity'', with antonyms ``parsimony'', ``Minimum Description Length'' (MDL) and ``Occam's razor''. Comparison of runs with and without fitness selection pressure shows the tendency for solutions to grow in size is caused by fitness based selection. We argue that such growth is inherent in using a fixed evaluation function with a discrete but variable length representation: with simple static evaluation, search converges to mainly finding trial solutions with the same fitness as existing trial solutions. In general variable length allows many more long representations of a given solution than short ones of the same solution. Thus with an unbiased random search we expect longer representations to occur more often and so representation length tends to increase, i.e. fitness based selection leads to bloat.
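The counting argument behind ``fitness causes bloat'' can be made slightly more concrete. As an illustrative aside (not taken from the reports above): the number of distinct shapes of a binary tree with n internal nodes is the Catalan number

\[
C_n \;=\; \frac{1}{n+1}\binom{2n}{n} \;\sim\; \frac{4^{\,n}}{n^{3/2}\sqrt{\pi}},
\]

so, for example, there are 42 shapes with 5 internal nodes but 16796 with 10, and once each shape is labelled with functions and terminals the count grows faster still. A behaviour expressible at some minimal size can therefore typically also be expressed by vastly many larger trees, so a search that samples representations without a length bias meets long programs of a given fitness far more often than short ones.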
WBL.max_gp97.ps
===============
An Analysis of the MAX Problem in Genetic Programming
W.B.Langdon@cs.bham.ac.uk, R.Poli@cs.bham.ac.uk

We present a detailed analysis of the evolution of GP populations using the problem of finding a program which returns the maximum possible value for a given terminal and function set and a depth limit on the program tree (known as the MAX problem). We confirm the basic message of \cite{Gathercole:1996:aicrtd} that crossover together with program size restrictions can be responsible for premature convergence to a sub-optimal solution. We show that this can happen even when the population retains a high level of variety, and show that in many cases evolution from the sub-optimal solution to the solution is possible if sufficient time is allowed. In both cases theoretical models are presented and compared with actual runs. Price's Covariance and Selection Theorem is experimentally tested on GP populations. It is shown to hold only in some cases; in others program size restrictions cause important deviation from its predictions. A considerable update of CSRP-97-04. To appear in GP-97.

CSRP-97-14.ps.gz
================
Fitness Causes Bloat in Variable Size Representations
W.B.Langdon@cs.bham.ac.uk

We argue, based upon the numbers of representations of a given length, that increase in representation length is inherent in using a fixed evaluation function with a discrete but variable length representation. Two examples of this are analysed, including the use of Price's Theorem. Both examples confirm the tendency for solutions to grow in size is caused by fitness based selection. Position paper at the "Workshop on Evolutionary Computation with Variable Size Representation" at ICGA-97, 20 July 1997, East Lansing, MI, USA.

CSRP-97-16.ps.gz
================
Fitness Causes Bloat: Mutation
W. B. Langdon and R. Poli
W.B.Langdon@cs.bham.ac.uk, R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham
In Late Breaking Papers at GP-97, J. Koza (editor)

The problem of evolving, using mutation, an artificial ant to follow the Santa Fe trail is used to study the well known genetic programming feature of growth in solution length. Known variously as ``bloat'', ``fluff'' and increasing ``structural complexity'', this is often described in terms of increasing ``redundancy'' in the code caused by ``introns''. Comparison between runs with and without fitness selection pressure, backed by Price's Theorem, shows the tendency for solutions to grow in size is caused by fitness based selection. We argue that such growth is inherent in using a fixed evaluation function with a discrete but variable length representation. With simple static evaluation, search converges to mainly finding trial solutions with the same fitness as existing trial solutions. In general variable length allows many more long representations of a given solution than short ones. Thus in search (without a length bias) we expect longer representations to occur more often and so representation length to tend to increase, i.e. fitness based selection leads to bloat.
WBL.euro98_bloatm.ps.gz
=======================
Fitness Causes Bloat: Mutation
W. B. Langdon and R. Poli
W.B.Langdon@cwi.nl, R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham
EuroGP'98

The problem of evolving, using mutation, an artificial ant to follow the Santa Fe trail is used to study the well known genetic programming feature of growth in solution length. Known variously as ``bloat'', ``fluff'' and increasing ``structural complexity'', this is often described in terms of increasing ``redundancy'' in the code caused by ``introns''. Comparison between runs with and without fitness selection pressure, backed by Price's Theorem, shows the tendency for solutions to grow in size is caused by fitness based selection. We argue that such growth is inherent in using a fixed evaluation function with a discrete but variable length representation. With simple static evaluation, search converges to mainly finding trial solutions with the same fitness as existing trial solutions. In general variable length allows many more long representations of a given solution than short ones. Thus in search (without a length bias) we expect longer representations to occur more often and so representation length to tend to increase, i.e. fitness based selection leads to bloat.

WBL.wcci98_bloat.ps.gz
======================
The Evolution of Size in Variable Length Representations
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
School of Computer Science
University of Birmingham
ICEC'98

In many cases program lengths increase (known as "bloat", "fluff" and increasing "structural complexity") during artificial evolution. We show bloat is not specific to genetic programming and suggest it is inherent in search techniques with discrete variable length representations using simple static evaluation functions. We investigate the bloating characteristics of three non-population and one population based search techniques using a novel mutation operator. An artificial ant following the Santa Fe trail problem is solved by simulated annealing, hill climbing, strict hill climbing and population based search using two variants of the new subtree based mutation operator. As predicted, bloat is observed when using unbiased mutation and is absent in simulated annealing and both hill climbers when using the length neutral mutation; however bloat occurs with both mutations when using a population. We conclude that there are two causes of bloat: 1) search operators with no length bias tend to sample bigger trees and 2) competition within populations favours longer programs as they can usually reproduce more accurately.

CSRP-97-22.ps.gz
================
Fitness Causes Bloat: Simulated Annealing, Hill Climbing and Populations
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
School of Computer Science
University of Birmingham

Much extended version of WBL.wcci98_bloat.ps.gz, with appendices on generating random trees and random walks in program space.

CSRP-97-26.ps.gz
================
Scheduling Planned Maintenance of Electrical Power Transmission Networks Using Genetic Algorithms
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
School of Computer Science
University of Birmingham

Summary of Genetic Algorithm and Genetic Programming work on evolving schedules for preventative maintenance of the South Wales region of the UK high voltage electricity network.
CSRP-97-29.ps.gz
================
Genetic Programming Bloat with Dynamic Fitness
W. B. Langdon and R. Poli
W.B.Langdon@cs.bham.ac.uk and R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham
Presented at EuroGP-98

In artificial evolution, individuals which perform as their parents did are usually rewarded identically to their parents. We note that Nature is more dynamic and there may be a penalty to pay for doing the same thing as your parents. We report two sets of experiments where static fitness functions are firstly augmented by a penalty for unchanged offspring and secondly the static fitness case is replaced by randomly generated dynamic test cases. We conclude genetic programming, when evolving artificial ant control programs, is surprisingly little affected by large penalties, and program growth is observed in all our experiments.

WBL.euro98_bloatd.ps.gz
=======================
Genetic Programming Bloat with Dynamic Fitness
W. B. Langdon and R. Poli
W.B.Langdon@cwi.nl and R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham
EuroGP-98

In artificial evolution, individuals which perform as their parents did are usually rewarded identically to their parents. We note that Nature is more dynamic and there may be a penalty to pay for doing the same thing as your parents. We report two sets of experiments where static fitness functions are firstly augmented by a penalty for unchanged offspring and secondly the static fitness case is replaced by randomly generated dynamic test cases. We conclude genetic programming, when evolving artificial ant control programs, is surprisingly little affected by large penalties, and program growth is observed in all our experiments.

CSRP-98-04.ps.gz
================
Why Ants are Hard
W. B. Langdon and R. Poli
W.B.Langdon@cs.bham.ac.uk and R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham
Accepted by GP-98

The problem of programming an artificial ant to follow the Santa Fe trail is used as an example program search space. Analysis of shorter solutions shows they have many of the characteristics often ascribed to manually coded programs. Enumeration of a small fraction of the total search space and random sampling characterise it as rugged with many multiple plateaus split by deep valleys and many local and global optima. This suggests it is difficult for hill climbing algorithms. Analysis of the program search space in terms of fixed length schema suggests it is highly deceptive and that for the simplest solutions large building blocks must be assembled before they have above average fitness. In some cases we show solutions cannot be assembled using a fixed representation from small building blocks of above average fitness. These suggest the Ant problem is difficult for Genetic Algorithms. Random sampling of the program search space suggests on average the density of global optima changes only slowly with program size, but the density of neutral networks linking points of the same fitness grows approximately linearly with program length. This is part of the cause of bloat. Previously reported genetic programming, simulated annealing and hill climbing performance is shown not to be much better than random search on the Ant problem.
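For readers unfamiliar with the benchmark analysed above, the following Python fragment sketches the usual artificial ant set-up: a program built from IF-FOOD-AHEAD, LEFT, RIGHT and MOVE is run repeatedly on a toroidal grid and scored by the food eaten within a time limit. The trail data, the 600-step limit and the nested-tuple encoding are standard choices in the literature but are stated here only as illustrative assumptions, not as details taken from the report.

# Minimal artificial-ant interpreter (illustrative; trail data not included).
# A program is a nested tuple: ("move",), ("left",), ("right",),
# ("if-food-ahead", then_branch, else_branch) or ("progn", a, b, ...).

def run_ant(program, trail, max_steps=600):
    """Return the amount of food eaten. `trail` is a set of (x, y) food cells
    on a toroidal grid; the grid size is inferred from the trail's extent."""
    width = max(x for x, _ in trail) + 1
    height = max(y for _, y in trail) + 1
    food = set(trail)
    state = {"x": 0, "y": 0, "dx": 1, "dy": 0, "steps": 0, "eaten": 0}

    def ahead():
        return ((state["x"] + state["dx"]) % width,
                (state["y"] + state["dy"]) % height)

    def step(node):
        if state["steps"] >= max_steps:
            return
        op = node[0]
        if op == "move":
            state["steps"] += 1
            state["x"], state["y"] = ahead()
            if (state["x"], state["y"]) in food:
                food.discard((state["x"], state["y"]))
                state["eaten"] += 1
        elif op == "left":
            state["steps"] += 1
            state["dx"], state["dy"] = state["dy"], -state["dx"]
        elif op == "right":
            state["steps"] += 1
            state["dx"], state["dy"] = -state["dy"], state["dx"]
        elif op == "if-food-ahead":
            step(node[1] if ahead() in food else node[2])
        else:  # "progn": execute children in order
            for child in node[1:]:
                step(child)

    while state["steps"] < max_steps:
        before = state["steps"]
        step(program)       # the whole program is re-run until time runs out
        if state["steps"] == before:
            break           # defensive: program consumed no time
    return state["eaten"]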
evogprep.ps
===========
Genetic Programming in Europe
Report of the EvoGP Working Group on Genetic Programming of the European Network of Excellence in Evolutionary Computing
W. B. Langdon W.B.Langdon@cs.bham.ac.uk and R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham

CSRP-98-08.ps.gz (Late breaking paper at 1st European workshop on GP)
================
Better Trained Ants
W. B. Langdon W.B.Langdon@cs.bham.ac.uk
School of Computer Science
University of Birmingham

The problem of programming an artificial ant to follow the Santa Fe trail has been repeatedly used as a benchmark problem. Recently we have shown performance of several techniques is not much better than the best performance obtainable using uniform random search. We suggested that this could be because the program fitness landscape is difficult for hill climbers and the problem is also difficult for Genetic Algorithms as it contains multiple levels of deception. Here we redefine the problem so the ant is obliged to traverse the trail in approximately the correct order. A simple genetic programming system, with no size or depth restriction, is shown to perform approximately three times better with the improved training function.

CSRP-98-12.ps.gz
================
Better Trained Ants for Genetic Programming
W. B. Langdon and R. Poli
W.B.Langdon@cs.bham.ac.uk, R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham

Extends CSRP-98-08 by: requiring the ant to find food quickly, including the ant's speed in the fitness function (either as a linear addition or as a second objective in a multi-objective fitness function), and GP one point crossover.

WBL.antspace_gp98.ps.gz (GP-98)
=======================
Why Ants are Hard
W. B. Langdon and R. Poli
W.B.Langdon@cs.bham.ac.uk, R.Poli@cs.bham.ac.uk

Updated CSRP-98-04.ps.gz.

CSRP-98-16.ps.gz (Late breaking paper at GP-98)
================
Boolean Functions Fitness Spaces
W. B. Langdon and R. Poli
W.B.Langdon@cs.bham.ac.uk, R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham

We investigate the distribution of performance of the Boolean functions of 3 Boolean inputs (particularly that of the parity functions), and of the always-on-6 and even-6 parity functions. We use enumeration, uniform Monte-Carlo random sampling and sampling random full trees. As expected, XOR dramatically changes the fitness distributions. In all cases, once some minimum size threshold has been exceeded, the distribution of performance is approximately independent of program length. However the distribution of the performance of full trees is different from that of asymmetric trees and varies with tree depth. We consider but reject testing the No Free Lunch (NFL) theorems on these functions.

CSRP-98-17.ps.gz
================
Why "Building Blocks" Don't Work on Parity Problems
W. B. Langdon and R. Poli
W.B.Langdon@cs.bham.ac.uk, R.Poli@cs.bham.ac.uk
School of Computer Science
University of Birmingham

We investigate the distribution of performance of the parity and always-on Boolean functions given only the appropriate building block. These problems have a ``needle in a haystack'' fitness landscape and so are unsuitable for genetic programming or other progressive search techniques. Theoretical analysis shows that in the limit, as program size grows, the density of solutions is independent of size but falls exponentially with the number of inputs.

WBL.geccco99.fairxo.ps.gz
=========================
Size Fair and Homologous Tree Genetic Programming Crossovers
W. B. Langdon W.B.Langdon@cwi.nl
To be presented at GECCO-99: Proceedings of the Genetic and Evolutionary Computation Conference, W. Banzhaf, J. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela and R. E. Smith (editors)

Size fair and homologous crossover genetic operators for tree based genetic programming are described and tested. Both produce considerably reduced increases in program size and no detrimental effect on GP performance. GP search spaces are partitioned by the ridge in the number of programs versus their size and depth. A ramped uniform random initialisation is described which straddles the ridge. With subtree crossover, trees increase by about one level per generation, leading to sub-quadratic bloat in length.
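As a rough illustration of the size fair idea described above (a simplification, not the operator defined in the paper): after picking the crossover point in the first parent, the subtree donated by the second parent is chosen only among subtrees of comparable size, so the expected change in program length stays near zero. The nested-list tree encoding and the 2*s + 1 size cap below are illustrative assumptions.

import copy
import random

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node of a nested-list tree."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def size(tree):
    return 1 if not isinstance(tree, list) else 1 + sum(size(c) for c in tree[1:])

def replace(tree, path, new):
    if not path:
        return new
    node = tree
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = new
    return tree

def size_fair_crossover(mum, dad, rng=random.Random(0)):
    """Simplified size fair crossover: the donated subtree may contain at most
    2*s + 1 nodes, where s is the size of the subtree being removed.
    (The published operator uses a particular distribution over donor sizes;
    this hard cap is only an illustrative stand-in.)"""
    child = copy.deepcopy(mum)
    cut, removed = rng.choice(list(subtrees(child)))
    limit = 2 * size(removed) + 1
    candidates = [s for _, s in subtrees(dad) if size(s) <= limit]
    donor = copy.deepcopy(rng.choice(candidates))
    return replace(child, cut, donor)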
cwi_fair.ps.gz
==============
Size Fair and Homologous Tree Crossovers
W. B. Langdon W.B.Langdon@cwi.nl
CWI technical report expanding on WBL.geccco99.fairxo.ps.gz (23 pages)

c.harris
========
Directory containing Chris Harris' PhD thesis

tinayu
======
Directory containing Tina Yu's PhD thesis

WBL.fitnessspaces.ps.gz
=======================
W. B. Langdon
Scaling of Program Tree Fitness Spaces, in Evolutionary Computation vol 7, no 4