This directory contains papers that have been announced on the GP mailing list but are not available at the GP ftp site at ftp.io.com. Please look in the ftp.io.com mirror directory for any papers that you do not find here.

Adil Qureshi, Aug '95

icga85.txt
==========
Cramer, Nichael Lynn: "A Representation for the Adaptive Generation of Simple Sequential Programs", Proceedings, International Conference on Genetic Algorithms and their Applications, July 1985 [CMU], pp. 183-187.

ABSTRACT
A system for the adaptive generation of sequential computer functions is described. A small operator set is defined and encoded in JB, a fixed-length representation. After briefly examining the limitations encountered in this approach we explore TB, a tree-like representation of the code. These representations share the feature that they can be used to represent well-formed, useful computer programs while still being amenable to suitably defined genetic operators. As a working example, the system is used to produce two-input, single-output multiplication functions that are concise and well-defined. Future work, dealing with extensions to more complicated functions and generalizations of the techniques, is also discussed.

94-006.ps.gz
============
Dynamic Training Subset Selection for Supervised Learning in Genetic Programming

Chris Gathercole, Peter Ross
Department of Artificial Intelligence
University of Edinburgh
80 South Bridge
Edinburgh EH1 1HN
U.K.
chrisg,peterg@aisb.ed.ac.uk

February 25, 1994

Abstract
When using the Genetic Programming (GP) Algorithm on a difficult problem with a large set of training data, a large population size is needed and a very large number of function-tree evaluations must be carried out. This paper describes some efforts made to reduce the number of such evaluations by concentrating on selecting a small subset of the training data set on which to actually carry out the GP algorithm. Three subset selection methods described in the paper are: Dynamic Subset Selection (DSS), using the current GP run to select `difficult' and/or previously disregarded cases; Historical Subset Selection (HSS), selecting `difficult' cases from previous GP runs; and Random Subset Selection (RSS), selecting a subset of cases at random for each generation. Various runs have shown that GP+DSS can produce better results in less than 20% of the time taken by GP. GP+HSS can nearly match the results of GP, and, perhaps surprisingly, GP+RSS can occasionally approach the results of GP. GP+DSS produced better, more general results than those reported in a paper for a variety of Neural Networks when used on a substantial problem, known as the Thyroid problem.
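For readers who want to experiment, the selection step of DSS is easy to prototype. The sketch below is only a rough illustration of per-generation subset selection, biased towards cases that are often misclassified or have not been chosen for a while; the weighting formula and all parameter names are assumptions of this note, not the exact scheme from the paper.

    # Rough sketch of per-generation dynamic subset selection.  The
    # difficulty-plus-age weighting and the parameter names are assumptions
    # of this illustration, not the paper's formula.
    import random

    def select_subset(n_cases, subset_size, difficulty, age, rng=random):
        """Pick subset_size distinct case indices, biased towards cases that
        are often misclassified (difficulty) or long unselected (age).
        Assumes subset_size <= n_cases."""
        weights = [difficulty[i] + age[i] + 1 for i in range(n_cases)]
        chosen = set()
        while len(chosen) < subset_size:
            chosen.add(rng.choices(range(n_cases), weights=weights)[0])
        return sorted(chosen)

    # Typical use inside the GP loop: evaluate the population only on the
    # chosen cases, update each case's difficulty from the misclassifications
    # seen this generation, reset age for the chosen cases and increment it
    # for the rest.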
94-007.ps.gz
============
Applications of genetic algorithms

Peter Ross and Dave Corne
Department of AI, University of Edinburgh
80 South Bridge, Edinburgh EH1 1HN
peter,dave@aisb.ed.ac.uk

Abstract
Genetic algorithms have been applied to a very wide range of practical problems, often with valuable results. This paper surveys just a few examples, to illustrate the diversity of approaches and to point to some of the considerations that have proved important in making applications successful. Because GAs provide a fairly comprehensible way to address a wide range of difficult engineering and optimisation problems, producing good if not optimal results, it seems that the technology is finding its way into real-world use much more easily than, say, expert systems did.

University of Edinburgh, Department of Artificial Intelligence, Genetic Algorithms Research Group, Paper No. 94-007. Submitted for a special issue of the AISB Quarterly on Evolutionary Computation, edited by Terry Fogarty.

CS-TR-95-1542.ps.gz
===================
John Koza's report on parallelising GP.

LearnModels.ps.gz
=================
Learning Mental Models

Astro Teller
Dept of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
astro@cs.cmu.edu

MentalModels.ps.gz
==================
Chapter 9 of AIGP (Advances in Genetic Programming).

PADO-Tech-Report.ps.gz
======================
PADO: Learning Tree Structured Algorithms for Orchestration into an Object Recognition System

Astro Teller and Manuela Veloso
February 10, 1995
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

Abstract
Most artificial intelligence systems work on simple problems and artificial domains because they rely on the accurate sensing of the task world. Object recognition is a crucial part of the sensing challenge and machine learning stands in a position to catapult object recognition into real world domains. Given that, to date, machine learning has not delivered general object recognition, we propose a different point of attack: the learning architectures themselves. We have developed a method for directly learning and combining algorithms in a new way that imposes little burden on or bias from the humans involved. This learning architecture, PADO, and the new results it brings to the problem of natural image object recognition are the focus of this report.

PADO.ps.gz
==========
PADO: A New Learning Architecture for Object Recognition

Astro Teller and Manuela Veloso
Carnegie Mellon University

Abstract
Most artificial intelligence systems today work on simple problems and artificial domains because they rely on the accurate sensing of the task world. Object recognition is a crucial part of the sensing challenge and machine learning stands in a position to catapult object recognition into real world domains. Given that, to date, machine learning has not delivered general object recognition, we propose a different point of attack: the learning architectures themselves. We have developed a method for directly learning and combining algorithms in a new way that imposes little burden on or bias from the humans involved. This learning architecture, PADO, and the new results it brings to the problem of natural image object recognition are the focus of this chapter.

Turing.ps.gz
============
Turing Completeness in the Language of Genetic Programming with Indexed Memory

Astro Teller
Carnegie Mellon University
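Teller's indexed memory amounts to adding read and write primitives over a small memory array to the GP function set. A minimal sketch follows; the primitive names and the address-wrapping convention are assumptions made here, not taken from the paper.

    # Illustrative indexed-memory primitives for a GP function set.  The
    # names and the address-wrapping convention are assumptions of this
    # sketch, not Teller's exact specification.
    MEM_SIZE = 20
    memory = [0] * MEM_SIZE

    def mem_read(addr):
        """READ(addr): return the value stored at an index (the address is
        wrapped into range so any integer argument is legal)."""
        return memory[int(addr) % MEM_SIZE]

    def mem_write(addr, value):
        """WRITE(addr, value): store value at an index and return it, so the
        node can appear anywhere an expression is expected in a program tree."""
        memory[int(addr) % MEM_SIZE] = value
        return value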
aisb.ps.gz
==========
Building New Spatial Interaction Models Using Genetic Programming

S. Openshaw and I. Turton, School of Geography, University of Leeds, Leeds, UK
email: stan@geog.leeds.ac.uk, ian@geog.leeds.ac.uk

Abstract
Building computer models of human geographic systems is difficult because of the poverty of applicable theory that can be used to specify well formed and soundly based mathematical models. As a result there has been little progress in this area since the late 1960's, when mathematical model building procedures based on entropy-maximising methods were first applied to spatial interaction data. In the mid-1980's, attention was focused on the use of genetic algorithms to explore the universe of different models that could be built from the available symbolic pieces; that is, there is a very large number of permutations of the model building blocks, viz. a mix of unary and binary operators, unknown parameters and observed variables, and rules of syntax for well formed equations. The so-called Automated Modeling System (AMS) worked but never fulfilled its expectations. The paper revisits the problem but re-casts the AMS approach into a genetic programming framework. Some preliminary results based on Cray-YMP runs are reported.

cook94a.ps.gz
=============
Journal of Artificial Intelligence Research 1 (1994) 231-255. Submitted 12/93; published 2/94.

Substructure Discovery Using Minimum Description Length and Background Knowledge

Diane J. Cook  cook@cse.uta.edu
Lawrence B. Holder  holder@cse.uta.edu
Department of Computer Science and Engineering
Box 19015, University of Texas at Arlington
Arlington, TX 76019 USA

Abstract
The ability to identify interesting and repetitive substructures is an essential component to discovering knowledge in structural data. We describe a new version of our Subdue substructure discovery system based on the minimum description length principle. The Subdue system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously discovered substructures in the data, multiple passes of Subdue produce a hierarchical description of the structural regularities in the data. Subdue uses a computationally-bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints. In addition to the minimum description length principle, other background knowledge can be used by Subdue to guide the search towards more appropriate substructures. Experiments in a variety of domains demonstrate Subdue's ability to find substructures capable of compressing the original data and to discover structural concepts important to the domain.
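As a rough illustration of the minimum description length idea behind Subdue, a candidate substructure can be scored by how much rewriting its instances compresses the graph. The toy description-length measure below (a plain vertex-plus-edge count) is an assumption of this sketch, not Subdue's actual graph encoding.

    # Toy MDL-style scoring of a candidate substructure.  The description
    # length used here is a crude vertex-plus-edge count, not Subdue's
    # actual graph encoding.
    def description_length(num_vertices, num_edges):
        return num_vertices + num_edges      # arbitrary "units"

    def substructure_value(graph_v, graph_e, sub_v, sub_e, instances):
        """DL(G) / (DL(S) + DL(G compressed by S)); each instance of the
        substructure is replaced by a single vertex in the compressed graph."""
        dl_graph = description_length(graph_v, graph_e)
        dl_sub = description_length(sub_v, sub_e)
        compressed_v = graph_v - instances * (sub_v - 1)
        compressed_e = graph_e - instances * sub_e
        return dl_graph / (dl_sub + description_length(compressed_v, compressed_e))

    # Substructures with a higher value compress the data more and are preferred.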
dret_postscript.tar.gz
======================
Artificial-Life Simulators and Their Applications

Howard Gutowitz
Ecole Supérieure de Physique et de Chimie Industrielles
Laboratoire d'Electronique
10 rue Vauquelin
75005 Paris, France
and
The Santa Fe Institute
1399 Hyde Park Road
Santa Fe, NM 87501
hag@santafe.edu
HTML: http://alife.santafe.edu/alife/topics/simulators/dret/dret.html

Abstract
Artificial Life (Alife) is a rapidly growing field of scientific research linking biology and computer science. It seeks to understand how life-like processes can be embodied in computer programs. Advances in this area promise to illuminate fundamental questions both in biology ("What is life?") and in computer science ("How to make robust and adaptable computer programs?"). Much of the work in artificial life is directed toward building computer simulations of artificial creatures and the artificial worlds in which they live. This report surveys major efforts in this area, with attention to developments likely to lead to practical applications in the short to middle term.

ep.ps.gz
========
Genetic Programming of Music

Jeffrey B. Putnam
New Mexico Institute of Mining and Technology
Aug 30th '94

model-of-landscapes.ps.gz
=========================
Terry Jones
February 1, 1994

Abstract
The use of the term "landscape" is increasing rapidly in the field of evolutionary computation, yet in many cases it remains poorly, if at all, defined. This situation has perhaps developed because everyone grasps the imagery immediately, and the questions that would be asked of a less evocative term do not get asked. This paper presents a model of landscapes that is general enough to encompass most of what computer scientists would call search, though the model is not restricted to either the field or the viewpoint. It is particularly relevant to algorithms that employ some form of crossover, and hence to genetic algorithms and other members of the evolutionary computing family. An overview of the consequences and properties of the model establishes a connection with more traditional search algorithms from artificial intelligence, introduces the notion of a crossover landscape, and argues the importance of viewing search as navigation and structure.

Santa Fe Institute, 1660 Old Pecos Trail, Suite A, Santa Fe, NM 87505, USA and Department of Computer Science, University of New Mexico, Albuquerque, NM 87131, USA. Email: terry@santafe.edu

ppsn-94.ps.gz
=============
Program Search with a Hierarchical Variable Length Representation: Genetic Programming, Simulated Annealing and Hill Climbing

Una-May O'Reilly and Franz Oppacher
Santa Fe Institute, Santa Fe, NM, 87505, USA
School of Computer Science, Carleton University, Ottawa, CANADA

Abstract
This paper presents a comparison of Genetic Programming (GP) with Simulated Annealing (SA) and Stochastic Iterated Hill Climbing (SIHC) based on a suite of program discovery problems which have been previously tackled only with GP. All three search algorithms employ the hierarchical variable length representation for programs brought into recent prominence with the GP paradigm [8]. We feel it is not intuitively obvious that mutation-based adaptive search can handle program discovery, yet, to date, for each GP problem we have tried, SA or SIHC also work.

ppsn92.ps.gz
============
An Experimental Perspective on Genetic Programming

Una-May O'Reilly and Franz Oppacher
Intelligent Systems Research Group, School of Computer Science, Carleton University, Ottawa, Ontario, K1S 5B6, CANADA

Abstract
Genetic Programming (GP) has recently been introduced by John R. Koza [5] as a method for genetically breeding populations of computer programs to solve problems. We believe GP to constitute a significant extension of the Genetic Algorithm (GA) research paradigm primarily because it generalizes the genetic search techniques: instead of looking for a solution to a specific instance of a problem, GP attempts to evolve a program capable of computing the solutions for any instance of the problem. We have implemented a genetic programming environment, GP*, that is capable of duplicating Koza's experiments. In this paper we describe a specific GP experiment on the evolution of programs to sort vectors, and discuss the issues that must be addressed in any application of GP: the design of fitness functions and test suites, and the selection of program terminals and functions. Our observations point to several previously unnoticed shortcomings of the GP approach. We hypothesize that these shortcomings are due to the fact that GP only uses a hierarchical representation but does not construct its solutions in an explicitly hierarchical manner.

oreilly
=======
Una-May O'Reilly's PhD thesis (An Analysis of Genetic Programming).

rosca
=====
Directory containing GP papers by J. P. Rosca ("Learning by Adapting Representations in GP" etc.).

terry
=====
Papers by Terry Jones, including his PhD thesis (Model of Landscapes etc.).

GPdata_icga-95.ps
=================
Evolving Data Structures with Genetic Programming

W. B. Langdon  W.Langdon@cs.ucl.ac.uk
Computer Science Dept.
University College London

It is established good software engineering practice to ensure that programs use memory via abstract data structures such as stacks, queues and lists. These provide an interface between the program and memory, freeing the program of memory management details which are left to the data structures to implement. The main result presented herein is that GP can automatically generate stacks and queues. Typically abstract data structures support multiple operations, such as put and get. We show that GP can simultaneously evolve all the operations of a data structure by implementing each such operation with its own independent program tree. That is, the chromosome consists of a fixed number of independent program trees. Moreover, crossover only mixes genetic material of program trees that implement the same operation. Program trees interact with each other only via shared memory and shared "Automatically Defined Functions" (ADFs). ADFs, "pass by reference" when calling them, Pareto selection, "good software engineering practice" and partitioning the genetic population into "demes" were also investigated whilst evolving the queue, in order to improve the GP solutions. Code is available from this site.
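The multi-tree chromosome described above is easy to picture in code. In the sketch below, each individual carries one tree per data-structure operation and crossover only pairs like trees with like; the operation names, the class layout and the subtree_swap helper are assumptions of this sketch, not Langdon's implementation.

    # Illustrative multi-tree chromosome: one evolved tree per data-structure
    # operation, with crossover restricted to trees implementing the same
    # operation.  Operation names, class layout and subtree_swap are
    # assumptions of this sketch, not Langdon's code.
    import copy
    import random

    OPERATIONS = ["makenull", "front", "dequeue", "enqueue", "empty"]

    class Individual:
        def __init__(self, trees):
            # trees: dict mapping operation name -> program tree for that operation
            self.trees = trees

    def crossover(mum, dad, subtree_swap):
        """Create one child.  Genetic material is mixed only between the two
        parents' trees for the SAME operation, never across operations."""
        op = random.choice(OPERATIONS)
        child_trees = {name: copy.deepcopy(tree) for name, tree in mum.trees.items()}
        child_trees[op] = subtree_swap(mum.trees[op], dad.trees[op])
        return Individual(child_trees)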
WBL_aaai-pppGP.ps
=================
Pareto Optimality, Population Partitioning, Price's theorem and GP

W. B. Langdon  W.Langdon@cs.ucl.ac.uk
Computer Science Dept.
University College London

A description of a use of Pareto optimality in genetic programming is given and an analogy with Genetic Algorithm fitness niches is drawn. Techniques to either spread the population across many Pareto optimal fitness values or to reduce the spread are described. It is speculated that a wide spread may not aid Genetic Programming. It is suggested that this might give useful insight into many GPs whose fitness is composed of several sub-objectives. The successful use of demic populations in GP leads to speculation that smaller evolutionary steps might aid GP in the long run. An example is given where Price's covariance theorem helped when designing a GP fitness function.
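The Pareto selection mentioned above rests on the standard dominance test over a vector of sub-objective scores. A small sketch follows, assuming higher scores are better; this is the generic textbook definition rather than the exact comparison used in the paper.

    # Standard Pareto dominance over per-objective scores (higher is better
    # here); a generic illustration of Pareto selection.
    def dominates(a, b):
        """True if score vector a is at least as good as b on every
        sub-objective and strictly better on at least one."""
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    def pareto_front(scores):
        """Score vectors not dominated by any other member of the population."""
        return [a for a in scores
                if not any(dominates(b, a) for b in scores if b is not a)]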
directed_crossover.ps
=====================
Summary of directing crossover locations in a multi-tree GP

W. B. Langdon  W.Langdon@cs.ucl.ac.uk
Computer Science Dept.
University College London

GPlist_aigp2.ps
===============
W. B. Langdon  W.Langdon@cs.ucl.ac.uk
Computer Science Dept.
University College London

Abstract for Advances in Genetic Programming 2
Much published work on genetic programming evolves functions without "side-effects" to learn patterns in test data; once produced, they can be used to make predictions regarding new data. In contrast, human written programs often make extensive and explicit use of memory. Indeed, memory in some form is required for a programming system to be Turing complete. In both normal and genetic programming considerable benefits have been found in adopting a "structured approach". For example, Koza has shown the introduction of evolvable code modules (ADFs) can greatly reduce the effort required for GP to reach a solution. Teller has shown that the introduction of indexed memory into genetic programming makes it Turing complete. This work builds on Teller's and considers the introduction of data structures within GP. The expectation is that the use of data structures will prove as useful to GP as code structuring has already been shown to be. So far the important result is that GP can evolve simple data structures, such as stacks and queues. This abstract reports the successful evolution of a list data structure, compares the use of Pareto selection and demes, and describes the directed choice of crossover points.

surveyRN76.ps
=============
W. B. Langdon  W.Langdon@cs.ucl.ac.uk
A. Qureshi  A.Qureshi@cs.ucl.ac.uk
Computer Science Dept.
University College London

Genetic Programming -- Computers using "Natural Selection" to generate programs
RN/95/76

Abstract
Computers that "program themselves"; science fact or fiction? Genetic Programming uses novel optimisation techniques to "evolve" simple programs, mimicking the way humans construct programs by progressively re-writing them. Trial programs are repeatedly modified in the search for "better/fitter" solutions. The underlying basis is Genetic Algorithms (GAs). Genetic Algorithms, pioneered by Holland, Goldberg and others, are evolutionary search techniques inspired by natural selection (i.e. survival of the fittest). GAs work with a "population" of trial solutions to a problem, frequently encoded as strings, and repeatedly select the "fitter" solutions, attempting to evolve better ones. The power of GAs is being demonstrated for an increasing range of applications: financial, imaging, VLSI circuit layout, gas pipeline control and production scheduling. But one of the most intriguing uses of GAs - driven by Koza - is automatic program generation. Genetic Programming applies GAs to a "population" of programs, typically encoded as tree-structures. Trial programs are evaluated against a "fitness function" and the best solutions selected for modification and re-evaluation. This modification-evaluation cycle is repeated until a "correct" program is produced. GP has demonstrated its potential by evolving simple programs for medical signal filters, classifying news stories, performing optical character recognition, and for target identification. This paper surveys the exciting field of Genetic Programming. As a basis it reviews Genetic Algorithms and automatic program generation. Next it introduces Genetic Programming, describing its history and presenting the technique via a worked example in C. Then, using a taxonomy that divides GP research into theory/techniques and applications, it surveys recent work from both of these perspectives. Extensive bibliographies, glossaries and a resource list are included as appendices.
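The modification-evaluation cycle described in the survey abstract can be written down in a few lines. The outline below is a generic, hedged summary of that loop (initialise, evaluate, select, modify, repeat), with random_program, fitness, crossover and mutate left as caller-supplied placeholders; it is not the worked C example from the report.

    # Generic GP generational loop: a hedged outline of the
    # modification-evaluation cycle, not the report's worked C example.
    # random_program, fitness, crossover and mutate are caller-supplied.
    import random

    def gp_loop(random_program, fitness, crossover, mutate,
                pop_size=500, generations=50, tournament=7):
        """Evolve programs until one reaches fitness 1.0 or the budget runs out."""
        population = [random_program() for _ in range(pop_size)]
        best_prog, best_fit = None, float("-inf")
        for gen in range(generations):
            scores = [fitness(p) for p in population]   # evaluate trial programs
            for prog, score in zip(population, scores):
                if score > best_fit:
                    best_prog, best_fit = prog, score
            if best_fit >= 1.0:                         # a "correct" program was found
                break
            def pick():                                 # tournament selection of the fitter
                group = random.sample(range(pop_size), tournament)
                return population[max(group, key=lambda i: scores[i])]
            population = [mutate(crossover(pick(), pick())) for _ in range(pop_size)]
        return best_prog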