PH.D. DISSERTATION AVAILABLE: CENG Technical Report 94-13 Recombination, Selection, and the Genetic Construction of Computer Programs Walter Alden Tackett Computer Engineering Division Department of Electrical Engineering - Systems University of Southern California Los Angeles, California, USA 90089-2562 Bound copies are available from the University for $25 apiece, and will be available from University Microfilms as well. An anonymous ftp version is available from santafe.edu in the directory pub/Users/tackett/phd, and will also be available from the Genetic Programming archives at ftp.cc.utexas.edu. Other sites TBD. File information and abstract are given below. The directory contains five files (in theory): README (this file) watphd_1.ps.Z watphd_2.ps.Z watphd_3.ps.Z - The dissertation is broken up into three equal-sized parts in compressed postscript format, suitable for unix, Mac, etc. The postscript was generated with a MSWindows driver for Adobe Postscript v3.0 cartridge on an HP LaserJet IIP. Filtered via dos2unix -ascii and compressed with the Unix compress command. wtphd_mw.zip - Same three parts, in original form as .doc files generated by MSWord 6.0a under Windows/NT, and compressed/archived via pkzip -ex. Use this if you have Word 6 (sorry to hear that), not compatible with earlier versions of Word or with Word on the Mac. Abstract: Recombination, Selection, and the Genetic Construction of Computer Programs Walter Alden Tackett University of Southern California Abstract Computational intelligence seeks as a basic goal to create artificial systems which mimic aspects of biological adaptation, behavior, perception, and reasoning. Toward that goal, genetic program induction - "Genetic Programming" - has succeeded in automating an activity traditionally considered to be the realm of creative human endeavor. It has been applied successfully to the creation of computer programs which solve a diverse set of model problems. This naturally leads to questions such as: * Why does it work? * How does it fundamentally differ from existing methods? * What can it do that existing methods cannot? The research described here seeks to answer those questions through investigations on several fronts. Analysis is performed which shows that Genetic Programming has a great deal in common with heuristic search, long studied in the field of Artificial Intelligence. It introduces a novel aspect to that method in the form of the recombination operator which generates successors by combining parts of favorable strategies. On another track, we show that Genetic Programming is a powerful tool which is suitable for real-world problems. This done first by applying it to an extremely difficult induction problem and measuring performance against other state-of-the-art methods. We continue by formulating a model induction problem which not only captures the pathologies of the real world, but also parameterizes them so that variation in performance can be measured as a function of confounding factors. At the same time, we study how the properties of search can be varied through the effects of the selection operator. Combining the lessons of the search analysis with known properties of biological systems leads to the formulation of a new recombination operator which is shown to improve induction performance. In support of the analysis of selection and recombination, we define problems in which structure is precisely controlled. These allow fine discrimination of search performance which help to validate analytic predictions. Finally, we address a truly unique aspect of Genetic Programming, namely the exploitation of symbolic procedural knowledge in order to provide "explanations" from genetic programs.