Advances in Genetic Programming: Volume 2 PART III

Advances in Genetic Programming: Volume 2 MIT Press ISBN 0-262-01158-1, 538 pp. $50.00, hardbound. MIT Press 1996 Fall catalog, order code ANGAH2, now available.

edited by Peter J. Angeline Kenneth E. Kinnear, Jr.

III ANALYSIS AND IMPLEMENTATION ISSUES IN GENETIC PROGRAMMING 257
13 Efficiently Representing Populations in Genetic Programming 259
Maarten Keijzer
The chapter compares two representations for genetic programming. One is the commonly used Lisp S-Expression which uses the problem specific terminals and functions defined before a run as an alphabet. The other is a minimal Directed Acyclic Graph (DAG) that uses a variable alphabet of complete subtrees. This chapter will show that the DAG representation can replace S-Expression representation without any change in the functionality of a genetic programming system. In certain situations the amount of memory needed to represent a population can be reduced enormously when using a DAG. The implementation of Automatically Defined Functions (ADFs) in a DAG gives rise to the definitions of a divergent ADF, and a compact ADF. The latter can represent huge programs in S-Expression format with a few elements.
14 Genetically Optimizing The Speed of Programs Evolved to Play Tetris 279
Eric V. Siegel, Alexander D. Chaffee
Many new domains for genetic programming require evolved programs to be executed for longer amounts of time. For these applications it is likely that some test cases optimally require more computation cycles than others. Therefore, programs must dynamically allocate cycles among test cases in order to use computation time efficiently. To elicit the strategic allocation of computation time, we impose an aggregate computation time ceiling that applies over a series of fitness cases. This exerts time pressure on evolved programs, with the effect that resulting programs dynamically allocate computation time, opportunistically spending less time per test case when possible, with minimal damage to domain performance. This technique is in principle extensible to resources other than computation time such as memory or fuel. We introduce the game Tetris as a test problem for this technique.
15 The Royal Tree Problem, a Benchmark for Single and Multiple Population Genetic Programming 299
William F. Punch, Douglas Zongker, Erik D. Goodman
We have previously shown how a genetic algorithm (GA) can be used to perform "data mining," the discovery of particular/important data within large datasets, by finding optimal data classifications using known examples. However, these approaches, while successful, limited data relationships to those that were "fixed" before the GA run. We report here on an extension of our previous work, substituting a genetic program (GP) for a GA. The GP could optimize data classification, as did the GA, but could also determine the functional relationships among the features. This gave improved performance and new information on important relation ships among features. We discuss the overall approach, and compare the effectiveness of the GA vs. GP on a biochemistry problem, the determination of the involvement of bound water molecules in protein interactions.
16 Parallel Genetic Programming: A Scalable Implementation Using The Transputer Network Architecture 317
David Andre, John R. Koza
This chapter describes the parallel implementation of genetic programming in the C programming language using a PC type computer (running Windows) acting as a host and a network of processing nodes using the transputer architecture. Using this approach, researchers of genetic algorithms and genetic programming can acquire computing power that is intermediate between the power of currently available workstations and that of supercomputers at a cost that is intermediate between the two. This approach is illustrated by a comparison of the computational effort required to solve the problem of symbolic regression of the Boolean even-5-parity function with different migration rates. Genetic programming required the least computational effort with an 5% migration rate. Moreover, this computational effort was less than that required for solving the problem with a serial computer and a panmictic population of the same size. That is, apart from the nearly linear speed-up in executing a fixed amount of code inherent in the parallel implementation of genetic programming, the use of distributed sub-populations with only limited migration delivered more than linear speed-up in solving the problem.
17 Massively Parallel Genetic Programming 339
Hugues Juille, Jordan B. Pollack
As the field of Genetic Programming (GP) matures and its breadth of application increases, the need for parallel implementations becomes absolutely necessary. The transputer-based system presented in [Koza95] is one of the rare such parallel implementations. Until today, no implementation has been proposed for parallel GP using a SIMD architecture, except for a data-parallel approach [tufts95], although others have exploited workstation farms and pipelined supercomputers. One reason is certainly the apparent difficulty of dealing with the parallel evaluation of different S-expressions when only a single instruction can be executed at the same time on every processor. The aim of this chapter is to present such an implementation of parallel GP on a SIMD system, where each processor can efficiently evaluate a different S-expression. We have implemented this approach on a MasPar MP-2 computer, and will present some timing results. To the extent that SIMD machines, like the MasPar are available to offer cost-effective cycles for scientific experimentation, this is a useful approach.
18 Type Inheritance in Strongly Typed Genetic Programming 359
Thomas D. Haynes, Dale A. Schoenefeld, Roger L. Wainwright
Genetic Programming (GP) is an automatic method for generating computer programs, which are stored as data structures and manipulated to evolve better programs. An extension restricting the search space is Strongly Typed Genetic Programming (STGP), which has, as a basic premise, the removal of closure by typing both the arguments and return values of functions, and by also typing the terminal set. A restriction of STGP is that there are only two levels of typing. We extend STGP by allowing a type hierarchy, which allows more than two levels of typing
19 On Using Syntactic Constraints with Genetic Programming 377
Frederic Gruau
20 Data Structures and Genetic Programming 395
William B. Langdon
In real world applications, software engineers recognise the use of memory must be organised via data structures and that software using the data must be independant of the data structures' implementation details. They achieve this by using abstract data structures, such as records, files and buffers.
We demonstrate that genetic programming can automatically implement simple abstract data structures, considering in detail the task of evolving a list. We show general and reasonably efficient implementations can be automatically generated from simple primitives.
A model for maintaining evolved code is demonstrated using the list problem.

Advances in Genetic Programming: Volume 2 PART IV PART II

W.B.Langdon 23 March 2001 (3 Jan 2007)

III	ANALYSIS AND IMPLEMENTATION ISSUES IN GENETIC PROGRAMMING	257
13	Efficiently Representing Populations in Genetic Programming	259
	Maarten Keijzer The chapter compares two representations for genetic programming. One is the commonly used Lisp S-Expression which uses the problem specific terminals and functions defined before a run as an alphabet. The other is a minimal Directed Acyclic Graph (DAG) that uses a variable alphabet of complete subtrees. This chapter will show that the DAG representation can replace S-Expression representation without any change in the functionality of a genetic programming system. In certain situations the amount of memory needed to represent a population can be reduced enormously when using a DAG. The implementation of Automatically Defined Functions (ADFs) in a DAG gives rise to the definitions of a divergent ADF, and a compact ADF. The latter can represent huge programs in S-Expression format with a few elements.
14	Genetically Optimizing The Speed of Programs Evolved to Play Tetris	279
	Eric V. Siegel, Alexander D. Chaffee Many new domains for genetic programming require evolved programs to be executed for longer amounts of time. For these applications it is likely that some test cases optimally require more computation cycles than others. Therefore, programs must dynamically allocate cycles among test cases in order to use computation time efficiently. To elicit the strategic allocation of computation time, we impose an aggregate computation time ceiling that applies over a series of fitness cases. This exerts time pressure on evolved programs, with the effect that resulting programs dynamically allocate computation time, opportunistically spending less time per test case when possible, with minimal damage to domain performance. This technique is in principle extensible to resources other than computation time such as memory or fuel. We introduce the game Tetris as a test problem for this technique.
15	The Royal Tree Problem, a Benchmark for Single and Multiple Population Genetic Programming	299
	William F. Punch, Douglas Zongker, Erik D. Goodman We have previously shown how a genetic algorithm (GA) can be used to perform "data mining," the discovery of particular/important data within large datasets, by finding optimal data classifications using known examples. However, these approaches, while successful, limited data relationships to those that were "fixed" before the GA run. We report here on an extension of our previous work, substituting a genetic program (GP) for a GA. The GP could optimize data classification, as did the GA, but could also determine the functional relationships among the features. This gave improved performance and new information on important relation ships among features. We discuss the overall approach, and compare the effectiveness of the GA vs. GP on a biochemistry problem, the determination of the involvement of bound water molecules in protein interactions.
16	Parallel Genetic Programming: A Scalable Implementation Using The Transputer Network Architecture	317
	David Andre, John R. Koza This chapter describes the parallel implementation of genetic programming in the C programming language using a PC type computer (running Windows) acting as a host and a network of processing nodes using the transputer architecture. Using this approach, researchers of genetic algorithms and genetic programming can acquire computing power that is intermediate between the power of currently available workstations and that of supercomputers at a cost that is intermediate between the two. This approach is illustrated by a comparison of the computational effort required to solve the problem of symbolic regression of the Boolean even-5-parity function with different migration rates. Genetic programming required the least computational effort with an 5% migration rate. Moreover, this computational effort was less than that required for solving the problem with a serial computer and a panmictic population of the same size. That is, apart from the nearly linear speed-up in executing a fixed amount of code inherent in the parallel implementation of genetic programming, the use of distributed sub-populations with only limited migration delivered more than linear speed-up in solving the problem.
17	Massively Parallel Genetic Programming	339
	Hugues Juille, Jordan B. Pollack As the field of Genetic Programming (GP) matures and its breadth of application increases, the need for parallel implementations becomes absolutely necessary. The transputer-based system presented in [Koza95] is one of the rare such parallel implementations. Until today, no implementation has been proposed for parallel GP using a SIMD architecture, except for a data-parallel approach [tufts95], although others have exploited workstation farms and pipelined supercomputers. One reason is certainly the apparent difficulty of dealing with the parallel evaluation of different S-expressions when only a single instruction can be executed at the same time on every processor. The aim of this chapter is to present such an implementation of parallel GP on a SIMD system, where each processor can efficiently evaluate a different S-expression. We have implemented this approach on a MasPar MP-2 computer, and will present some timing results. To the extent that SIMD machines, like the MasPar are available to offer cost-effective cycles for scientific experimentation, this is a useful approach.
18	Type Inheritance in Strongly Typed Genetic Programming	359
	Thomas D. Haynes, Dale A. Schoenefeld, Roger L. Wainwright Genetic Programming (GP) is an automatic method for generating computer programs, which are stored as data structures and manipulated to evolve better programs. An extension restricting the search space is Strongly Typed Genetic Programming (STGP), which has, as a basic premise, the removal of closure by typing both the arguments and return values of functions, and by also typing the terminal set. A restriction of STGP is that there are only two levels of typing. We extend STGP by allowing a type hierarchy, which allows more than two levels of typing
19	On Using Syntactic Constraints with Genetic Programming	377
	Frederic Gruau
20	Data Structures and Genetic Programming	395
	William B. Langdon In real world applications, software engineers recognise the use of memory must be organised via data structures and that software using the data must be independant of the data structures' implementation details. They achieve this by using abstract data structures, such as records, files and buffers. We demonstrate that genetic programming can automatically implement simple abstract data structures, considering in detail the task of evolving a list. We show general and reasonably efficient implementations can be automatically generated from simple primitives. A model for maintaining evolved code is demonstrated using the list problem.