W B Langdon's Papers and Abstracts before 2003

W.B.Langdon. 11 Aug 2014 (full list)



How many Good Programs are there? How Long are they?

W. B. Langdon (PDF, ps.gz). Presented at FOGA 2002, Kenneth A. De Jong and Riccardo Poli and Jonathan E. Rowe (editors) pp183-202, Morgan Kaufmann.
Follows up GECCO'2002 paper

Two page version presented at BNAIC 2002 (PDF, ps.gz) Summary

ABSTRACT

We model in detail the distribution of Boolean functions implemented by random non-recursive programs, similar to linear genetic programming (GP). Most functions are constants, the remainder are mostly simple.

Bounds on how long programs need to be before the distribution of their functionality is close to its limiting distribution are provided in general and for average computers.

Results for a model like genetic programming are experimentally tested.
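
The limiting distribution can be explored with a small Monte Carlo experiment. The sketch below is illustrative only: it assumes a 4-register machine with AND/OR/NAND/NOR instructions and the output read from register 0, which is not necessarily the paper's exact computer model.

```python
# Illustrative Monte Carlo check of the limiting-distribution idea above
# (assumed model: a 4-register machine with AND/OR/NAND/NOR instructions and
# the result read from register 0 -- not necessarily the paper's exact setup).
import random
from collections import Counter

OPS = {
    "AND":  lambda a, b: a & b,
    "OR":   lambda a, b: a | b,
    "NAND": lambda a, b: 1 - (a & b),
    "NOR":  lambda a, b: 1 - (a | b),
}

def random_program(length, n_registers=4):
    """A linear program is a list of (op, dest, src1, src2) register instructions."""
    return [(random.choice(list(OPS)),
             random.randrange(n_registers),
             random.randrange(n_registers),
             random.randrange(n_registers)) for _ in range(length)]

def run(program, x0, x1, n_registers=4):
    reg = [x0, x1] + [0] * (n_registers - 2)   # registers 0,1 hold the two inputs
    for op, d, s1, s2 in program:
        reg[d] = OPS[op](reg[s1], reg[s2])
    return reg[0]

def truth_table(program):
    """4-bit signature of the Boolean function the program computes."""
    return tuple(run(program, a, b) for a in (0, 1) for b in (0, 1))

counts = Counter(truth_table(random_program(20)) for _ in range(20000))
for table, n in counts.most_common(6):
    print(table, n)        # (0,0,0,0) and (1,1,1,1) are the two constant functions
```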

Bibliographic details


Convergence Rates for the Distribution of Program Outputs

W. B. Langdon, in GECCO 2002: Proceedings of the Genetic and Evolutionary Computation Conference, New York, pp812-819, Morgan Kaufmann, 2002. (PDF, ps.gz). Slides Presented at GECCO'2002

Two page version presented at BNAIC 2002 (PDF, ps.gz)

ABSTRACT

Fitness distributions (landscapes) of programs tend to a limit as they get bigger. Markov chain convergence theorems give general upper bounds on the linear program sizes needed for convergence. Tight bounds (exponential in N, N log N and smaller) are given for five computer models (any, average, cyclic, bit flip and Boolean). Mutation randomizes a genetic algorithm population in 1/4(l+1)(log(l)+4) generations. Results for a genetic programming (GP) like model are confirmed by experiment.
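
For reference, the quoted mutation result written out as a formula (an editorial restatement; the base of the logarithm follows the paper):

```latex
% Generations for bitwise mutation to randomise a GA population of l-bit
% strings, as quoted in the abstract above (editorial restatement only;
% the base of the logarithm is as in the paper):
\[
  t \;\approx\; \frac{1}{4}\,(l + 1)\left(\log l + 4\right)
\]
```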

Bibliographic details


Combining Decision Trees and Neural Networks for Drug Discovery

W. B. Langdon and S. J. Barrett and B. F. Buxton. In Genetic Programming, Proceedings of the 5th European Conference EuroGP'2002, James A. Foster and Evelyne Lutton and Julian Miller and Conor Ryan and Andrea G. B. Tettamanzi (editors), Ireland, LNCS 2278, pp60-70, Springer-Verlag, 2002. PDF compressed postscript.

ABSTRACT

Genetic programming (GP) offers a generic method of automatically fusing together classifiers using their receiver operating characteristics (ROC) to yield superior ensembles. We combine decision trees (C4.5) and artificial neural networks (ANN) on a difficult pharmaceutical data mining (KDD) drug discovery application, specifically predicting inhibition of a P450 enzyme. Training data came from high throughput screening (HTS) runs. The evolved model may be used to predict the behaviour of virtual (i.e. yet to be manufactured) chemicals. Measures to reduce overfitting are also described.

Bibliographic details


Genetic Programming for Combining Neural Networks for Drug Discovery

W. B. Langdon and S. J. Barrett and B. F. Buxton, in Soft Computing and Industry: Recent Applications, Rajkumar Roy, Mario Koppen, Seppo Ovaska, Takeshi Furuhashi, Frank Hoffmann (editors), pages 597-608, Springer-Verlag, 2002. Presented at WSC6 (pdf). WSC6 presentation.

ABSTRACT

We have previously shown on a range of benchmarks [Langdon, 2001] that genetic programming (GP) can automatically fuse given classifiers of diverse types to produce a combined classifier whose Receiver Operating Characteristics (ROC) are better than [Scott et al., 1998 BMVC]'s "Maximum Realisable Receiver Operating Characteristics" (MRROC), i.e. better than their convex hull. Here our technique is used in a blind trial where artificial neural networks are trained by Clementine on P450 pharmaceutical data. Using just the networks, GP automatically evolves a composite classifier.
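
The MRROC baseline referred to in these papers is the convex hull of the individual classifiers' ROC points. A minimal sketch of that baseline (illustrative only; it is the yardstick, not the GP fusion itself):

```python
# Minimal sketch of the MRROC baseline: the convex hull of the individual
# classifiers' ROC points, plus the trivial "always negative" (0,0) and
# "always positive" (1,1) classifiers.  Illustrative only -- the papers'
# contribution is an evolved combined classifier whose ROC beats this hull.
def roc_convex_hull(points):
    """Upper convex hull of (false positive rate, true positive rate) points."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # drop earlier points that fall on or below the line to the new point
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

# Example: three classifiers' operating points; (0.3, 0.7) lies under the hull.
print(roc_convex_hull([(0.1, 0.6), (0.3, 0.7), (0.4, 0.9)]))
# -> [(0.0, 0.0), (0.1, 0.6), (0.4, 0.9), (1.0, 1.0)]
```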

Bibliographic details


Genetic Programming for Improved Receiver Operating Characteristics

W. B. Langdon and B. F. Buxton. In Josef Kittler and Fabio Roli, editors, Second International Conference on Multiple Classifier Systems, LNCS 2096, pages 68-77, Cambridge, 2001. MCS 2001 (PDF, gzipped postscript).

ABSTRACT

Genetic programming (GP) can automatically fuse given classifiers of diverse types to produce a combined classifier whose Receiver Operating Characteristics (ROC) are better than [Scott et al., 1998 BMVC]'s ``Maximum Realisable Receiver Operating Characteristics'' (MRROC), i.e. better than their convex hull. This is demonstrated on a satellite image processing benchmark using Naive Bayes, Decision Trees (C4.5) and Clementine artificial neural networks.

Bibliographic details


Genetic Programming for Combining Classifiers

W. B. Langdon and B. F. Buxton. Presented at GECCO'2001, pp 66-73, Morgan Kaufmann. (gzipped postscript, PDF).

ABSTRACT

Genetic programming (GP) can automatically fuse given classifiers to produce a combined classifier whose Receiver Operating Characteristics (ROC) are better than [Scott et al., 1998 BMVC]'s ``Maximum Realisable Receiver Operating Characteristics'' (MRROC), i.e. better than their convex hull. This is demonstrated on artificial, medical and satellite image processing benchmarks.

Bibliographic details


Genetic Programming Bloat without Semantics

(gzipped postscript) Poster presented at PPSN'2000

(C++ code)

ABSTRACT

To investigate the fundamental causes of bloat, six artificial random binary tree search spaces are presented. Fitness is given by program syntax (the genetic programming genotype). GP populations are evolved on both random problems and problems with ``building blocks''. These are compared to problems with explicit ineffective code (introns, junk code, inviable code). Our results suggest the entropy random walk explanation of bloat remains viable. The hard building block problem might be used in further studies, e.g. of standard subtree crossover.

Bibliographic details


Evolving Receiver Operating Characteristics for Data Fusion

William B. Langdon and Bernard F. Buxton (zipped postscript). Slides presented at EuroGP'2001.

ABSTRACT

It has been suggested that the ``Maximum Realisable Receiver Operating Characteristics'' for a combination of classifiers is the convex hull of their individual ROCs [Scott et al., 1998 BMVC]. As expected, in at least some cases better ROCs can be produced. We show genetic programming (GP) can automatically produce a combination of classifiers whose ROC is better than the convex hull of the supplied classifiers' ROCs.

Bibliographic details


Evolving Hand-Eye Coordination for a Humanoid Robot with Machine Code Genetic Programming

William B. Langdon and Peter Nordin (gzipped postscript). DOI:10.1007/3-540-45355-5_25 Presented at EuroGP'2001 (Elvis movie and summary). (springer)

ABSTRACT

We evolve, using the AIMGP machine code genetic programming system Discipulus, an approximation of the inverse kinematics of a real robot arm with many degrees of freedom. Elvis is a bipedal robot with human-like geometry and motion capabilities - a humanoid, primarily controlled by evolutionary adaptive methods. The GP system produces a useful inverse kinematic mapping, from target 3-D points (via pairs of stereo video images) to a vector of arm controller actuator set points.

Bibliographic details


Natural Language Text Classification and Filtering with Trigrams and Evolutionary Nearest Neighbour Classifiers

CWI Report SEN-R0022 (also a late breaking paper presented at GECCO'2000)

ABSTRACT

N-grams offer fast, language independent, multi-class text categorization. Text is reduced in a single pass to n-gram vectors. These are assigned to one of several classes by a) a nearest neighbour (KNN) classifier and b) a genetic algorithm operating on the weights in a nearest neighbour classifier. 91% accuracy is found on binary classification of short multi-author technical English documents. This falls if more categories are used, but 69% is obtained with 8 classes.

Zipf's law is found not to apply to trigrams.
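
A minimal sketch of the trigram-plus-nearest-neighbour idea (illustrative only: it assumes 1-nearest-neighbour with cosine similarity, and the genetic algorithm that evolves per-trigram weights is not shown):

```python
# Minimal sketch of the character-trigram / nearest-neighbour idea above
# (illustrative only: 1-nearest-neighbour with cosine similarity; the GA that
# evolves per-trigram weights is not shown).
from collections import Counter
import math

def trigram_vector(text):
    """Single pass over the text counting overlapping character trigrams."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def nearest_neighbour_class(query, labelled_docs):
    """labelled_docs is a list of (text, class); return the class of the closest text."""
    q = trigram_vector(query)
    return max(labelled_docs, key=lambda doc: cosine(q, trigram_vector(doc[0])))[1]

docs = [("genetic programming evolves program trees", "GP"),
        ("stochastic gradient descent trains neural networks", "NN")]
print(nearest_neighbour_class("evolving tree based programs", docs))   # -> GP
```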


Quadratic Bloat in Genetic Programming

Gzipped postscript html. Presented at GECCO'2000

ABSTRACT

In earlier work we predicted program size would grow in the limit at a quadratic rate and up to fifty generations we measured bloat O(generations**(1.2-1.5)). On two simple benchmarks we test the prediction of bloat O(generations**2.0) up to generation 600. In continuous problems the limit of quadratic growth is reached but convergence in the discrete case limits growth in size. Measurements indicate subtree crossover ceases to be disruptive with large programs (1,000,000) and the population effectively converges (even though variety is near unity). Depending upon implementation, we predict run time O(number of generations**(2.0-3.0)) and memory O(number of generations**(1.0-2.0)).
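
One way to read the run time and memory predictions (an editorial gloss under the stated assumptions, not a quotation from the paper):

```latex
% Editorial gloss, not a quotation from the paper: with mean program size
% s(g) = O(g^2) and a fixed-size population fully re-evaluated every
% generation, the cumulative run time after g generations is
\[
  \sum_{i=1}^{g} O(i^{2}) \;=\; O(g^{3}),
\]
% while holding the current population needs O(g^2) memory.  Implementations
% that avoid full re-evaluation or full storage presumably account for the
% lower ends of the quoted ranges.
```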

Bibliographic details


Seeding Genetic Programming Populations

EuroGP'2000 (gzipped postscript).
Poster

ABSTRACT

We show genetic programming (GP) populations can evolve under the influence of a Pareto multi-objective fitness and program size selection scheme, from ``perfect'' programs which match the training material to general solutions. The technique is demonstrated with programmatic image compression, two machine learning benchmark problems (Pima Diabetes and Wisconsin Breast Cancer) and an insurance customer profiling task (Benelearn99 data mining).
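
A minimal sketch of the two-objective (error, program size) Pareto selection idea referred to above (illustrative only, not the paper's exact selection scheme):

```python
# Minimal sketch of two-objective (error, program size) Pareto selection, as a
# reading of the scheme described above (illustrative; not the paper's exact
# implementation).  Smaller is better in both objectives.
def dominates(a, b):
    """True if a is at least as good as b in both objectives and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(population):
    """Individuals not dominated by any other member of the population."""
    return [p for p in population
            if not any(dominates(q, p) for q in population if q != p)]

pop = [(0.10, 250), (0.10, 90), (0.25, 40), (0.05, 400), (0.30, 60)]
print(pareto_front(pop))   # -> [(0.1, 90), (0.25, 40), (0.05, 400)]
```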

Bibliographic details


Size Fair and Homologous Tree Genetic Programming Crossovers

In Genetic Programming and Evolvable Machines, volume 1, number 1/2, pp 95-119, April 2000. (pdf gzip ps)
Presented at GECCO'99
Long version as CWI technical report SEN-R9907 (23 pages)
Short version presented at BNAIC 99 (2 pages)

ABSTRACT

Size fair and homologous crossover genetic operators for tree based genetic programming are described and tested. Both produce considerably reduced increases in program size and no detrimental effect on GP performance. GP search spaces are partitioned by the ridge in the number of programs versus their size and depth. A ramped uniform random initialisation is described which straddles the ridge. With subtree crossover, trees increase by about one level per generation, leading to sub-quadratic bloat in length.
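
The sketch below illustrates the size-matching idea behind size fair crossover, with the donor subtree chosen so its size stays as close as possible to the size of the subtree it replaces. This is a simplified reading, not the exact size distribution used in the paper.

```python
# Simplified sketch of the size-matched idea behind "size fair" crossover:
# the donor subtree is chosen so its size stays close to the size of the
# subtree it replaces.  A reading of the idea, NOT the paper's exact operator.
import random

# Trees are nested tuples, e.g. ("ADD", ("X",), ("MUL", ("X",), ("Y",)))

def subtrees(tree, path=()):
    """Yield (path, subtree) pairs; a path is a tuple of child indices."""
    yield path, tree
    for i, child in enumerate(tree[1:], start=1):
        yield from subtrees(child, path + (i,))

def size(tree):
    return 1 + sum(size(c) for c in tree[1:])

def replace(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def size_fair_crossover(mum, dad):
    """Replace a random subtree of mum with a donor subtree from dad whose
    size is as close as possible to the size of the removed subtree."""
    cut_path, removed = random.choice(list(subtrees(mum)))
    target = size(removed)
    donors = [t for _, t in subtrees(dad)]
    best = min(abs(size(t) - target) for t in donors)
    return replace(mum, cut_path,
                   random.choice([t for t in donors if abs(size(t) - target) == best]))

mum = ("ADD", ("X",), ("MUL", ("X",), ("Y",)))
dad = ("SUB", ("MUL", ("Y",), ("Y",)), ("X",))
child = size_fair_crossover(mum, dad)
print(child, size(child))   # offspring size stays close to size(mum) == 5
```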

Bibliographic details.


Genetic Programming and Evolvable Machines: Books and other Resources

In Genetic Programming and Evolvable Machines, volume 1, number 1/2 (pdf gzip ps).

ABSTRACT

Presents summaries of published GP/EM literature, Internet based sources of GP/EM information and our plans for future reviews of GP/EM resources (books, world wide web (www) sites, products and programs).

Bibliographic details


Genetic Programming Approach to Benelearn 99: I

Presented at the Data Mining Competition at Benelearn'99.

Animated gif movie of the evolution of the multi-objective Pareto front of the population in the run which evolved the answer (300kb).

ABSTRACT

In this first genetic programming approach, profiles of potential insurance customers are automatically evolved; the task is part of the Benelearn'99 competition. The information about customers consists of 86 variables and includes product usage data and socio-demographic data derived from zip codes. The data was supplied by the Dutch data mining company Sentient Machine Research and is based on real world business data. Profiles which correctly identified more than 40 percent of customers were automatically evolved using genetic programming.

Bibliographic details


Linear Increase in Tree Height Leads to Sub-Quadratic Bloat

Presented at FOGP (GECCO'99 workshop) (postscript html).

INTRODUCTION

It has been known for some time that programs within GP populations tend to rapidly increase in size as the population evolves [6]. If unchecked this consumes excessive machine resources and so is usually addressed either by enforcing a size or depth limit on the programs or by an explicit size component in the GP fitness. We show that with standard subtree crossover and no limits, programs grow in depth at a linear rate, leading to a sub-quadratic increase in size. It might be assumed that such limits have little effect, but GP populations are affected surprisingly quickly.


Scaling of Program Tree Fitness Spaces

W. B. Langdon, Evolutionary Computation volume 7 number 4 pages 399-428 doi:10.1162/evco.1999.7.4.399 (gzipped postscript) (slides).

ABSTRACT

We investigate the distribution of fitness of programs represented as parse trees, particularly how such distributions scale with respect to changes in the size of the programs. By using a combination of enumeration and Monte Carlo sampling on a large number of problems from three very different areas, we are led to suggest that, in general, once some minimum size threshold has been exceeded, the distribution of performance is approximately independent of program length. In one special case we are able to prove this result. However we do note some exceptions where tiny fractions of the search space do not scale in the same way as the bulk of it. While we feel justified in our claim for functions (i.e. programs without side effects), we have so far only conducted limited experiments with programs including side effects and iteration. These suggest a similar result may also hold for this wider class of programs.

Presented at Schloss Dagstuhl (report).
Presented at GECCO'2000

Bibliographic details.


Survival of the Fattest

The evolution of size and shape

Chapter 8 in Advances in Genetic Programming 3, MIT Press, Cambridge, MA, USA, May 1999 ch08.ps.gz.
Slides (187kb), Java movie of evolution of best of generation phenotype (276kb).

ABSTRACT

The phenomenon of growth in program size in genetic programming populations has been widely reported. In a variety of experiments and static analysis we test the standard protective code explanation and find it to be incomplete. We suggest bloat is primarily due to the distribution of fitness in the space of possible programs and, because of this, in the absence of bias it is in general inherent in any search technique using a variable length representation.

We investigate the fitness landscape produced by program tree-based genetic operators when acting upon points in the search space. We show bloat in common operators is primarily due to the exponential shape of the underlying search space. Nevertheless we demonstrate new operators with considerably reduced bloating characteristics. We also describe mechanisms whereby bloat arises and relate these back to the shape of the search space. Finally we show our simple random walk entropy increasing model is able to predict the shape of evolved programs.

Bibliographic details (BibTeX html)


Why "Building Blocks" Don't Work on Parity Problems

Technical report CSRP-98-17 (html).

ABSTRACT

We investigate the distribution of performance of the parity and always-on Boolean functions given only the appropriate building block. These problems have a ``needle in a haystack'' fitness landscape and so are unsuitable for genetic programming or other progressive search techniques. Theoretical analysis shows that in the limit as program size grows the density of solutions is independent of size but falls exponentially with the number of inputs.

Bibliographic details.


Boolean Functions Fitness Spaces

Presented at EuroGP-99 (gzipped postscript)

Earlier version was a late breaking paper at GP-98 (html) (poster). Also available as technical report CSRP-98-16

ABSTRACT

We investigate the distribution of performance of the Boolean functions of 3 Boolean inputs (particularly that of the parity functions), and of the always-on-6 and even-6 parity functions. We use enumeration, uniform Monte Carlo random sampling and sampling of random full trees. As expected, XOR dramatically changes the fitness distributions. In all cases, once some minimum size threshold has been exceeded, the distribution of performance is approximately independent of program length. However the distribution of the performance of full trees is different from that of asymmetric trees and varies with tree depth.

CSRP-98-16 Bibliographic details.


Genetic Programming in Europe

Genetic Programming in Europe

Report of the EvoGP Working Group on Genetic Programming of the European Network of Excellence in Evolutionary Computing


Better Trained Ants for Genetic Programming

Better Trained Ants (html) Co author Riccardo Poli

ABSTRACT

The problem of programming an artificial ant to follow the Santa Fe trail has been repeatedly used as a benchmark problem in GP. Recently we have shown that the performance of several techniques is not much better than the best performance obtainable using uniform random search. We suggested that this could be because the program fitness landscape is difficult for hill climbers and the problem is also difficult for Genetic Algorithms as it contains multiple levels of deception.

Here we redefine the problem so that the ant is (1) obliged to traverse the trail in approximately the correct order and (2) required to find food quickly. We also investigate (3) including the ant's speed in the fitness function, either as a linear addition or as a second objective in a multi-objective fitness function, and (4) GP one point crossover.

A simple genetic programming system, with no size or depth restriction, is shown to perform approximately three times better with the improved training function. (Extends CSRP-98-08)

Bibliographic details.


Better Trained Ants

Better Trained Ants (html) Late breaking paper at EuroGP '98.

ABSTRACT

The problem of programming an artificial ant to follow the Santa Fe trail has been repeatedly used as a benchmark problem. Recently we have shown that the performance of several techniques is not much better than the best performance obtainable using uniform random search. We suggested that this could be because the program fitness landscape is difficult for hill climbers and the problem is also difficult for Genetic Algorithms as it contains multiple levels of deception.

Here we redefine the problem so the ant is obliged to traverse the trail in approximately the correct order. A simple genetic programming system, with no size or depth restriction, is shown to perform approximately three times better with the improved training function.

Bibliographic details.


Why Ants are Hard

Why Ants are Hard (PDF) (CSRP-98-04 html) Presented at GP-98.

ABSTRACT

The problem of programming an artificial ant to follow the Santa Fe trail is used as an example program search space. Analysis of shorter solutions shows they have many of the characteristics often ascribed to manually coded programs. Enumeration of a small fraction of the total search space and random sampling characterise it as rugged, with multiple plateaus split by deep valleys and many local and global optima. This suggests it is difficult for hill climbing algorithms. Analysis of the program search space in terms of fixed length schema suggests it is highly deceptive and that for the simplest solutions large building blocks must be assembled before they have above average fitness. In some cases we show solutions cannot be assembled using a fixed representation from small building blocks of above average fitness. These results suggest the Ant problem is difficult for Genetic Algorithms.

Random sampling of the program search space suggests on average the density of global optima changes only slowly with program size but the density of neutral networks linking points of the same fitness grows approximately linearly with program length. This is part of the cause of bloat.

Previously reported genetic programming, simulated annealing and hill climbing performance is shown not to be much better than random search on the Ant problem.

Bibliographic details.


Genetic Programming Bloat with Dynamic Fitness

Genetic Programming Bloat with Dynamic Fitness (CSRP-97-29) (html) EuroGP '98 (LNCS 1391).

ABSTRACT

In artificial evolution, individuals which perform as their parents did are usually rewarded identically to their parents. We note that Nature is more dynamic and there may be a penalty to pay for doing the same thing as your parents. We report two sets of experiments where static fitness functions are firstly augmented by a penalty for unchanged offspring and secondly replaced by randomly generated dynamic test cases. We conclude genetic programming, when evolving artificial ant control programs, is surprisingly little affected by large penalties, and program growth is observed in all our experiments.

Bibliographic details.


Program Growth in Simulated Annealing, Hill Climbing and Populations

The Evolution of Size in Variable Length Representations (PDF gzip ps) Presented at ICEC '98.

ABSTRACT

In many cases program lengths increase (known as "bloat", "fluff" and increasing "structural complexity") during artificial evolution. We show bloat is not specific to genetic programming and suggest it is inherent in search techniques with discrete variable length representations using simple static evaluation functions. We investigate the bloating characteristics of three non-population and one population based search techniques using a novel mutation operator.

An artificial ant following the Santa Fe trail problem is solved by simulated annealing, hill climbing, strict hill climbing and population based search using two variants of the new subtree based mutation operator. As predicted, bloat is observed when using unbiased mutation and is absent in simulated annealing and both hill climbers when using the length neutral mutation; however, bloat occurs with both mutations when using a population.

We conclude that there are two causes of bloat: 1) search operators with no length bias tend to sample bigger trees and 2) competition within populations favours longer programs as they can usually reproduce more accurately.

Bibliographic details.


Fitness Causes Bloat: Simulated Annealing, Hill Climbing and Populations

Fitness Causes Bloat: Simulated Annealing, Hill Climbing and Populations. Technical report CSRP-97-22

ABSTRACT

In many cases program lengths increase (known as ``bloat'', ``fluff'' and increasing ``structural complexity'') during artificial evolution. We show bloat is not specific to genetic programming and suggest it is inherent in search techniques with discrete variable length representations using simple static evaluation functions. We investigate the bloating characteristics of three non-population and one population based search techniques using a novel mutation operator.

An artificial ant following the Santa Fe trail problem is solved by simulated annealing, hill climbing, strict hill climbing and population based search using two variants of the new subtree based mutation operator. As predicted, bloat is observed when using unbiased mutation and is absent in simulated annealing and both hill climbers when using the length neutral mutation; however, bloat occurs with both mutations when using a population.

We conclude that there are two causes of bloat.

Bibliographic details.


Fitness Causes Bloat: Mutation

Fitness Causes Bloat: Mutation

Presented at EuroGP '98 (LNCS 1391). (postscript) Late Breaking paper at GP-97 (poster). Also available as technical report CSRP-97-16

ABSTRACT

The problem of evolving, using mutation, an artificial ant to follow the Santa Fe trail is used to study the well known genetic programming feature of growth in solution length. Known variously as ``bloat'', ``fluff'' and increasing ``structural complexity'', this is often described in terms of increasing ``redundancy'' in the code caused by ``introns''.

Comparison between runs with and without fitness selection pressure, backed by Price's Theorem, shows the tendency for solutions to grow in size is caused by fitness based selection. We argue that such growth is inherent in using a fixed evaluation function with a discrete but variable length representation. With simple static evaluation, search converges to mainly finding trial solutions with the same fitness as existing trial solutions. In general, variable length allows many more long representations of a given solution than short ones. Thus in search (without a length bias) we expect longer representations to occur more often and so representation length to tend to increase. I.e. fitness based selection leads to bloat.

Bibliographic details.


Fitness Causes Bloat in Variable Size Representation

Fitness Causes Bloat in Variable Size Representation

Position paper at Workshop on Evolutionary Computation with Variable Size Representation at ICGA-97, 20 July 1997, East Lansing, MI, USA. (Presentation slide)

ABSTRACT

We argue, based upon the number of representations of a given length, that an increase in representation length is inherent in using a fixed evaluation function with a discrete but variable length representation. Two examples of this are analysed, including the use of Price's Theorem. Both examples confirm that the tendency for solutions to grow in size is caused by fitness based selection.

Bibliographic details.


Fitness Causes Bloat

Fitness Causes Bloat (html)

Co author Riccardo Poli

Second best paper overall award at WSC2 (discussion at WSC2). Updates technical report CSRP-97-09

ABSTRACT

The problem of evolving an artificial ant to follow the Santa Fe trail is used to demonstrate the well known genetic programming feature of growth in solution length, known variously as ``bloat'', ``redundancy'', ``introns'', ``fluff'' and ``structural complexity'', with antonyms ``parsimony'', ``Minimum Description Length'' (MDL) and ``Occam's razor''. Comparison of runs with and without fitness selection pressure shows the tendency for solutions to grow in size is caused by fitness based selection. We argue that such growth is inherent in using a fixed evaluation function with a discrete but variable length representation, since with simple static evaluation search converges to mainly finding trial solutions with the same fitness as existing trial solutions. In general, variable length allows many more long representations of a given solution than short ones of the same solution. Thus with an unbiased random search we expect longer representations to occur more often, and so representation length tends to increase. I.e. fitness based selection leads to bloat.

Bibliographic details.


An Analysis of the MAX Problem in Genetic Programming

An Analysis of the MAX Problem in Genetic Programming (in GP-97, presentation slides).

Co author Riccardo Poli

ABSTRACT

We present a detailed analysis of the evolution of GP populations using the problem of finding a program which returns the maximum possible value for a given terminal and function set and a depth limit on the program tree (known as the MAX problem). We confirm the basic message of [Gathercole and Ross, 1996] that crossover together with program size restrictions can be responsible for premature convergence to a sub-optimal solution. We show that this can happen even when the population retains a high level of variety and show that in many cases evolution from the sub-optimal solution to the solution is possible if sufficient time is allowed. In both cases theoretical models are presented and compared with actual runs.

Price's Covariance and Selection Theorem is experimentally tested on GP populations. It is shown to hold only in some cases; in others, program size restrictions cause important deviations from its predictions.

Considerable update of CSRP-97-04

Bibliographic details.


Evolution of Genetic Programming Populations

Research Note RN/96/125 (PDF gzip ps).

ABSTRACT

We investigate in detail what happens as genetic programming (GP) populations evolve. Since we shall use the populations which showed GP can evolve stack data structures as examples, we start in Section 1 by briefly describing the stack experiment. In Section 2 we show Price's Covariance and Selection Theorem can be applied to Genetic Algorithms (GAs) and GP to predict changes in gene frequencies. We follow the proof of the theorem with experimental justification using the GP runs from the stack problem. Section 3 briefly describes Fisher's Fundamental Theorem of Natural Selection and shows that in its normal interpretation it does not apply to practical GAs.

An analysis of the stack populations, in Section 4, explains that the difficulty of the stack problem is due to the presence of ``deceptive'' high scoring partial solutions in the population. These cause a negative correlation between necessary primitives and fitness. As Price's Theorem predicts, the frequency of necessary primitives falls, eventually leading to their extinction and so to the impossibility of finding solutions like those that are evolved in successful runs.

Section 4 investigates the evolution of variety in GP populations. Detailed measurements of the evolution of variety in stack populations reveal loss of diversity causing crossover to produce offspring which are copies of their parents. Section 5 concludes with measurements that show in the stack population crossover readily produces improvements in performance initially but later no improvements at all are made by crossover.

Section 6 discusses the importance of these results to GP in general.
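
For reference (an editorial addition, not part of the research note), Price's Covariance and Selection Theorem in its usual form:

```latex
% Price's covariance and selection theorem (standard form; editorial addition):
%   q_i = frequency of a given gene (primitive) in parent i
%   z_i = number of offspring of parent i,   \bar z = mean number of offspring
\[
  \Delta\bar{q}
    \;=\; \frac{\mathrm{Cov}(z_i, q_i)}{\bar{z}}
    \;+\; \frac{E\left(z_i\,\Delta q_i\right)}{\bar{z}}
\]
% For operators that on average neither favour nor disfavour the gene, the
% second (transmission) term has zero expectation, so the covariance term
% alone predicts the change in gene frequency between generations.
```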

Bibliographic details.


Using Data Structures within Genetic Programming

Using Data Structures within Genetic Programming (presented at GP-96; video tape available, V569I-96).

ABSTRACT

In earlier work we showed that GP can automatically generate simple data types (stacks, queues and lists). The results presented herein show, in some cases, provision of appropriately structured memory can indeed be advantageous to GP in comparison with directly addressable indexed memory.

Three ``classic'' problems are solved. The first two require the GP to distinguish between sentences that are in a language and those that are not, given positive and negative training examples of the language. The two languages are correctly nested brackets and a Dyck language (correctly nested brackets of different types). The third problem is to evaluate integer Reverse Polish (postfix) expressions.

Comparisons are made between GP attempting to solve these problems when provided with indexed memory or with stack data structures.
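
To make the second benchmark concrete, a Dyck language sentence can be recognised directly with a stack. The hand-written check below is only an illustration of the task; the point of the paper is that GP evolves such recognisers from primitives.

```python
# A stack solves the second benchmark directly: checking correctly nested
# brackets of several types (a Dyck language).  Hand-written here only to
# make the task concrete; the paper evolves such recognisers with GP.
def is_dyck(sentence, pairs={")": "(", "]": "[", "}": "{"}):
    stack = []
    for symbol in sentence:
        if symbol in pairs.values():          # opening bracket: remember it
            stack.append(symbol)
        elif symbol in pairs:                 # closing bracket: must match the top of the stack
            if not stack or stack.pop() != pairs[symbol]:
                return False
        else:
            return False                      # symbol outside the bracket alphabet
    return not stack                          # everything opened was closed

print(is_dyck("([]{})"), is_dyck("([)]"))     # -> True False
```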

The test cases are available. Bibliographic details.


Data Structures and Genetic Programming

in (postscript) Advances in Genetic Programming 2

ABSTRACT

In real world applications, software engineers recognise that the use of memory must be organised via data structures and that software using the data must be independent of the data structures' implementation details. They achieve this by using abstract data structures, such as records, files and buffers.

We demonstrate that genetic programming can automatically implement simple abstract data structures, considering in detail the task of evolving a list. We show general and reasonably efficient implementations can be automatically generated from simple primitives.

A model for maintaining evolved code is demonstrated using the list problem.

Code is available. Bibliographic details. Original abstract, (May 1995) 4 pages.


A Bibliography for Genetic Programming

in (gzipped postscript) Advances in Genetic Programming 2 (draft pdf)

ABSTRACT

This appendix contains bibliographic references to publications concerning genetic programming. Some effort has been made to make it as complete as possible but, like every such list, this is practically impossible and the list quickly becomes out of date as the field progresses. (Click here for an up to date online version.) Nevertheless we hope that this appendix will prove a useful reference. The list is sorted by principal subject area. Within subject areas, publications are sorted by date (but works by the same author are grouped together).

Where on-line copies are available, the standard bibliographic reference is followed by the address of the on-line version in Uniform Resource Locator (URL) format. Internet document servers are occasionally unavailable and are sometimes re-organized so documents may be moved to new URLs; therefore a degree of perseverance may be required to obtain on-line copies.

A small number of non-genetic programming papers have been included either for their historical significance or general interest to practitioners of genetic programming.

TABLE OF CONTENTS

B.1	Introductions to Genetic Programming	
B.2	Surveys of Genetic Programming	
B.3	Early Work on Genetic Algorithms that Evolve Programs	
B.4	Some Early Genetic Programming References	
B.5	GP Techniques and Theory	
   B.5.1	Different Representations for the Evolving Programs	
     B.5.1.1	Directed Graph Structured Programs	
     B.5.1.2	Strongly Typed GP -- Multiple Function and Data Types Within a Program	
     B.5.1.3	Using a Fixed or Evolving Syntax to Guide GP Search	
     B.5.1.4	Pedestrian GP -- Converting a Linear Chromosome to a Program	
     B.5.1.5	Stack Based GP -- Linear Chromosome Executed by Stack Based Virtual Machine	
     B.5.1.6	Machine Code GP -- Linear Chromosome Executed by CPU Directly	
   B.5.2	GP with other techniques	
     B.5.2.1	Minimum Description Length	
     B.5.2.2	Inductive Logic Programming	
     B.5.2.3	Binary Decision Trees	
     B.5.2.4	GP Classifiers	
   B.5.3	Functional and Data Abstraction	
     B.5.3.1	Automatically Defined Functions	
		Evolving ADF Architecture	
     B.5.3.2	Module Acquisition as Population Evolves	
     B.5.3.3	Adapting Program Primitives as Population Evolves	
     B.5.3.4	Abstract Data Types	
   B.5.4	Breeding Policies	
     B.5.4.1	Choosing Programs to Mate Using Geographic Closeness -- Demes	
     B.5.4.2	GP Running on Parallel Hardware	
     B.5.4.3	Choosing Programs to Mate Using Program Behaviour or Fitness -- Disassortive Mating	
     B.5.4.4	Selecting at Birth Which Programs Enter the Breeding Population	
     B.5.4.5	Seeding the Initial Population -- Population Enrichment and Incremental Learning	
   B.5.5	Indexed Memory or Genetic Programs that Explicitly Use Memory	
   B.5.6	Recursive and Iterative Genetic Programs	
   B.5.7	Fitness Evaluation	
     B.5.7.1	Fitness Depending Upon Other Individuals (in the Same or Different Populations, Co-Evolution)	
     B.5.7.2	Online Fitness as Testing Continues	
     B.5.7.3	How Good are Programs Evolved With Limited Tests on the General Problem?	
     B.5.7.4	Evaluation Efficiency	
   B.5.8	Genetic Operators	
     B.5.8.1	Targeting Crossover Points to Minimise Disruption	
     B.5.8.2	Hill Climbing and Simulated Annealing	
   B.5.9	GP Theory	
     B.5.9.1	Analyzing the GP Search Space as a Fitness Landscape	
     B.5.9.2	Analyzing GP Populations in Terms of Information Theory Entropy	
     B.5.9.3	GP Schema Theorem and Building Block Hypothesis	
     B.5.9.4	Evolvability of GP Populations	
     B.5.9.5	Evolution of Program Size and Non-Functional Code (Introns)	
     B.5.9.6	Statistical Approaches to Fitness Distributions within GP Populations	
B.6	GP Development Systems	
B.7	GP Applications	
   B.7.1	Prediction and Classification	
   B.7.2	Modeling	
   B.7.3	Image and Signal Processing	
   B.7.4	Optimisation of Designs	
   B.7.5	Database	
   B.7.6	Financial Trading, Time Series Prediction and Economic Modeling	
   B.7.7	Robots and Autonomous Agents	
   B.7.8	Planning	
   B.7.9	Control	
   B.7.10	Computer Language Parsing	
   B.7.11	Medical	
   B.7.12	Artificial Life	
   B.7.13	Game Theory	
   B.7.14	Computer Application Porting	
   B.7.15	Neural Networks	
   B.7.16	Simulated Annealing	
   B.7.17	Fuzzy Logic	
   B.7.18	Artistic	
     B.7.18.1	Music	
     B.7.18.2	Entertainment	
   B.7.19	Scheduling and other Constrained Problems	
   B.7.20	Highly Reliable Computing	
B.8	Collected Works and Bibliographies	
B.9	Book Reviews	
B.10	Comparison With Other Techniques	
B.11	Patents	
B.12	Other uses of the term ``Genetic Programming''	
B.13	Some Other Genetic Algorithms Approaches to Program Evolution	

Genetic Programming -- Computers using "Natural Selection" to generate programs

RN/95/76 (ps, pdf)

co-author Adil Qureshi

ABSTRACT

Computers that ``program themselves''; science fact or fiction? Genetic Programming uses novel optimisation techniques to ``evolve'' simple programs, mimicking the way humans construct programs by progressively re-writing them. Trial programs are repeatedly modified in the search for ``better/fitter'' solutions. The underlying basis is Genetic Algorithms (GAs).

Genetic Algorithms, pioneered by Holland, Goldberg and others, are evolutionary search techniques inspired by natural selection (i.e. survival of the fittest). GAs work with a ``population'' of trial solutions to a problem, frequently encoded as strings, and repeatedly select the ``fitter'' solutions, attempting to evolve better ones. The power of GAs is being demonstrated for an increasing range of applications: financial, imaging, VLSI circuit layout, gas pipeline control and production scheduling. But one of the most intriguing uses of GAs - driven by Koza - is automatic program generation.

Genetic Programming applies GAs to a ``population'' of programs - typically encoded as tree-structures. Trial programs are evaluated against a ``fitness function'' and the best solutions selected for modification and re-evaluation. This modification-evaluation cycle is repeated until a ``correct'' program is produced. GP has demonstrated its potential by evolving simple programs for medical signal filters, classifying news stories, performing optical character recognition, and for target identification.

This paper surveys the exciting field of Genetic Programming. As a basis it reviews Genetic Algorithms and automatic program generation. Next it introduces Genetic Programming, describing its history and describing the technique via a worked example in C. Then using a taxonomy that divides GP research into theory/techniques and applications, it surveys recent work from both of these perspectives.

Extensive bibliographies, glossaries and a resource list are included as appendices.

Code is available. Bibliographic details.


Grow Your Own Programs (hard copy only)

Code is available. Bibliographic details.


Evolving Data Structures with Genetic Programming

In L. Eshelman editor, Genetic Algorithms: Proceedings of the Sixth International Conference (ICGA95), pages 295-302, Pittsburgh, PA, USA, 1995 (PDF postscript).

ABSTRACT

Genetic programming (GP) is a subclass of genetic algorithms (GAs), in which evolving programs are directly represented in the chromosome as trees. Recently it has been shown that programs which explicitly use directly addressable memory can be generated using GP.

It is established good software engineering practice to ensure that programs use memory via abstract data structures such as stacks, queues and lists. These provide an interface between the program and memory, freeing the program of memory management details which are left to the data structures to implement. The main result presented herein is that GP can automatically generate stacks and queues.

Typically abstract data structures support multiple operations, such as put and get. We show that GP can simultaneously evolve all the operations of a data structure by implementing each such operation with its own independent program tree. That is, the chromosome consists of a fixed number of independent program trees. Moreover, crossover only mixes genetic material of program trees that implement the same operation. Program trees interact with each other only via shared memory and shared ``Automatically Defined Functions'' (ADFs).

ADFs, ``pass by reference'' when calling them, Pareto selection, ``good software engineering practice'' and partitioning the genetic population into ``demes'' were also investigated whilst evolving the queue, in order to improve the GP solutions.
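
A minimal sketch of the multi-tree chromosome and same-operation crossover described above (illustrative only; the stand-in trees, and the omission of ADFs and shared memory, are assumptions of the sketch):

```python
# Sketch of the multi-tree chromosome described above: one tree per data
# structure operation, with crossover only between trees implementing the
# same operation.  Tree contents, ADFs and shared memory are stand-ins.
import random

OPERATIONS = ("makenull", "push", "pop", "top", "empty")   # e.g. a stack's five operations

def random_tree(depth=2):
    """Stand-in random expression tree (nested tuples); its contents are not the point here."""
    if depth == 0 or random.random() < 0.3:
        return (random.choice(("arg1", "aux1", "0", "1")),)
    return (random.choice(("ADD", "SUB", "write", "read")),
            random_tree(depth - 1), random_tree(depth - 1))

def toy_subtree_crossover(mum_tree, dad_tree):
    """Toy stand-in: replace one of mum's children with one of dad's children."""
    if len(mum_tree) == 1 or len(dad_tree) == 1:
        return dad_tree
    i = random.randrange(1, len(mum_tree))
    j = random.randrange(1, len(dad_tree))
    return mum_tree[:i] + (dad_tree[j],) + mum_tree[i + 1:]

def crossover(mum, dad):
    """Cross a single randomly chosen operation; every other tree is copied from
    mum, so genetic material never moves between different operations."""
    op = random.choice(OPERATIONS)
    child = dict(mum)
    child[op] = toy_subtree_crossover(mum[op], dad[op])
    return child

mum = {op: random_tree() for op in OPERATIONS}   # one chromosome = one tree per operation
dad = {op: random_tree() for op in OPERATIONS}
print(crossover(mum, dad)["push"])
```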

Code is available. Bibliographic details.


Scheduling Maintenance of Electrical Power Transmission

Chapter 10, Artificial Intelligence Techniques in Power Systems, K. Warwick, A. O. Ekwue and R. Aggarwal (editors), pp 220-237, IEE, 1997.
html PDF MS Postscript

Bibliographic details.


Scheduling Maintenance of Electrical Power Transmission Networks Using Genetic Algorithms

CSRP-97-22

ABSTRACT

Summary of Genetic Algorithm and Genetic programming work on evolving schedules for preventative maintenance of the South Wales region of the UK high voltage electricity network.

Bibliographic details.


Scheduling Maintenance of Electrical Power Transmission Networks Using Genetic Programming

RN/96/49 published in GP-96 late breaking papers
and at The 1st Online Workshop on Soft Computing Special Session on Engineering Problem Solving Using Soft Computing.
A modified version appears in Chapter 10 of Artificial Intelligence Techniques In Power Systems.

ABSTRACT

National Grid Plc is responsible for the maintenance of the high voltage electricity transmission network in England and Wales. It must plan maintenance so as to minimize costs taking into account:

Previous work showed that a Genetic Algorithm using an order or permutation chromosome, combined with hand coded ``Greedy'' optimizers, can readily produce an optimal schedule for a four node test problem. Following this, the same GA has been used to find low cost schedules for the South Wales region of the UK high voltage power network.

This paper describes the evolution of the best known schedule for the base South Wales problem using Genetic Programming starting from the hand coded heuristics used with the GA.
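
A minimal sketch of the permutation-chromosome-plus-greedy-scheduler decoding referred to above (illustrative only; the task list, capacity limit and cost function are made up, not National Grid's):

```python
# Illustrative sketch of the "permutation chromosome + greedy scheduler"
# decoding described above (not National Grid's actual cost model): the GA
# chromosome is an ordering of maintenance tasks, and a greedy decoder places
# each task, in that order, into the cheapest week that still has capacity.
import random

def greedy_schedule(task_order, n_weeks, capacity, cost):
    """cost(task, week) -> float; returns {task: week} built greedily."""
    load = [0] * n_weeks
    schedule = {}
    for task in task_order:
        week = min((w for w in range(n_weeks) if load[w] < capacity),
                   key=lambda w: cost(task, w))
        schedule[task] = week
        load[week] += 1
    return schedule

# Toy example: 6 maintenance tasks, 4 weeks, at most 2 outages per week, with
# a made-up cost that penalises scheduling a task far from its preferred week.
tasks = list(range(6))
preferred = {t: t % 4 for t in tasks}
cost = lambda t, w: abs(w - preferred[t])

chromosome = tasks[:]            # one GA individual = a permutation of the tasks
random.shuffle(chromosome)
print(greedy_schedule(chromosome, n_weeks=4, capacity=2, cost=cost))
```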

Bibliographic details.


Scheduling Planned Maintenance of the South Wales High-Voltage Regional Electricity Network

W. B. Langdon, Scheduling planned maintenance of the South Wales region of the National Grid (submitted version), in Evolutionary Computing: AISB International Workshop Manchester, UK, 7-8 April, 1997 Selected Papers, David Corne, Jonathan L. Shapiro Editors, LNCS 1305, Pages 179-197.

ABSTRACT

This paper builds on previous work, applying the combination of a greedy scheduler and permutation GA to the South Wales region of the UK National Grid. In this paper the work is extended to consider continuing to supply electricity in the presence of faults on the network.

Bibliographic details.

Also available as Technical Report CSRP-96-18


Scheduling Planned Maintenance of the National Grid

W. B. Langdon, AISB-95, published by Springer-Verlag as LNCS 993: Evolutionary Computing AISB Workshop Sheffield, UK, April 3-4, 1995 Selected Papers Terence C. Fogarty (editor).

Version presented at the AISB Workshop. pp. 132-153 (Springer) (postscript).

ABSTRACT

National Grid Plc is responsible for the maintenance of the high voltage electricity transmission network in England and Wales. It must plan maintenance so as to minimize costs taking into account:

This complex optimization and scheduling problem is currently performed manually by National Grid's Planning Engineers. Computerized viability checks are performed on the schedules they produce. National Grid's Technology and Science Laboratories and UCL aim to generate low cost schedules using Genetic Algorithms. It is hoped this will aid the Planning Engineers.

This paper reports work in progress. So far:

We are now considering how to scale up. The computational complexity of the best of the greedy optimizers is a concern. Our plans are to consider alternative approaches (such as expansive coding and Genetic Programming) before moving on to a significant section of the whole of the National Grid (South Wales has been suggested as a suitable example of intermediate size).

Code is available Bibliographic details.


Pareto Optimality, Population Partitioning, Price's theorem and Genetic Programming

RN/95/29
Accepted for presentation at the AAAI Fall 1995 Symposium on Genetic Programming; however, it was withdrawn due to pressure of time. This document is as it was submitted, 11 pages. (PDF)

ABSTRACT

A description of a use of Pareto optimality in genetic programming is given and an analogy with Genetic Algorithm fitness niches is drawn. Techniques to either spread the population across many Pareto optimal fitness values or to reduce the spread are described. It is speculated that a wide spread may not aid Genetic Programming. It is suggested that this might give useful insight into many GPs whose fitness is composed of several sub-objectives.

The successful use of demic populations in GP leads to speculation that smaller evolutionary steps might aid GP in the long run.

An example is given where Price's covariance theorem helped when designing a GP fitness function.

Bibliographic details.


Directed Crossover within Genetic Programming

RN/95/71 Summary of directing crossover locations in a multi-tree GP, as used with list data structures (PDF ps)

ABSTRACT

Describes in detail a mechanism used to bias the choice of crossover locations when evolving a list data structure using genetic programming. The data structure and its evolution are described in Advances in Genetic Programming 2.

The second section describes current research on biasing the action of reproduction operators within the genetic programming field.

Bibliographic details.


Summary of NK landscapes and GAs

An extended version of the summary that was posted to the Genetic Algorithms Digest on 12 November 1994. I would welcome comments.

ga-nk-summary.txt 300 lines


Summary of Seeding Population GAs

A version of the summary that was posted to the Genetic Algorithms Digest March 1995. I would welcome comments.

ga-seed.txt 150 lines


Quick Intro to simple-gp.c

IN/95/2 14 pages

ABSTRACT

Wouldn't it be great if computers could actually write the programs? This is the promise of genetic programming. This document gives an introduction to the program simple-gp.c, which demonstrates the principles of genetic programming in the C language. This program was designed to show the ideas behind GP, i.e. to be tinkered with rather than to be a definitive implementation.

The appendix contains a glossary of evolutionary computing terms.
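
A minimal sketch of the GP cycle that simple-gp.c demonstrates, written here in Python rather than C and not a translation of simple-gp.c: crossover-only GP over {ADD, MUL, x, 1} evolved towards the target x*x + 1.

```python
# Minimal GP cycle in the spirit of what simple-gp.c demonstrates -- not a
# translation of that C program, just a tiny tree-based GP: random programs
# over {ADD, MUL, x, 1} are evolved towards the target x*x + 1.
import random

def random_program(depth=3):
    if depth == 0 or random.random() < 0.3:
        return (random.choice(("x", "1")),)
    return (random.choice(("ADD", "MUL")), random_program(depth - 1), random_program(depth - 1))

def run(p, x):
    if p[0] == "x": return x
    if p[0] == "1": return 1
    a, b = run(p[1], x), run(p[2], x)
    return a + b if p[0] == "ADD" else a * b

def fitness(p):                               # minus total error on 5 test points
    return -sum(abs(run(p, x) - (x * x + 1)) for x in range(5))

def subtrees(p, path=()):
    yield path, p
    for i, c in enumerate(p[1:], start=1):
        yield from subtrees(c, path + (i,))

def replace(p, path, new):
    if not path: return new
    i = path[0]
    return p[:i] + (replace(p[i], path[1:], new),) + p[i + 1:]

def crossover(mum, dad):                      # standard subtree crossover
    cut, _ = random.choice(list(subtrees(mum)))
    _, donor = random.choice(list(subtrees(dad)))
    return replace(mum, cut, donor)

def tournament(pop, k=3):
    return max(random.sample(pop, k), key=fitness)

pop = [random_program() for _ in range(200)]
for gen in range(30):
    best = max(pop, key=fitness)
    if fitness(best) == 0:                    # perfect match on all test points
        break
    pop = [crossover(tournament(pop), tournament(pop)) for _ in range(len(pop))]
print("best program:", max(pop, key=fitness))
```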

Code. Bibliographic details.


W.B.Langdon cs.ucl.ac.uk