Based on slides presented at the Central London branch of the British Computer Society on 12 February 2004.

Research from EPSRC project GR/S03546/01. Project research papers.


Genetic Programming

W. B. Langdon
University College, London
EPSRC project with GlaxoSmithKline Data Exploration Sciences

Introduction

What is Genetic Programming

GP "Survival of the fittest" program

  1. Create an initial random population of trial solutions (programs)
  2. Test each individual program
  3. Good enough? Stop
  4. Select the better programs
  5. Breed a new generation of programs with crossover and mutation
  6. Go to 2
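
The six steps can be sketched in a few lines of Python. This is a minimal illustration only, not the system used in the project: the primitive set (ADD, SUB, MUL over a single input x), the nested-tuple representation, tournament selection, the 10% mutation rate and the size limit are all assumptions made for the example.

```python
import operator
import random

# Hypothetical primitive set: binary arithmetic over one input x.
FUNCS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}
TERMS = ["x", 1.0, 2.0]

def random_tree(rng, depth):
    """Grow a random program as nested tuples (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMS)
    op = rng.choice(sorted(FUNCS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    """Run a program on input x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree

def error(tree, cases):
    """Fitness: sum of absolute errors over the test cases (0 is perfect)."""
    return sum(abs(evaluate(tree, x) - y) for x, y in cases)

def paths(tree, path=()):
    """Yield the position of every node, as a tuple of child indices."""
    yield path
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from paths(tree[i], path + (i,))

def get(tree, path):
    """Return the subtree found at path."""
    for i in path:
        tree = tree[i]
    return tree

def put(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = put(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(rng, mum, dad):
    """Subtree crossover: a random subtree of dad replaces one in mum."""
    donor = get(dad, rng.choice(list(paths(dad))))
    return put(mum, rng.choice(list(paths(mum))), donor)

def mutate(rng, tree):
    """Subtree mutation: replace a random subtree with a fresh random one."""
    return put(tree, rng.choice(list(paths(tree))), random_tree(rng, 2))

def tournament(rng, scored, k=3):
    """Step 4: pick the best of k randomly chosen (error, tree) pairs."""
    return min(rng.sample(scored, k), key=lambda pair: pair[0])[1]

def evolve(cases, pop_size=100, generations=20, seed=0, max_nodes=40):
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]  # step 1
    best = population[0]
    for _ in range(generations):                                 # step 6: loop
        scored = [(error(t, cases), t) for t in population]      # step 2
        scored.append((error(best, cases), best))                # keep best so far
        best = min(scored, key=lambda pair: pair[0])[1]
        if error(best, cases) == 0:                              # step 3
            break
        population = []
        while len(population) < pop_size:                        # steps 4 and 5
            mum, dad = tournament(rng, scored), tournament(rng, scored)
            child = crossover(rng, mum, dad)
            if rng.random() < 0.1:
                child = mutate(rng, child)
            if sum(1 for _ in paths(child)) > max_nodes:
                child = mum  # size limit: keep the parent instead
            population.append(child)
    return best
```

For example, `evolve([(x, x * x + x) for x in range(-3, 4)])` returns the best program found for a small symbolic regression problem. Because the best program so far is carried forward, its error never gets worse from one generation to the next.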

GP Generational Cycle

GP Representation

GP Fitness Function

Creating new programs - Crossover

Some Applications of GP

Survey paper

Using GP within GlaxoSmithKline

Drug Discovery

Introduction to Drug Discovery

Discovery of a Drug to Treat a Disease

Drug Discovery 2

Computer models can guide chemists where to look first:
  1. Which existing chemicals to test next
  2. Models can make predictions for "virtual" chemicals, i.e. chemicals that do not yet exist. If the model suggests that a virtual chemical looks promising, it could then be made

P450

Model of rat P450 2B1, showing mode of membrane attachment

P450 and Drug Discovery

GSK "Blind Trial"

P450 Datasets

Three P450 data sets

The continuous P450 measurement was divided into three classes: Inhibitory, Substrate and Inactive. Three data sets:

P450 121 Features

GP P450 Prediction

Evolved Tree - predicts P450

%r14S.p500.2.50 fitness  3.36243e+07 hits 1838 len 51 %Tue Feb 25 17:57:33 2003
(ADD (IFLTE f1
            f2
            (Max (ADD (DIV (SUB f3 20.33)
                           0.8546)
                      (Min f4 25.58))
                 f2)
            (SUB (ADD (DIV 493.1 f5)
                      (SUB f3 23.83))
                 20.33))
     (Max (Max (Max (Min (MaxA f4 -21.38)
                         25.58)
                    (IFLTE 0.6186 f6 2 55.27))
               (Min f7 f8))
          (SUB (MUL 10.02 f2)
               14.04)))

Evolved Tree - predicts P450

f1, f2, ... f8 are GSK domain-specific features calculated for each chemical from its chemical formula. GP chose these 8 features from the 121 available.
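
A tree written in this prefix notation can be evaluated mechanically once each function name is given a meaning. The sketch below is an illustration under stated assumptions, not the project's code: it assumes IFLTE a b c d returns c when a <= b and d otherwise, and that DIV is a protected division returning 1 when the divisor is zero (both common GP conventions). The slides do not define MaxA, so it is deliberately left out of the table; the feature values in the usage note are invented.

```python
import re

def protected_div(a, b):
    # Assumption: division by zero returns 1.0 rather than raising an error.
    return a / b if b != 0 else 1.0

# Assumed meanings of the function names that appear in the evolved tree.
# MaxA is not defined in the slides, so it is not included here.
PRIMITIVES = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "MUL": lambda a, b: a * b,
    "DIV": protected_div,
    "Max": max,
    "Min": min,
    # Assumption: IFLTE a b c d gives c when a <= b, otherwise d.
    "IFLTE": lambda a, b, c, d: c if a <= b else d,
}

def parse(text):
    """Turn a prefix expression such as "(Min f7 f8)" into nested lists."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1  # skip the closing ")"
        return node

    return read()

def evaluate(node, features):
    """Evaluate a parsed tree given a dict of feature values, e.g. {"f2": 2.0}."""
    if isinstance(node, list):
        args = [evaluate(child, features) for child in node[1:]]
        return PRIMITIVES[node[0]](*args)
    if node in features:
        return features[node]
    return float(node)  # anything else is a numeric constant
```

For instance, `evaluate(parse("(SUB (MUL 10.02 f2) 14.04)"), {"f2": 2.0})` evaluates the last branch of the tree for a made-up value of f2. Evaluating the whole tree would additionally need a definition of MaxA and real values for f1 to f8.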

GSK Workshop Recommendations

  • GP, NN and Logistic Regression all showed reasonable predictivity, but were far from ideal. Explore further
  • Follow up on GP methods. Make the technology available
  • GP produced a more easily understood model, using a small number of features, that made sense to P450 modelling experts

DNA Chip + Data Fusion. Future?

  • DNA gene chip data are the most prominent of a stream of "long thin data": thousands of data values for each of a few score patients.
  • GP can be used to interpret such data.
  • Bioinformatics will need to integrate multiple data streams: Genes, Proteins, Biochemicals
  • GP offers data fusion of diverse bio data

Conclusions

  • Genetic programming can automatically evolve innovative solutions.
  • GP has been used in drug discovery to predict which potential molecules might become drugs, long before testing is required. (In the GSK workshop, GP came top of 12)
    • More info. GP in handout

More information on GP

W.B.Langdon cs.ucl.ac.uk 12 March 2004 (last update 28 Oct 2023)