Genetic Programming for Combining P450 Activity Predictions

W. B. Langdon, S. J. Barrett and B. F. Buxton, WSC6


Part of Faraday data fusion project.

What is the Problem?

P450 is important in drug design, but it is hard to predict exactly whether it will interfere with the potential drug activity of a new compound. Existing approaches require detailed knowledge of the compound, such as its molecular, electronic and physico-chemical (e.g. free-energy) properties. Since such properties need not be available, especially when using high-throughput screening (HTS), these approaches are not viable in conjunction with HTS. (Busy HTS web site).

In high-throughput screening, minute quantities of chemicals are placed mechanically in tiny holes (known as wells) on a tray. Trays typically have at least 96 wells (sometimes many more). Activity between the chemicals is indicated by setting up the assay so that the reaction produces a chemical dye which fluoresces. The amount of reaction between the chemicals is measured by shining light onto the well and measuring its brightness. NB the whole process is heavily automated, allowing many different chemicals to be studied simultaneously.

Instead, we extend existing data mining techniques and apply them in combination to the P450 problem. Many classifiers are trained on HTS data; they are then fused using genetic programming to yield a composite predictor.
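To make the fusion step more concrete, here is a minimal sketch of what one GP individual in such a fusion might look like, assuming each terminal reads the output score of one trained base classifier and internal nodes apply simple arithmetic. The function set (`max`, multiplication, protected division), the 0.5 decision threshold and all names (`Score`, `Apply`, `fused_predict`, `pdiv`) are hypothetical illustrations, not the evolved classifier described on this page.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Sequence, Union


def pdiv(a: float, b: float) -> float:
    """Protected division, a common GP convention: returns 1.0 when b == 0."""
    return a / b if b != 0 else 1.0


@dataclass
class Score:
    """Terminal: output of base classifier number `index` (e.g. one trained network)."""
    index: int

    def eval(self, scores: Sequence[float]) -> float:
        return scores[self.index]


@dataclass
class Const:
    """Terminal: a numeric constant."""
    value: float

    def eval(self, scores: Sequence[float]) -> float:
        return self.value


@dataclass
class Apply:
    """Internal node: a two-argument function applied to two subtrees."""
    fn: Callable[[float, float], float]
    left: "Node"
    right: "Node"

    def eval(self, scores: Sequence[float]) -> float:
        return self.fn(self.left.eval(scores), self.right.eval(scores))


Node = Union[Score, Const, Apply]


def fused_predict(tree: Node, scores: Sequence[float], threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: predict 'active' when the fused score exceeds threshold."""
    return tree.eval(scores) > threshold


# A hypothetical individual combining three base classifiers: max(s0, s1 * s2)
fused = Apply(max, Score(0), Apply(operator.mul, Score(1), Score(2)))
```

With base-classifier scores `[0.2, 0.9, 0.5]` this individual yields `max(0.2, 0.45) = 0.45`, so `fused_predict` returns `False` at the assumed threshold. In a GP run, a population of such trees would be varied by crossover and mutation and selected on a fitness measure such as ROC performance.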


Figure 3 shows that the combined classifier evolved by genetic programming is better than all of the 60 trained neural networks: its ROC is better than the ROC of any one of them. In fact, Figure 4 shows that its ROC is better than the convex hull of all 60 ROCs together. (The movie shows the evolution of the ROCs in the GP population in a similar run.)
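One simple way to make the ROC comparisons above concrete is sketched below, assuming each classifier outputs a score and label 1 marks an active compound. It computes one classifier's ROC points, takes the upper convex hull of the pooled ROC points of several classifiers (adding the trivial classifiers at (0,0) and (1,1)), and checks whether another ROC lies on or above that hull everywhere and strictly above it somewhere. The function names and this particular "better than the hull" test are assumptions for illustration; the page does not say exactly how Figure 4 was computed.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """ROC curve of one classifier, sweeping the threshold from high to low."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Cases with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points: Sequence[Point]) -> List[Point]:
    """Upper convex hull of pooled ROC points, including (0,0) and (1,1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Point] = []
    for p in pts:
        # Drop the last vertex while it lies on or below the chord to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def hull_tpr_at(hull: Sequence[Point], fpr: float) -> float:
    """True positive rate of the hull at a given false positive rate (linear interpolation)."""
    best = 0.0
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x0 <= fpr <= x1:
            y = max(y0, y1) if x1 == x0 else y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
            best = max(best, y)
    return best


def above_hull(curve: Sequence[Point], hull: Sequence[Point], eps: float = 1e-12) -> bool:
    """True if no point of `curve` is below `hull` and at least one is strictly above."""
    gaps = [tpr - hull_tpr_at(hull, fpr) for fpr, tpr in curve]
    return min(gaps) >= -eps and max(gaps) > eps
```

In the setting described here, one would pool `roc_points` for all 60 networks, build `roc_convex_hull` from the pooled points, and then test the GP classifier's ROC with `above_hull`. Pooling first matters: any operating point on the hull can be reached by randomly switching between two of the networks, so beating the hull is a stronger claim than beating each network separately.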

The graph (purple) shows that size fair crossover and the four mutation operators succeeded in controlling bloat.
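The idea behind size fair crossover is to stop crossover from systematically inserting much larger subtrees than it removes, which is one driver of bloat. The sketch below is a simplified, size-limited variant in that spirit, not the full operator used in these runs: it keeps only the assumed rule that the inserted subtree may have at most 1 + 2 × (size of the removed subtree) nodes, and chooses uniformly among the allowed subtrees rather than using the original operator's size-balancing distribution. Trees are assumed to be terminal strings or `(op, left, right)` tuples; all names, including the `mean_size` bloat monitor, are hypothetical.

```python
import random
from typing import Iterator, List, Tuple, Union

Tree = Union[str, tuple]  # terminal name, or (op, left, right)


def size(t: Tree) -> int:
    """Number of nodes in a tree."""
    return 1 if isinstance(t, str) else 1 + size(t[1]) + size(t[2])


def subtrees(t: Tree, path: Tuple[int, ...] = ()) -> Iterator[Tuple[Tuple[int, ...], Tree]]:
    """Yield (path, subtree) for every node; a path is a sequence of child indices 1 or 2."""
    yield path, t
    if not isinstance(t, str):
        yield from subtrees(t[1], path + (1,))
        yield from subtrees(t[2], path + (2,))


def replace_at(t: Tree, path: Tuple[int, ...], new: Tree) -> Tree:
    """Return a copy of t with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    parts = list(t)
    parts[path[0]] = replace_at(t[path[0]], path[1:], new)
    return tuple(parts)


def size_limited_crossover(p1: Tree, p2: Tree, rng: random.Random) -> Tree:
    """Replace a random subtree of p1 with a subtree of p2 no larger than 1 + 2 * removed size."""
    path, removed = rng.choice(list(subtrees(p1)))
    limit = 1 + 2 * size(removed)
    # Never empty: every terminal of p2 has size 1 <= limit.
    candidates = [s for _, s in subtrees(p2) if size(s) <= limit]
    return replace_at(p1, path, rng.choice(candidates))


def mean_size(population: List[Tree]) -> float:
    """Average program size; tracking it per generation shows whether bloat is controlled."""
    return sum(size(t) for t in population) / len(population)
```

Because the inserted subtree is bounded relative to the removed one, a child can grow by at most `size(removed) + 1` nodes in one crossover, which limits runaway growth in program size.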

The evolved classifier, and how it is used, is described here.

Possibly Helpful Information

30 August 2001 (last update 4 Oct 2012).