Background for Genetic Programming for Combining Neural Networks for Drug Discovery

This page is my presentation of Genetic Programming for Combining Neural Networks for Drug Discovery by W. B. Langdon, S. J. Barrett and B. F. Buxton at WSC6, the online conference on Soft Computing methods.

What are Receiver Operating Characteristics?

ROCs are a nice way of showing the trade off that all classifiers make between not missing positive cases and wrongly claiming that negative examples are positive (false alarms). See ROC summary.
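As a rough illustration of the trade off, a single classifier can be placed in the ROC square by its false alarm rate (x axis) and its hit rate (y axis). The sketch below assumes binary labels and predictions held in plain Python lists; the function name `roc_point` and the example data are made up for illustration and are not from the paper.

```python
def roc_point(labels, predictions):
    """Return (false_positive_rate, true_positive_rate) for one classifier.

    labels and predictions are equal-length sequences of 0/1 values.
    """
    true_pos = sum(1 for y, p in zip(labels, predictions) if y and p)
    false_pos = sum(1 for y, p in zip(labels, predictions) if not y and p)
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    return false_pos / negatives, true_pos / positives


# Hypothetical data: 4 positive and 4 negative examples.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
predictions = [1, 1, 0, 1, 0, 0, 0, 1]
print(roc_point(labels, predictions))  # (0.25, 0.75)
```

A classifier that misses nothing and raises no false alarms would sit at the top left corner, (0, 1).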

Movie showing the evolution of individual classifiers within the GP population plotted in the ROC square.

What is the MRROC?

MRROC stands for Maximum Realisable Receiver Operating Characteristics. It is Martin Scott's name for the convex hull of the ROCs of the classifiers which have been combined. However, in principle it is possible to find other ways of combining classifiers whose ROC lies outside the convex hull (i.e. is better). On toy problems, benchmarks and some applications, genetic programming can automatically evolve such better classifiers.
See also.
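The convex hull mentioned above can be sketched in a few lines. This is only an illustration, assuming each classifier has already been reduced to one (false positive rate, true positive rate) point; the function names and the example points are hypothetical. The trivial classifiers "always say negative" (0, 0) and "always say positive" (1, 1) are added, since they are always available.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Upper convex hull of ROC points, from (0, 0) to (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop earlier points that lie on or below the line to the new point.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Hypothetical ROC points for four classifiers.
classifiers = [(0.1, 0.6), (0.3, 0.7), (0.4, 0.9), (0.5, 0.5)]
print(roc_convex_hull(classifiers))
# [(0.0, 0.0), (0.1, 0.6), (0.4, 0.9), (1.0, 1.0)]
```

Classifiers at (0.3, 0.7) and (0.5, 0.5) fall inside the hull, so they add nothing to it. A combined classifier is "better" in the sense used above only if its ROC point lies above this hull.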

What is Genetic Programming?

GP summary.

What is P450?

P450 is important in drug discovery since its many forms exist in many parts of the body, where they break down drugs. That is, P450 reduces the drugs' effectiveness. Hence it would be helpful to be able to predict in advance whether a potential drug will interact with P450.

Model of rat P450 2B1, showing mode of membrane attachment (source article)


P450 sales article from Biolabs.

What is Finger Printing?

This Daylight document discusses chemical analysis such as finger printing extensively.

Briefly, a chemical finger print is a way of condensing (compressing) the chemical structure of a compound or molecule into a bit string. Only the (2-dimensional) linkage between atoms and their types need be known (i.e. not the full three dimensional structure). While a finger print need not be unique to one compound, the chance of any two large chemicals having the same finger print is very remote. Thus computer database searches and comparisons can be done rapidly using the finger print. See also Ward's linkage and Daylight.
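The idea can be illustrated with a toy example. This is not the Daylight algorithm: it is a deliberately simplified sketch which assumes a molecule is given as a list of atom types plus a list of bonds, ignores bond types and hydrogens, and hashes every short path of atoms to one bit. The function name `toy_fingerprint` and its parameters are invented for this page.

```python
import zlib


def toy_fingerprint(atoms, bonds, n_bits=64, max_path_atoms=3):
    """Toy path-based finger print, returned as an integer bit string.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) index pairs into atoms (2D linkage only)
    """
    neighbours = {i: set() for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)

    fingerprint = 0

    def walk(path):
        nonlocal fingerprint
        label = "-".join(atoms[i] for i in path)
        # Hash the path label to a bit position and switch that bit on.
        fingerprint |= 1 << (zlib.crc32(label.encode()) % n_bits)
        if len(path) < max_path_atoms:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return fingerprint


# Ethanol heavy atoms C-C-O, written with two different atom numberings.
ethanol = toy_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_renumbered = toy_fingerprint(["O", "C", "C"], [(0, 1), (1, 2)])
print(format(ethanol, "064b"))
print(ethanol == ethanol_renumbered)  # True
```

Because several different paths can hash to the same bit, two different molecules may share a finger print, but this becomes unlikely as molecules get larger and set more bits.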

What is Tanimoto Distance?

Briefly, the Tanimoto Distance is used to see how similar two chemicals are. Approximately, it does this by counting the number of chemical substructures or chemical groups they have in common. The nice thing is that it works on the chemical finger prints of the two chemicals and so is fast. It is given by the number of groups that occur in both, divided by this plus the number in only one, plus the number only in the other. (The number which occur in neither is ignored.)
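The ratio described above can be written down directly. The sketch assumes each finger print is stored as a non-negative Python integer used as a bit string; the function name is made up. Note that this ratio is large when the chemicals are similar (it is 1 for identical finger prints), so where a true distance is wanted, one minus the ratio is commonly used.

```python
def tanimoto_similarity(fp_a: int, fp_b: int) -> float:
    """Bits set in both, divided by bits set in either finger print."""
    both = bin(fp_a & fp_b).count("1")
    only_a = bin(fp_a & ~fp_b).count("1")
    only_b = bin(fp_b & ~fp_a).count("1")
    total = both + only_a + only_b
    # Two empty finger prints are treated as identical (an assumption).
    return both / total if total else 1.0


a = 0b110110
b = 0b100111
print(tanimoto_similarity(a, b))      # 3 shared / (3 + 1 + 1) = 0.6
print(1 - tanimoto_similarity(a, b))  # the corresponding distance
```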

What is Hamming Distance?

Hamming distance is the number of bit positions at which two bit strings differ. E.g. the Hamming distance between 13 and 9 is 1. This is because 13 = 1101 in binary, while 9 = 1001.

Note the Hamming distance between A and B is the same as that between B and A.
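For bit strings held as integers, a minimal sketch is to exclusive-or the two values and count the bits that remain set (the function name is only illustrative):

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


print(hamming_distance(13, 9))  # 1101 vs 1001 -> 1
print(hamming_distance(9, 13))  # symmetric -> 1
```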

What is the SMILES Representation?

SMILES is a convention for describing chemical compounds. (SMILES is an acronym, meaning Simplified Molecular Input Line Entry System.) See for much more information.

What is Clementine?

Clementine is a data mining tool. Click here for a partial screen shot of it training 60 neural networks.

What is size fair crossover?

Click here.

Possibly Useful Additional References

"On the Recognition of Mammalian Microsomal Cytochrome P450 Substrates and Their Characteristics", David F. V. Lewis, Biochemical Pharmacology, vol 60, pp 293-306, 2000.

Ward, J.H. (1963), "Hierarchical Grouping to Optimize an Objective Function," Journal of the American Statistical Association, 58, 236-244.



W.B.Langdon 30 August 2001
(last update 12 Jan 2003)