Background for
Genetic Programming for Combining Neural Networks for Drug Discovery
This page is my presentation of
Genetic Programming for Combining Neural Networks for Drug Discovery
by W. B. Langdon, S. J. Barrett and B. F. Buxton
at WSC6, the online conference on Soft Computing methods.
What are Receiver Operating Characteristics?
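ROCs are a convenient way of showing the trade-off every classifier makes
between not missing positive cases and not claiming that negative
examples are positive (false alarms). Each classifier is a point in the
ROC square: its true positive rate plotted against its false positive rate.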
See ROC summary.
Movie showing the evolution of individual classifiers within the GP
population, plotted in the ROC square.
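A minimal sketch of how ROC points arise, assuming each classifier outputs a numeric score and calls a case positive when its score is at or above a threshold t. The threshold rule and the names `roc_point` and `roc_curve` are illustrative assumptions, and both classes must be present in the data. Each threshold gives one (false positive rate, true positive rate) point; sweeping the threshold traces the classifier's ROC curve.

```python
from typing import List, Tuple

def roc_point(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Return (false positive rate, true positive rate) for one set of predictions."""
    tp = sum(p and a for p, a in zip(predicted, actual))        # hits
    fp = sum(p and not a for p, a in zip(predicted, actual))    # false alarms
    positives = sum(actual)
    negatives = len(actual) - positives
    return fp / negatives, tp / positives

def roc_curve(scores: List[float], actual: List[bool]) -> List[Tuple[float, float]]:
    """One ROC point per threshold; an infinite threshold calls nothing positive."""
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    return [roc_point([s >= t for s in scores], actual) for t in thresholds]

print(roc_curve([0.9, 0.8, 0.4, 0.3], [True, False, True, False]))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

Lowering the threshold can only move the point up and to the right: fewer positives are missed, at the cost of more false alarms.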
What is the MRROC?
MRROC stands for Maximum Realisable Receiver Operating Characteristics.
It is Martin Scott's name for the convex hull of the ROCs
of the classifiers being combined. Every point on the hull can be
realised, since choosing at random between two classifiers gives an ROC
point on the straight line joining them.
However, in principle it is possible to find other ways of combining
classifiers whose ROC lies outside the convex hull
(i.e. is better).
In toy problems, benchmarks and some applications, genetic programming can
automatically evolve combined classifiers whose ROC lies outside the convex hull.
See
also.
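A sketch of the convex hull calculation, assuming each classifier is already summarised by one (false positive rate, true positive rate) point. The trivial classifiers that always say negative, (0,0), and always say positive, (1,1), are added, and the upper hull is found with Andrew's monotone chain algorithm (our choice; any convex hull method would do). The hypothetical helper `mix` shows why hull points are realisable: using classifier a with probability p and b otherwise gives, in expectation, the ROC point p*a + (1-p)*b.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)

def cross(o: Point, a: Point, b: Point) -> float:
    """Positive if o -> a -> b turns counter-clockwise, negative if clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def mrroc(points: List[Point]) -> List[Point]:
    """Upper convex hull of the ROC points plus the trivial classifiers."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Point] = []
    for p in pts:
        # Drop the last hull point while it lies on or below the line hull[-2] -> p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

def mix(a: Point, b: Point, p: float) -> Point:
    """Expected ROC point when classifier a is used with probability p, else b."""
    return (p * a[0] + (1 - p) * b[0], p * a[1] + (1 - p) * b[1])

print(mrroc([(0.2, 0.6), (0.5, 0.5)]))  # [(0.0, 0.0), (0.2, 0.6), (1.0, 1.0)]
```

Here the classifier at (0.5, 0.5) lies under the hull: mixing the (0.2, 0.6) classifier with the always-positive one beats it at every false positive rate.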
What is Genetic Programming?
GP summary.
What is P450?
P450 is important in drug discovery because its many forms, found in
many parts of the body, break down drugs.
That is, P450 reduces the drugs' effectiveness.
Hence it would be helpful to be able to predict in advance whether a
potential drug will interact with P450.
Model of rat P450 2B1, showing mode of membrane attachment (source article)
P450 sales
article from Biolabs.
What is Fingerprinting?
This Daylight document discusses chemical analysis, including
fingerprinting, extensively.
Briefly, a chemical fingerprint is a way of condensing (compressing)
a compound or molecule's chemical structure into a bit string.
Only the two-dimensional linkage between atoms and their types need
be known (i.e. not the full three-dimensional structure).
Although a fingerprint need not be unique, the chance of two
large chemicals having the same fingerprint is very remote.
Thus computer database searches and comparisons can be done rapidly
using the fingerprint.
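A toy hashed path fingerprint, loosely in the spirit of the path-based fingerprints the Daylight document describes: enumerate short linear paths through the molecule's atom graph, hash each path to a bit position and set that bit. Everything here is an assumption for illustration: the molecule is given as a list of element symbols plus bond pairs, bond orders are ignored, and the 1024-bit size, 4-bond path limit and CRC-32 hash are arbitrary choices. It shows why only the 2D linkage is needed, and why different molecules can occasionally collide on the same bits.

```python
import zlib
from typing import Dict, Iterator, List, Tuple

def paths_from(start: int, adjacency: Dict[int, List[int]], max_bonds: int) -> Iterator[List[int]]:
    """Yield every simple path of atom indices from `start` with at most max_bonds bonds."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        yield path
        if len(path) - 1 < max_bonds:
            for nxt in adjacency[path[-1]]:
                if nxt not in path:          # never revisit an atom
                    stack.append(path + [nxt])

def fingerprint(atoms: List[str], bonds: List[Tuple[int, int]],
                n_bits: int = 1024, max_bonds: int = 4) -> int:
    """Return the fingerprint as a Python int used as a bit string."""
    adjacency: Dict[int, List[int]] = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    bits = 0
    for start in range(len(atoms)):
        for path in paths_from(start, adjacency, max_bonds):
            labels = [atoms[i] for i in path]
            # A path and its reverse are the same substructure: use one canonical form.
            key = "-".join(min(labels, labels[::-1]))
            bits |= 1 << (zlib.crc32(key.encode()) % n_bits)
    return bits

ethanol = fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])   # paths C, O, C-C, C-O, C-C-O
methanol = fingerprint(["C", "O"], [(0, 1)])               # paths C, O, C-O
print(methanol & ethanol == methanol)  # True: every methanol path also occurs in ethanol
```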
Ward's linkage
Daylight
What is Tanimoto Distance?
Briefly, the Tanimoto distance is used to measure how similar two chemicals are.
Approximately, it does this by counting the chemical
substructures or chemical groups they have in common.
The nice thing is that it works on the chemical fingerprints
of the two chemicals, and so is fast.
It is given by the number of groups that occur in both, divided by
the sum of this number, the number that occur only in the first and
the number that occur only in the second.
(The number that occur in neither is ignored.)
Strictly, this ratio is a similarity, equal to 1 for identical
fingerprints; the corresponding distance is usually taken as one minus it.
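The ratio above can be computed directly on fingerprints stored as integers, which is why it is fast: a few bitwise operations and bit counts. This is a sketch; the function names are illustrative, and returning 1.0 for two empty fingerprints is our own assumption (conventions differ).

```python
def popcount(x: int) -> int:
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")

def tanimoto_similarity(fp_a: int, fp_b: int) -> float:
    both = popcount(fp_a & fp_b)        # groups present in both
    only_a = popcount(fp_a & ~fp_b)     # present only in the first
    only_b = popcount(fp_b & ~fp_a)     # present only in the second
    total = both + only_a + only_b      # bits set in neither are ignored
    # Assumption: two empty fingerprints count as identical.
    return both / total if total else 1.0

def tanimoto_distance(fp_a: int, fp_b: int) -> float:
    return 1.0 - tanimoto_similarity(fp_a, fp_b)

print(tanimoto_similarity(0b1101, 0b1001))  # 2 shared bits out of 3 set overall: 2/3
```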
What is Hamming Distance?
Hamming distance is the number of bits that differ.
E.g. the Hamming distance between 13 and 9 is 1,
because 13 = 1101 in binary, while 9 = 1001.
Note the Hamming distance between A and B is the same as that between
B and A.
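For bit strings held as integers, the differing bits are exactly the 1 bits of their exclusive-or, so the Hamming distance is a single XOR followed by a bit count. A minimal sketch (the name `hamming` is illustrative):

```python
def hamming(a: int, b: int) -> int:
    """Count the bit positions where a and b differ (a ^ b has a 1 at each)."""
    return bin(a ^ b).count("1")

print(hamming(13, 9))  # 1   (1101 ^ 1001 = 0100)
print(hamming(9, 13))  # 1   (symmetric, since a ^ b == b ^ a)
```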
What is the SMILES Representation?
SMILES is a convention for describing chemical compounds as a single
line of text. (SMILES is an acronym for
Simplified Molecular Input Line Entry System.)
See for much more information.
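A few standard SMILES strings: ethanol is `CCO`, acetic acid `CC(=O)O`, benzene `c1ccccc1` (lower case marks aromatic atoms, the digits close the ring) and aspirin `CC(=O)Oc1ccccc1C(=O)O`. The sketch below counts the atoms written explicitly in such a string. It is a hypothetical helper, not a SMILES parser: it recognises only bracket atoms such as `[Na+]` and the "organic subset" atoms, skips bonds, branches, ring-closure digits and the `.` separator, and ignores the implicit hydrogens.

```python
import re

# Bracket atoms first, then two-letter Cl/Br before single letters, so "Cl" is not read as C.
ATOM = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFI]|[bcnops]")

def count_atoms(smiles: str) -> int:
    """Count explicitly written atoms in a SMILES string (implicit hydrogens excluded)."""
    return len(ATOM.findall(smiles))

print(count_atoms("CC(=O)Oc1ccccc1C(=O)O"))  # 13 heavy atoms in aspirin, C9H8O4
```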
What is Clementine?
Clementine is a data mining tool.
Click here for a partial screenshot
of it training 60 neural networks.
What is size fair crossover?
Click here.
Possibly Useful Additional References
Lewis, D. F. V. (2000),
"On the Recognition of Mammalian Microsomal Cytochrome P450 Substrates and Their Characteristics,"
Biochemical Pharmacology, 60, 293-306.
Ward, J. H. (1963),
"Hierarchical Grouping to Optimize an Objective Function,"
Journal of the American Statistical Association, 58, 236-244.
W.B.Langdon
30 August 2001
(last update 12 Jan 2003)