This is part of my work on Intelligent Data Analysis and Fusion Techniques in Pharmaceuticals, Bioprocessing and Process Control, carried out under the Rocket Faraday INTErSECT partnership project between UCL, GSK, Unilever, SPSS, NPL and Sira. (Follow-up work at GSK.)
We use genetic programming (GP) to combine classifiers which have already been trained to some level of performance on molecule binding data mining problems. The classifiers can be of any type, or a mixture of types, and they can even be trained on different data.
Initially GP starts with a population of random non-linear combinations of the supplied classifiers (possibly also using the raw data they were trained on). Over generations, by continually selecting the better combinations from the population and creating new combinations from them, better classifiers are evolved.
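As an illustration of what a "random non-linear combination" might look like, here is a minimal Python sketch that grows random expression trees over the outputs of already-trained classifiers. The function set (`add`, `sub`, `mul`, `max`, `min`), the 0.3 leaf probability and the names `random_tree` and `evaluate` are assumptions made for this example; the text above does not say which primitives or tree-building method were actually used.

```python
import operator
import random

# Hypothetical function set; the primitives used in the actual work are not stated.
FUNCTIONS = [operator.add, operator.sub, operator.mul, max, min]


def random_tree(n_inputs, depth, rng):
    """Grow a random expression tree over classifier outputs 0..n_inputs-1."""
    # Stop at the depth limit, or early with an (assumed) probability of 0.3.
    if depth == 0 or rng.random() < 0.3:
        return ("input", rng.randrange(n_inputs))
    func = rng.choice(FUNCTIONS)
    left = random_tree(n_inputs, depth - 1, rng)
    right = random_tree(n_inputs, depth - 1, rng)
    return (func, left, right)


def evaluate(tree, outputs):
    """Apply the tree to one example's vector of classifier outputs."""
    if tree[0] == "input":
        return outputs[tree[1]]
    func, left, right = tree
    return func(evaluate(left, outputs), evaluate(right, outputs))


# Initial population: ten random combinations of three classifiers.
rng = random.Random(0)
population = [random_tree(n_inputs=3, depth=3, rng=rng) for _ in range(10)]
```

Each tree maps the vector of classifier outputs for one molecule to a single score. Selection and the creation of new trees (for example by crossover and mutation) would then operate on `population`; they are omitted here.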
The Receiver Operating Characteristic (ROC) curve of a classifier shows its performance as a trade-off between selectivity and sensitivity. Better classifiers have a larger area under their ROC curve. The ideal classifier has an area of one.
The fitness of each non-linear combination of classifiers is the area under its ROC curve. This approach has been demonstrated by evolving improved data fusion classifiers for 1) contrived problems, 2) artificial problems and 3) several machine learning benchmarks. It has been tested in blind trials on QSAR drug activity datasets provided by GSK.
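To make the fitness measure concrete, the following Python sketch scores a few hand-written combinations of two classifiers by the area under their ROC curve. It is only a sketch under stated assumptions: the area is computed with the standard pairwise (Mann-Whitney) rank statistic, which need not be how the area was computed in this work, and the data, the candidate combinations and the names `roc_auc` and `fitness` are made up for illustration.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) statistic."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def fitness(combine, classifier_outputs, labels):
    """Fitness of a combination = ROC area of the scores it produces."""
    scores = [combine(row) for row in classifier_outputs]
    return roc_auc(scores, labels)


# Toy, made-up data: one row per molecule, one column per classifier.
classifier_outputs = [(0.9, 0.2), (0.4, 0.8), (0.6, 0.1), (0.3, 0.3)]
labels = [1, 1, 0, 0]  # 1 = active, 0 = inactive

# Hypothetical candidate combinations standing in for evolved trees.
candidates = {
    "first only": lambda row: row[0],
    "second only": lambda row: row[1],
    "max of both": lambda row: max(row),
}
results = {name: fitness(f, classifier_outputs, labels)
           for name, f in candidates.items()}
best = max(results, key=results.get)
```

On this toy data each classifier alone misranks one active/inactive pair, while taking the maximum of the two ranks every active above every inactive, so it would be preferred by selection.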
W.B.Langdon 9 October 2001