Abstract

We have developed and tested a genetic algorithm (GA) for pattern recognition that identifies molecular descriptors which optimize the separation of the activity classes of olfactory stimulants in a plot of the two or three largest principal components of the data. Because principal components maximize variance, the bulk of the information encoded by these descriptors is about differences between olfactory classes in the dataset. In addition, as it trains, the GA focuses on those classes and/or samples that are difficult to classify, using a form of boosting to modify the fitness landscape. Boosting reduces the problem of convergence to a local optimum, because the fitness function of the GA changes as the population evolves toward a solution. Over time, compounds that are consistently classified correctly are weighted less heavily in the analysis than compounds that are difficult to classify. The pattern recognition GA learns its optimal parameters in a manner similar to a neural network. The algorithm integrates aspects of both strong and weak learning to yield a "smart" one-pass procedure for feature selection and classification.
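
The procedure described above can be illustrated with a minimal sketch. The abstract does not give the fitness function or the boosting rule, so everything below is an assumption made for illustration: class separation in the principal component plot is scored by nearest-neighbor agreement, the weight update is a simple multiplicative rule, and all function names and parameter values (`pc_scores`, `sample_hits`, `boost_weights`, `run_ga`, `k`, `rate`, the mutation rate) are hypothetical.

```python
import numpy as np

def pc_scores(X, n_components=2):
    """Project mean-centered data onto its largest principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, len(s))
    return U[:, :k] * s[:k]

def sample_hits(X, y, mask, k=3):
    """Per sample: fraction of its k nearest neighbors in PC space with the same class.
    Assumed stand-in for the paper's class-separation score."""
    scores = pc_scores(X[:, mask])
    d = np.linalg.norm(scores[:, None, :] - scores[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)           # a sample is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]
    return (y[nn] == y[:, None]).mean(axis=1)

def fitness(X, y, mask, weights, k=3):
    """Weighted separation score of one descriptor subset (boolean mask)."""
    if not mask.any():
        return 0.0
    return float(np.dot(weights, sample_hits(X, y, mask, k)))

def boost_weights(weights, hits, rate=0.5):
    """Assumed update rule: raise the weight of poorly classified samples, renormalize."""
    new = weights * (1.0 + rate * (1.0 - hits))
    return new / new.sum()

def run_ga(X, y, pop_size=20, generations=15, seed=0):
    """Evolve descriptor subsets while the sample weights (and so the fitness) change."""
    rng = np.random.default_rng(seed)
    n, p = X.shape                         # requires p >= 2 for one-point crossover
    pop = rng.random((pop_size, p)) < 0.5  # each row is a boolean descriptor mask
    weights = np.full(n, 1.0 / n)
    for _ in range(generations):
        fit = np.array([fitness(X, y, m, weights) for m in pop])
        order = np.argsort(fit)[::-1]
        best = pop[order[0]].copy()
        parents = pop[order[: pop_size // 2]]
        children = []
        while len(children) < pop_size - 1:
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, p)       # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            children.append(child ^ (rng.random(p) < 0.05))  # bit-flip mutation
        pop = np.vstack([best] + children)  # keep the current best (elitism)
        if best.any():
            # Boosting step: hard samples gain weight, so the fitness landscape shifts.
            weights = boost_weights(weights, sample_hits(X, y, best))
    fit = np.array([fitness(X, y, m, weights) for m in pop])
    return pop[int(np.argmax(fit))], weights
```

Called as `mask, weights = run_ga(X, y)` with a samples-by-descriptors array `X` and integer class labels `y`, the sketch returns a boolean mask of selected descriptors together with the final sample weights. Samples that remain hard to classify end up with the larger weights, which mirrors the reweighting the abstract describes.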
