Abstract

A genetic algorithm (GA) for pattern recognition analysis of multivariate chemical data is described. The GA selects features that optimize the separation of the classes in a plot of the two largest principal components (PCs) of the data. Because the largest PCs capture the bulk of the variance in the data, and the GA rewards feature sets whose PC plots separate the classes, the features it chooses convey information primarily about differences between the classes in a data set. Hence, the principal component analysis (PCA) routine embedded in the GA's fitness function acts as an information filter: it restricts the search to feature sets whose PC plots show clustering by class, which significantly reduces the size of the search space. In addition, as it trains, the algorithm focuses on classes and samples that are difficult to classify by boosting their class and sample weights. Samples that are consistently classified correctly carry less weight in the analysis than samples that are difficult to classify. Over time, the algorithm learns its optimal parameters in a manner similar to a neural network. The pattern recognition GA integrates aspects of artificial intelligence and evolutionary computation to yield a ‘smart’ one-pass procedure for feature selection and classification.
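
The fitness evaluation and weight boosting described above can be illustrated with a small sketch. This is a hypothetical, dependency-free Python example, not the authors' implementation: the abstract does not say how class separation in the PC plot is scored or how the weights are boosted, so the K-nearest-neighbor hit rate, the multiplicative boosting rule, the truncation-selection GA, and every name used (`pc_scores`, `fitness`, `boost`, `run_ga`, `demo_data`) are assumptions. Data are mean-centered but not autoscaled, and the two largest PCs are found by power iteration with deflation.

```python
import math
import random
# Hypothetical sketch: PCA-filtered GA fitness with boosted class/sample weights.
# The K-NN scoring and boosting rules are assumptions, not the paper's method.

def _center(rows):
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(r, means)] for r in rows]

def _cov(X):
    n, d = len(X), len(X[0])
    return [[sum(r[i] * r[j] for r in X) / (n - 1) for j in range(d)] for i in range(d)]

def _top_pcs(C, k=2, iters=200, seed=0):
    """Leading k eigenvectors of a symmetric PSD matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    C = [row[:] for row in C]
    pcs = []
    for _ in range(min(k, len(C))):
        v = [rng.gauss(0, 1) for _ in C]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(c * x for c, x in zip(row, v)) for row in C]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(vi * sum(c * x for c, x in zip(row, v)) for vi, row in zip(v, C))
        pcs.append(v)
        # Deflate so the next pass finds the next-largest component.
        C = [[c - lam * v[i] * v[j] for j, c in enumerate(row)] for i, row in enumerate(C)]
    return pcs

def pc_scores(X, features, k=2):
    """Project mean-centered data (selected columns only) onto its k largest PCs."""
    sub = _center([[row[j] for j in features] for row in X])
    pcs = _top_pcs(_cov(sub), k)
    return [[sum(a * b for a, b in zip(r, v)) for v in pcs] for r in sub]

def fitness(X, y, features, sample_w, class_w, K=3):
    """Weighted K-NN hit rate in the PC plot; also returns per-sample hit rates."""
    if not features:
        return 0.0, [0.0] * len(X)
    S = pc_scores(X, features)
    hits = []
    for i, p in enumerate(S):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(S) if j != i)[:K]
        hits.append(sum(y[j] == y[i] for _, j in nearest) / K)
    w = [sample_w[i] * class_w[y[i]] for i in range(len(X))]
    return sum(wi * h for wi, h in zip(w, hits)) / sum(w), hits

def boost(sample_w, class_w, y, hits, rate=1.0):
    """Raise the weights of poorly classified samples and classes, then renormalize."""
    sw = [w * (1 + rate * (1 - h)) for w, h in zip(sample_w, hits)]
    total = sum(sw)
    sw = [w * len(sw) / total for w in sw]
    cw = {}
    for c, w in class_w.items():
        idx = [i for i, yi in enumerate(y) if yi == c]
        cw[c] = w * (1 + rate * (1 - sum(hits[i] for i in idx) / len(idx)))
    total = sum(cw.values())
    return sw, {c: w * len(cw) / total for c, w in cw.items()}

def run_ga(X, y, pop_size=20, generations=10, p_mut=0.1, seed=1):
    """Tiny GA over binary feature masks: truncation selection, 1-point crossover."""
    rng = random.Random(seed)
    d = len(X[0])
    pop = [[rng.random() < 0.5 for _ in range(d)] for _ in range(pop_size)]
    sw, cw = [1.0] * len(X), {c: 1.0 for c in set(y)}
    for _ in range(generations):
        scored = []
        for mask in pop:
            f, h = fitness(X, y, [j for j, on in enumerate(mask) if on], sw, cw)
            scored.append((f, mask, h))
        scored.sort(key=lambda t: t[0], reverse=True)
        best_f, best_mask, best_hits = scored[0]
        sw, cw = boost(sw, cw, y, best_hits)  # focus the next generation on misses
        parents = [m for _, m, _ in scored[: pop_size // 2]]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, d)
            children.append([bit ^ (rng.random() < p_mut) for bit in a[:cut] + b[cut:]])
        pop = parents + children
    return [j for j, on in enumerate(best_mask) if on], best_f

def demo_data(n_per_class=20, seed=0):
    """Synthetic data: columns 0-1 carry class information, columns 2-5 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 0.5) for _ in range(2)] +
                     [rng.gauss(0.0, 3.0) for _ in range(4)])
            y.append(label)
    return X, y
```

With the synthetic data from `demo_data()`, scoring the two informative columns with `fitness(X, y, [0, 1], [1.0] * len(X), {0: 1.0, 1: 1.0})` should give a value close to 1, while the noise columns 2 to 5 should score near chance. Renormalizing inside `boost` keeps the mean weight at 1, so boosting shifts emphasis toward misclassified samples and classes without inflating the overall fitness scale.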

