Abstract

When constructing a predictive model, feature selection is the process of limiting the number of features in order to remove unnecessary and redundant data and improve learning accuracy. Traditional methods such as variance threshold, Pearson correlation, and F-score can be used to reduce the number of features in a dataset, but dropping features with these methods does not yield the best prediction score compared to genetic algorithms, which improve the prediction score by nearly 3–5%. Methods such as variance threshold, Pearson correlation, and F-score are formula-based, whereas the genetic algorithm is a randomized search algorithm that mimics biologically inspired natural selection processes (selection, cross-over, and mutation) to find high-quality solutions to optimization problems. This evolutionary algorithm is based on Charles Darwin's theory of natural selection, survival of the fittest: the fittest individuals are chosen to create offspring, and cross-over and mutation pass on the traits of the fittest parents to their offspring, ensuring a greater chance of survival. Here, the genetic algorithm is implemented for feature selection, and its prediction scores are compared to those of the traditional methods on the same datasets for classification problems.
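The selection, cross-over, and mutation loop described above can be sketched in a few lines. The sketch below is illustrative only and is not the paper's implementation: feature subsets are encoded as binary masks, and a toy fitness function (the set of "informative" feature indices and the penalty weight are hypothetical) stands in for the cross-validated prediction score a real pipeline would compute.

```python
import random

random.seed(0)

N_FEATURES = 20
INFORMATIVE = {1, 4, 7, 11, 15}  # hypothetical "useful" feature indices

def fitness(mask):
    # Stand-in for a cross-validated prediction score: reward selecting
    # informative features, lightly penalize redundant ones.
    chosen = {i for i, bit in enumerate(mask) if bit}
    return len(chosen & INFORMATIVE) - 0.2 * len(chosen - INFORMATIVE)

def tournament(pop, k=3):
    # Selection: the fittest of k random individuals becomes a parent.
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b):
    # Single-point cross-over: splice the two parent masks together.
    point = random.randrange(1, N_FEATURES)
    return a[:point] + b[point:]

def mutate(mask, rate=0.05):
    # Mutation: flip each bit (include/exclude a feature) with small probability.
    return [bit ^ (random.random() < rate) for bit in mask]

# Random initial population of candidate feature subsets.
pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(40)]

# Evolve: each generation is bred from tournament-selected parents.
for _ in range(60):
    pop = [mutate(crossover(tournament(pop), tournament(pop)))
           for _ in range(len(pop))]

best = max(pop, key=fitness)
print("selected features:", [i for i, bit in enumerate(best) if bit])
```

In a real experiment the fitness function would train a classifier on the masked feature set and return its validation score, which is where the reported 3–5% improvement over formula-based filters would be measured.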
