Abstract

Feature subset selection is an important preprocessing task for any real-life data mining or pattern recognition problem. Evolutionary computational (EC) algorithms are popular as search algorithms for feature subset selection. With classification accuracy as the fitness function, EC algorithms end up with feature subsets having considerably high recognition accuracy, but the number of residual features also remains quite high. For high-dimensional data, reducing the number of features is also very important to minimize the computational cost of the overall classification process. In this work, a wrapper fitness function is proposed that combines classification accuracy with a penalty term that penalizes large feature subsets. The proposed wrapper fitness function is used for feature subset evaluation and the subsequent selection of an optimal feature subset with several EC algorithms. Simulation experiments are conducted on several benchmark data sets having small to large numbers of features. The results show that the proposed wrapper fitness function is effective in reducing the number of features in the final selected subset without significant reduction of classification accuracy. The proposed fitness function is shown to perform well for high-dimensional data sets with dimension up to 10,000.
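The abstract describes a fitness function that trades classification accuracy against subset size. A minimal sketch of one common way to combine the two terms is shown below; the weighting scheme, the `alpha` parameter, and the exact form of the penalty are illustrative assumptions, not necessarily the formulation used in the paper.

```python
def wrapper_fitness(accuracy, n_selected, n_total, alpha=0.9):
    """Illustrative wrapper fitness for feature subset selection.

    Rewards classification accuracy and penalizes the fraction of
    features retained. `alpha` (an assumed parameter) weights accuracy
    against the feature-count penalty; the paper's actual penalty term
    may take a different form.
    """
    if n_total <= 0 or not 0 <= n_selected <= n_total:
        raise ValueError("invalid feature counts")
    penalty = n_selected / n_total  # fraction of features kept
    return alpha * accuracy - (1 - alpha) * penalty

# A subset with slightly lower accuracy but far fewer features
# can receive a higher fitness score:
full_set = wrapper_fitness(accuracy=0.95, n_selected=100, n_total=100)
small_set = wrapper_fitness(accuracy=0.94, n_selected=10, n_total=100)
```

Under this weighting, `small_set` scores higher than `full_set`, which is the behaviour the abstract attributes to the penalty term: an EC search guided by such a fitness function is pushed toward smaller subsets as long as accuracy does not drop much.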

