Abstract

The presence of unimportant and superfluous features in datasets motivates researchers to devise novel feature selection strategies. Feature selection is inherently multi-objective, so optimizing feature subsets with respect to any single evaluation criterion is not sufficient [1]. Moreover, discovering a single best subset of features is of limited interest; finding several feature subsets that reflect a trade-off among multiple objective criteria is more beneficial, as it gives users a broad choice of subsets. To combine several feature selection criteria, we therefore propose multi-objective optimization of feature subsets using a Multi-Objective Genetic Algorithm (MOGA). This work is an attempt to discover non-dominated feature subsets of small cardinality with high predictive power and minimal redundancy. For this purpose we use NSGA-II, a well-known MOGA, to discover non-dominated feature subsets for the task of classification. The main contribution of this paper is the design of a novel multi-objective fitness function whose optimization criteria are information gain, mutual correlation, and the size of the feature subset. The suggested approach is validated on seven datasets from the UCI machine learning repository. A Support Vector Machine, a well-tested classification algorithm, is used to measure classification accuracy. The results confirm that the proposed system discovers diverse optimal feature subsets that are well spread across the overall feature space, and that the classification accuracy of the resulting subsets is reasonably high.
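As a rough sketch of how such a three-objective fitness might be evaluated for one candidate subset, the snippet below scores a boolean feature mask on relevance (information gain, approximated here by mutual information with the class), redundancy (mean absolute pairwise correlation among selected features), and subset size. The function name, the penalty values for the empty subset, and the use of scikit-learn's mutual_info_classif are our own assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def evaluate_subset(X, y, mask):
    """Score one candidate feature subset (boolean mask over columns of X).

    Returns three objectives for NSGA-II, all to be minimized:
      1. negative mean information gain of the selected features w.r.t. y
         (i.e., maximize predictive relevance),
      2. mean absolute pairwise correlation among the selected features
         (minimize redundancy),
      3. number of selected features (minimize subset cardinality).
    """
    idx = np.flatnonzero(mask)
    if idx.size == 0:                       # penalize the empty subset (assumed penalty)
        return 0.0, 1.0, float(X.shape[1])

    X_sub = X[:, idx]

    # Relevance: mutual information between each selected feature and the class label.
    relevance = mutual_info_classif(X_sub, y, random_state=0).mean()

    # Redundancy: mean absolute off-diagonal correlation among selected features.
    if idx.size > 1:
        corr = np.abs(np.corrcoef(X_sub, rowvar=False))
        redundancy = (corr.sum() - idx.size) / (idx.size * (idx.size - 1))
    else:
        redundancy = 0.0

    return -relevance, redundancy, float(idx.size)
```

In an NSGA-II loop, each individual would encode a feature mask, and this function would supply the objective vector used for non-dominated sorting and crowding-distance selection.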
