Abstract

Feature selection aims to reduce both the dimensionality of the data and the classification error rate (i.e., to increase the classification accuracy) of a learning algorithm. These two objectives often conflict, so a multiobjective feature selection method can obtain a set of nondominated feature subsets, where each solution in the set has a different size and a corresponding classification error rate. However, most existing feature selection algorithms ignore the fact that, for a given subset size, there can be different feature subsets with very similar or even identical accuracy. This article introduces a niching-based multiobjective feature selection method that simultaneously minimizes the number of selected features and the classification error rate. The proposed method is designed to identify: 1) a set of feature subsets with good convergence and distribution and 2) multiple feature subsets that select the same number of features and achieve almost the same lowest classification error rate. The contributions of this article are threefold. First, a niching and global interaction mutation operator is proposed that can produce promising feature subsets. Second, a newly developed environmental selection mechanism allows equally informative feature subsets to be stored by relaxing the Pareto-dominance relationship. Finally, the proposed subset-repairing mechanism can generate better feature subsets and further remove redundant features. The proposed method is compared against seven multiobjective feature selection algorithms on 19 datasets covering both binary and multiclass classification tasks. The results show that the proposed method can evolve a rich and diverse set of nondominated solutions for different feature selection tasks, and their availability helps in understanding the relationships between features.
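To make the relaxed-dominance idea concrete, the following is a minimal Python sketch of how an environmental selection step might retain equally informative feature subsets in a bi-objective setting (subset size and error rate). All names here (Solution, EPSILON, dominates, is_equivalent, select) and the tolerance-based equivalence test are illustrative assumptions, not the paper's actual mechanism.

```python
# Hypothetical sketch: bi-objective dominance for feature selection,
# relaxed so that equally informative subsets are both kept.
from dataclasses import dataclass

EPSILON = 1e-6  # assumed tolerance for "almost the same" error rate


@dataclass
class Solution:
    n_features: int    # objective 1: number of selected features
    error_rate: float  # objective 2: classification error rate


def dominates(a: Solution, b: Solution) -> bool:
    """Standard Pareto dominance: a is no worse in both objectives
    and strictly better in at least one."""
    no_worse = (a.n_features <= b.n_features
                and a.error_rate <= b.error_rate)
    strictly_better = (a.n_features < b.n_features
                       or a.error_rate < b.error_rate)
    return no_worse and strictly_better


def is_equivalent(a: Solution, b: Solution) -> bool:
    """Relaxed relation: same subset size and near-identical error,
    so both subsets are treated as equally informative."""
    return (a.n_features == b.n_features
            and abs(a.error_rate - b.error_rate) < EPSILON)


def select(archive: list[Solution], candidate: Solution) -> list[Solution]:
    """Keep the candidate unless a non-equivalent archive member
    dominates it; drop members the candidate strictly dominates."""
    if any(dominates(s, candidate) and not is_equivalent(s, candidate)
           for s in archive):
        return archive
    kept = [s for s in archive
            if not dominates(candidate, s) or is_equivalent(candidate, s)]
    return kept + [candidate]
```

Under plain Pareto dominance, the second of two same-size subsets with near-identical error would be discarded; the relaxed check above keeps both, which is what lets the method report multiple equally informative subsets per size.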
