Abstract

Modern data collection technologies can produce thousands of features, or even more, in a single dataset. This high dimensionality makes it difficult to identify discriminative features because of the curse of dimensionality. Owing to their global search ability, many population-based feature selection approaches have been proposed. However, very few studies consider that a feature selection task can have multiple optimal feature subsets. To search for multiple optimal feature subsets, we propose a feature clustering-assisted feature selection method. The proposed method uses correlation measures to group features, and this correlation knowledge is embedded into both the encoding method and the search process. A niching-based mutation operator is also used to explore the vicinity of a target individual, with the aim of finding different feature subsets that have very similar or identical classification performance. In addition, a modification operator is proposed to increase population diversity and thereby improve feature selection performance. Experiments on 16 datasets show that the proposed algorithm outperforms other popular feature selection methods in terms of classification accuracy and feature subset size.
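The idea that correlated features can be grouped, and that swapping one group member for another may yield a different subset with similar classification performance, can be illustrated with a small sketch. The abstract does not state which correlation measure or clustering rule the authors use. This sketch therefore assumes absolute Pearson correlation with a greedy threshold rule. The names `pearson`, `cluster_features`, `equivalent_subsets` and the `threshold` parameter are hypothetical, and this is not the paper's method.

```python
import math
from typing import Dict, List

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def cluster_features(columns: List[List[float]], threshold: float = 0.9) -> List[List[int]]:
    """Assumed greedy grouping: a feature joins the first cluster whose seed
    (first member) it correlates with at |r| >= threshold, else starts a new cluster."""
    clusters: List[List[int]] = []
    for j, col in enumerate(columns):
        for cluster in clusters:
            if abs(pearson(columns[cluster[0]], col)) >= threshold:
                cluster.append(j)
                break
        else:
            clusters.append([j])
    return clusters

def equivalent_subsets(subset: List[int], clusters: List[List[int]]) -> List[List[int]]:
    """Hypothetical helper: list subsets obtained by replacing one selected
    feature with another (unselected) member of the same cluster."""
    owner: Dict[int, List[int]] = {f: c for c in clusters for f in c}
    results: List[List[int]] = []
    for i, f in enumerate(subset):
        for g in owner[f]:
            if g != f and g not in subset:
                alt = list(subset)
                alt[i] = g
                results.append(sorted(alt))
    return results

# Toy data: f1 and f3 are perfectly (anti-)correlated with f0; f2 has |r| = 0.8 with f0.
columns = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [5, 3, 4, 1, 2],
    [10, 8, 6, 4, 2],
]
clusters = cluster_features(columns)
print(clusters)                               # [[0, 1, 3], [2]]
print(equivalent_subsets([0, 2], clusters))   # [[1, 2], [2, 3]]
```

Comparing each feature only against a cluster's seed keeps the grouping linear in the number of clusters per feature, at the cost of order dependence. A real method would likely use a measure suited to the classification target (for example, an information-theoretic one) and evaluate the candidate subsets with a classifier.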
