Abstract

Ensemble feature selection has attracted considerable attention from researchers because of its robustness. However, existing methods still have shortcomings when dealing with high-dimensional or large-scale data, such as high computational cost or high feature redundancy. To address this, this paper proposes a new ensemble feature selection algorithm for large-scale data, called Multi-surrogate-assisted Dual-layer Ensemble Feature Selection (MDEFS). In the first layer of MDEFS, a filter-based ensemble feature selection method with fast search ability is developed to remove irrelevant or weakly relevant features. In the second layer, a particle swarm-based ensemble method with global search ability is proposed to select the optimal feature subset from the remaining features. Furthermore, a multi-surrogate-assisted swarm search mechanism is developed to reduce the cost of MDEFS on large-scale data, in which the whole original dataset is replaced by multiple types of representative samples. Finally, the proposed algorithm is applied to 13 datasets and compared with 7 feature selection algorithms. Experimental results show that the multi-surrogate-assisted search mechanism markedly reduces the running time of MDEFS, and that the proposed ensemble approach enables MDEFS to obtain better feature subsets, with an average accuracy 0.72% higher than that of the best competing algorithm on each dataset. All results indicate that MDEFS is a robust and competitive feature selection algorithm.
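The abstract describes MDEFS only in outline. As an illustration of the general dual-layer pattern (a cheap filter ensemble that prunes features, then a swarm search scored on a reduced surrogate of the data), here is a minimal Python/NumPy sketch. It is not the authors' implementation, and every concrete choice is an assumption: the two filters (absolute Pearson correlation and a Fisher score, assuming binary labels), rank averaging as the ensemble rule, a single stratified random subsample standing in for the paper's "multiple types of representative samples", a nearest-centroid fitness with a small size penalty, a binary PSO with a sigmoid transfer function, and all function names and parameter values are hypothetical.

```python
import numpy as np


def layer1_filter_ensemble(X, y, keep):
    """Hypothetical layer 1: rank-average two cheap filters, keep the top `keep` features."""
    # Filter A (assumed): |Pearson correlation| between each feature and a binary label.
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12)
    # Filter B (assumed): Fisher score = between-class spread / within-class variance.
    classes, mu = np.unique(y), X.mean(axis=0)
    between = sum((y == c).sum() * (X[y == c].mean(axis=0) - mu) ** 2 for c in classes)
    within = sum((y == c).sum() * X[y == c].var(axis=0) for c in classes) + 1e-12
    fisher = between / within
    # Ensemble rule (assumed): average the rank positions (0 = best) from each filter.
    ranks = [np.argsort(np.argsort(-s)) for s in (corr, fisher)]
    return np.argsort(np.mean(ranks, axis=0))[:keep]


def surrogate_sample(X, y, per_class, rng):
    """Hypothetical surrogate: one stratified random subsample replaces the full data."""
    idx = np.concatenate([rng.choice(np.flatnonzero(y == c), per_class, replace=False)
                          for c in np.unique(y)])
    return X[idx], y[idx]


def fitness(mask, X, y, alpha=0.01):
    """Nearest-centroid accuracy on the surrogate minus a penalty on subset size."""
    if not mask.any():
        return 0.0
    Xs, classes = X[:, mask], np.unique(y)
    centroids = np.array([Xs[y == c].mean(axis=0) for c in classes])
    dists = ((Xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return (classes[dists.argmin(axis=1)] == y).mean() - alpha * mask.mean()


def binary_pso(X, y, n_particles=20, iters=30, seed=0, w=0.7, c1=1.5, c2=1.5):
    """Hypothetical layer 2: binary PSO over feature masks (sigmoid transfer)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pos = rng.random((n_particles, d)) < 0.5            # boolean feature masks
    vel = rng.uniform(-1.0, 1.0, (n_particles, d))
    pbest, pbest_fit = pos.copy(), np.array([fitness(p, X, y) for p in pos])
    g, g_fit = pbest[pbest_fit.argmax()].copy(), pbest_fit.max()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, d))
        p = pos.astype(float)
        vel = w * vel + c1 * r1 * (pbest.astype(float) - p) + c2 * r2 * (g.astype(float) - p)
        vel = np.clip(vel, -4.0, 4.0)
        # Each bit is set with probability sigmoid(velocity).
        pos = rng.random((n_particles, d)) < 1.0 / (1.0 + np.exp(-vel))
        fit = np.array([fitness(m, X, y) for m in pos])
        better = fit > pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        if fit.max() > g_fit:
            g, g_fit = pos[fit.argmax()].copy(), fit.max()
    return g, g_fit


def dual_layer_sketch(X, y, keep=10, per_class=30, seed=0):
    """Filter ensemble prunes features; PSO then searches subsets on the surrogate."""
    rng = np.random.default_rng(seed)
    kept = layer1_filter_ensemble(X, y, keep)
    Xs, ys = surrogate_sample(X[:, kept], y, per_class, rng)
    mask, score = binary_pso(Xs, ys, seed=seed)
    return kept[mask], score


# Usage on synthetic data where only features 0 and 1 carry class information.
rng = np.random.default_rng(1)
n, d = 400, 50
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, d))
X[:, 0] += 3 * y
X[:, 1] -= 3 * y
selected, score = dual_layer_sketch(X, y)
print(selected, round(float(score), 3))
```

The point of the design is cost: the filter layer is linear in the number of features and shrinks the search space before the swarm runs, and each PSO fitness call touches only the surrogate's `2 * per_class` rows instead of all `n`. A faithful reproduction would need the paper's actual filters, ensemble rule, surrogate types, and classifier.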
