Abstract

Unsupervised feature selection is an important technique in high-dimensional data analysis. Despite significant progress, most existing unsupervised feature selection methods estimate the underlying structure of the data in the original feature space and lack the ability to explore the diverse subspaces of a high-dimensional space. In this paper, we argue that the use of a large number of random subspaces can significantly improve unsupervised feature selection accuracy. In particular, we propose a new unsupervised feature selection approach based on multi-subspace randomization and collaboration. A balanced subspace randomization scheme is first presented to produce multiple basic feature partitions, each consisting of similar-sized random subspaces. Then, multiple K-nearest-neighbor graphs are constructed in these subspaces, from which the Laplacian scores of the features in each subspace are obtained w.r.t. their locality-preserving power. Thereafter, the feature score vectors of the different subspaces across the basic feature partitions are integrated into a full score vector over all features, which takes into account the structure information of the various subspaces and can significantly enhance the performance of unsupervised feature selection. Experiments on twenty high-dimensional datasets demonstrate the efficiency and robustness of our approach. The MATLAB source code is available at https://www.researchgate.net/publication/334520672.
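The pipeline described above — balanced random feature partitions, per-subspace Laplacian scoring on K-nearest-neighbor graphs, and integration into a full score vector — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 0/1 graph weights, the use of simple averaging as the integration step, and all function names and parameter defaults here are assumptions; the paper's actual scheme may differ.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph


def laplacian_scores(X, k=5):
    """Laplacian score of each column of X (lower = better locality preservation).

    Uses a symmetrized KNN graph with 0/1 weights (a simplifying assumption;
    heat-kernel weights are also common).
    """
    W = kneighbors_graph(X, n_neighbors=k, mode="connectivity", include_self=False)
    W = 0.5 * (W + W.T).toarray()          # symmetrize the KNN graph
    d = W.sum(axis=1)                      # degree vector
    L = np.diag(d) - W                     # graph Laplacian
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        f = X[:, j]
        f_t = f - (f @ d) / d.sum()        # remove the D-weighted mean
        denom = f_t @ (d * f_t)
        scores[j] = (f_t @ L @ f_t) / denom if denom > 1e-12 else np.inf
    return scores


def random_balanced_partition(n_features, n_subspaces, rng):
    """Split a shuffled feature index set into similar-sized random subspaces."""
    return np.array_split(rng.permutation(n_features), n_subspaces)


def ensemble_feature_scores(X, n_partitions=10, n_subspaces=4, k=5, seed=0):
    """Integrate per-subspace Laplacian scores over multiple random partitions.

    Averaging each feature's scores across the subspaces it falls into is an
    assumed integration rule, used here only to show the overall structure.
    """
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    total = np.zeros(n_feat)
    counts = np.zeros(n_feat)
    for _ in range(n_partitions):
        for sub in random_balanced_partition(n_feat, n_subspaces, rng):
            total[sub] += laplacian_scores(X[:, sub], k=k)
            counts[sub] += 1
    return total / counts                  # full score vector over all features
```

Features would then be ranked by ascending integrated score, with the lowest-scoring features selected.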
