Abstract
Owing to the scarcity of labeled data, unsupervised feature selection has attracted considerable attention in recent years. Although many unsupervised feature selection methods can select relevant features, they often fail to jointly account for the local and global structure of the data, and they cannot effectively handle the complex nonlinear relationships common in real-world data; as a result, they frequently select suboptimal feature subsets. In this paper, inspired by the Uniform Manifold Approximation and Projection (UMAP) manifold learning technique and the nonlinear sparse learning method Feature-Wise Kernelized Lasso, we propose a novel unsupervised feature selection method called Multi-Cluster Unsupervised Nonlinear Feature Selection based on UMAP and block HSIC Lasso (MUNFS). MUNFS substantially improves the representation of high-dimensional data during dimensionality reduction and effectively handles complex nonlinear relationships in such data. Specifically, by capturing the intrinsic topology of the data, MUNFS accurately preserves the local structure while retaining as much of the global structure as possible. Furthermore, the kernel-based Hilbert–Schmidt Independence Criterion (HSIC) measures the nonlinear dependency between features and target variables, and an ℓ1 regularization term enforces sparsity in the selection, allowing a more precise assessment of each feature's significance. Extensive experiments on five benchmark datasets and eight hyperspectral datasets demonstrate that MUNFS substantially outperforms several competing feature selection methods.
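As background for the abstract's HSIC-based scoring, the sketch below implements the standard empirical HSIC estimator, trace(KHLH)/(n-1)^2 with centering matrix H = I - (1/n)11^T, and uses it to rank a nonlinearly dependent feature above a noise feature. This is a minimal generic illustration, not the paper's block HSIC Lasso (which additionally partitions samples into blocks and solves an ℓ1-regularized problem over per-feature HSIC terms); all variable names and the toy data are assumptions for demonstration.

```python
import numpy as np

def rbf_kernel(x, sigma=1.0):
    """Gaussian (RBF) kernel matrix for a 1-D sample vector x of shape (n,)."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(K, L):
    """Empirical HSIC: trace(K H L H) / (n-1)^2 with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Toy demo (hypothetical data): one feature depends nonlinearly on the
# target, the other is independent noise.
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, 200)                   # target variable
f_signal = t ** 2 + 0.05 * rng.normal(size=200)   # quadratic dependence, near-zero linear correlation
f_noise = rng.normal(size=200)                    # independent of t

L_t = rbf_kernel(t)
s_signal = hsic(rbf_kernel(f_signal), L_t)
s_noise = hsic(rbf_kernel(f_noise), L_t)
# The kernelized score ranks the nonlinearly dependent feature higher,
# which a linear dependence measure would miss here.
```

In a full HSIC-Lasso-style method, such per-feature scores enter a Lasso objective so that the ℓ1 penalty drives most feature weights to zero, yielding a sparse selection.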