Unsupervised feature selection, an important dimensionality-reduction technique for high-dimensional data, has attracted growing attention from researchers because it can process unlabeled data effectively. However, most unsupervised feature selection methods either ignore the fact that real-world data is often contaminated by noise or outliers, or address only the model's sensitivity to them. In addition, the local manifold structure of the data is usually preserved with a simple graph built on a fixed number of k nearest neighbors. On the one hand, this treats datasets with different sample distributions identically; on the other hand, a simple graph cannot capture the intrinsic relations among samples in multi-modal data. To address these problems, this paper proposes a robust unsupervised feature selection algorithm: sparse and minimum-redundant subspace learning with dual regularization (SMRSDR). Specifically, a self-paced learning regularization based on soft weight allocation is introduced into SMRSDR to strictly control which samples enter model learning: normal samples are admitted in order of their importance, while noise and outliers are strictly excluded. Moreover, to account for differences in sample distribution and the multi-modal nature of data, the traditional simple-graph Laplacian regularization is replaced by a hypergraph Laplacian regularization based on adaptive nearest-neighbor selection. Furthermore, sparsity and minimum-redundancy constraints are imposed on the subspace learning framework so that the most representative and least redundant features are selected. Finally, the objective function of SMRSDR is solved by an alternating iterative optimization algorithm, and a series of experiments comprehensively demonstrates the effectiveness and superiority of SMRSDR.
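To make the soft-weight self-paced learning idea concrete, the sketch below implements a common linear soft-weighting scheme from the self-paced learning literature; the exact weighting function and loss used in SMRSDR may differ, and the function name and the age parameter `lam` here are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def soft_spl_weights(losses, lam):
    """Linear soft self-paced weights (a generic SPL scheme, not
    necessarily SMRSDR's exact regularizer).

    Samples with loss >= lam receive weight 0 and are blocked from
    model learning (treated as noise/outliers); low-loss "easy"
    samples receive weight near 1; intermediate samples enter with
    partial weight. Increasing lam over iterations admits samples
    in order of importance.
    """
    losses = np.asarray(losses, dtype=float)
    return np.clip(1.0 - losses / lam, 0.0, 1.0)

# Example: with age parameter lam = 1.0, a reliable sample (loss 0.1)
# gets weight 0.9, a borderline one (loss 0.5) gets 0.5, and a likely
# outlier (loss 2.0) is excluded with weight 0.0.
w = soft_spl_weights([0.1, 0.5, 2.0], lam=1.0)
print(w)  # [0.9 0.5 0. ]
```

In an alternating optimization such as the one the abstract describes, these weights would rescale each sample's contribution to the subspace-learning objective, and `lam` would be gradually increased so the model sees progressively harder samples.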