Abstract

Unsupervised feature selection has become an important and challenging pre-processing step in machine learning and data mining, since large amounts of unlabelled high-dimensional data often need to be processed. In this paper, we propose an efficient method for robust unsupervised feature selection via dual self-representation and manifold regularization, referred to as DSRMR for short. On the one hand, a feature self-representation term is used to learn the feature representation coefficient matrix, which measures the importance of the different feature dimensions. On the other hand, a sample self-representation term is used to automatically learn the sample similarity graph, which preserves the local geometrical structure of the data and has been shown to be critical in unsupervised feature selection. By using the l2,1-norm to regularize both the feature representation residual matrix and the representation coefficient matrix, our method is robust to outliers, and the row sparsity induced in the feature coefficient matrix by the l2,1-norm effectively selects representative features. During optimization, the feature coefficient matrix and the sample similarity graph constrain each other to reach the optimal solution. Experimental results on ten real-world data sets demonstrate that the proposed method effectively identifies important features and outperforms many state-of-the-art unsupervised feature selection methods in terms of clustering accuracy (ACC) and normalized mutual information (NMI).
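To make the feature-ranking idea concrete, the sketch below is a minimal illustration (our own simplified stand-in, not the authors' released code or full DSRMR algorithm): it fits a feature self-representation X ≈ XW using a plain ridge-regularized least-squares solver in place of the paper's l2,1-regularized, manifold-constrained objective, and then scores each feature by the l2 norm of its row in W, which is the row-sparsity criterion the abstract describes. The function and parameter names are hypothetical.

```python
import numpy as np

def l21_norm(M):
    """l2,1-norm of M: the sum of the l2 norms of its rows."""
    return np.sum(np.linalg.norm(M, axis=1))

def rank_features_by_self_representation(X, alpha=1.0):
    """Simplified stand-in for DSRMR's feature-ranking step.

    Fits X ~= X @ W with ridge (Frobenius) regularization instead of the
    paper's l2,1-norm regularization on both the residual and W, then
    scores each feature by the l2 norm of the corresponding row of W.
    """
    n, d = X.shape
    G = X.T @ X
    # Closed-form ridge solution: W = (X^T X + alpha I)^{-1} X^T X
    W = np.linalg.solve(G + alpha * np.eye(d), G)
    scores = np.linalg.norm(W, axis=1)  # larger row norm => more representative feature
    return W, scores

# Toy usage on random data
X = np.random.randn(100, 50)
W, scores = rank_features_by_self_representation(X, alpha=0.5)
top_k = np.argsort(-scores)[:5]  # indices of the 5 highest-scoring features
print("selected feature indices:", top_k)
print("l2,1-norm of W:", l21_norm(W))
```

The full method additionally learns a sample similarity graph from a sample self-representation term and alternates between updating W and the graph, so this sketch only illustrates how row norms of the coefficient matrix translate into a feature ranking.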
