The success of existing unsupervised feature selection (UFS) methods relies heavily on the assumption that the intrinsic relationships among the original high-dimensional (HD) data samples are retained in a discriminative low-dimensional (LD) subspace. However, previous UFS methods commonly construct pairwise graphs to preserve the local structure and employ ℓ2,1-norm regularization to compute feature scores. Both steps are computationally expensive and prone to getting stuck in local optima, so these approaches cannot be applied to large-scale datasets in practice. To overcome this challenge, we propose a novel UFS method in which an anchor graph embedding paradigm extracts the local neighborhood relationships among data samples while reducing the computational complexity of graph construction to linear in the number of samples. Moreover, to improve the optimality of the selected features and the performance of downstream tasks, we propose a discrete feature scoring mechanism that imposes orthogonal ℓ2,0-norm constraints on the learned projections, which sharpens the distinction among feature scores and reduces the probability of falling into local optima. Solving the resulting nonconvex, nonsmooth, NP-hard problem is challenging, so we present an efficient optimization algorithm that addresses it and yields a closed-form solution for the transformation matrix. Extensive experiments on clustering and image segmentation tasks demonstrate the effectiveness and efficiency of the proposed UFS method in comparison with several state-of-the-art approaches.
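To make the two ideas above concrete, here is a minimal NumPy sketch. It is not the authors' algorithm, and every name in it (`anchor_graph`, `select_features`) is hypothetical. Several assumptions are made. Anchors are drawn by random sampling, whereas k-means anchors are also common. Each sample connects to its k nearest anchors with the parameter-free adaptive-neighbor weights used in several anchor-graph papers. The embedding objective is assumed to be maximizing Tr(WᵀXᵀSX W) with S = ZΛ⁻¹Zᵀ, which is never formed as an n × n matrix, so the cost stays linear in n. The ℓ2,0 budget is enforced by a simple heuristic: rank features by the row norms of the top eigenvectors, keep s of them, and then take the closed-form eigenvector solution on that fixed support. The result satisfies WᵀW = I with at most s nonzero rows.

```python
import numpy as np


def anchor_graph(X, n_anchors=10, k=3, seed=0):
    """Hypothetical helper: n x m anchor affinity matrix Z (rows sum to 1).

    Assumption: anchors are random samples; the cost is O(n * m * d), linear in n.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    anchors = X[rng.choice(n, size=n_anchors, replace=False)]
    # Squared Euclidean distances between samples and anchors (n x m).
    D = ((X ** 2).sum(axis=1)[:, None] - 2.0 * X @ anchors.T
         + (anchors ** 2).sum(axis=1)[None, :])
    D = np.maximum(D, 0.0)
    idx = np.argsort(D, axis=1)[:, : k + 1]           # k+1 nearest anchors
    d_sorted = np.take_along_axis(D, idx, axis=1)
    d_kp1 = d_sorted[:, k:k + 1]                      # distance to (k+1)-th anchor
    num = d_kp1 - d_sorted[:, :k]                     # non-negative because sorted
    den = k * d_kp1 - d_sorted[:, :k].sum(axis=1, keepdims=True)
    safe_den = np.where(den > 1e-12, den, 1.0)
    # Adaptive-neighbor weights; fall back to uniform 1/k when all distances tie.
    w = np.where(den > 1e-12, num / safe_den, 1.0 / k)
    Z = np.zeros((n, n_anchors))
    np.put_along_axis(Z, idx[:, :k], w, axis=1)
    return Z


def select_features(X, Z, n_components=2, n_selected=3):
    """Hypothetical heuristic for an orthogonal, row-sparse (l2,0) projection W."""
    Xc = X - X.mean(axis=0)
    lam = np.maximum(Z.sum(axis=0), 1e-12)            # anchor degrees, diag(Lambda)
    B = (Xc.T @ Z) / np.sqrt(lam)                     # d x m
    A = B @ B.T                                       # = Xc^T Z Lambda^-1 Z^T Xc
    _, vecs = np.linalg.eigh(A)                       # eigenvalues in ascending order
    scores = np.linalg.norm(vecs[:, -n_components:], axis=1)
    support = np.sort(np.argsort(-scores)[:n_selected])  # keep s rows (l2,0 budget)
    # Closed form on the fixed support: top eigenvectors of the sub-block of A.
    _, sub_vecs = np.linalg.eigh(A[np.ix_(support, support)])
    W = np.zeros((X.shape[1], n_components))
    W[support] = sub_vecs[:, -n_components:]
    return W, support, scores


# Toy usage: features 0 and 1 carry cluster structure, features 2-5 are small noise.
rng = np.random.default_rng(0)
n = 200
X_demo = 0.1 * rng.standard_normal((n, 6))
X_demo[:, 0] += np.where(rng.integers(0, 2, n) == 1, 5.0, -5.0)
X_demo[:, 1] += np.where(rng.integers(0, 2, n) == 1, 5.0, -5.0)
Z_demo = anchor_graph(X_demo, n_anchors=40, k=5)
W_demo, support_demo, scores_demo = select_features(
    X_demo, Z_demo, n_components=2, n_selected=3)
print("selected features:", support_demo.tolist())
```

The design choice worth noting is that `A` is built from the d × m matrix `B`, so only the n × m matrix `Z` ever touches all samples. This is what keeps graph construction and embedding linear in n in this sketch.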