Before conducting unsupervised feature selection, it is usually assumed that these data are independent of each other. On the contrary, real data will influence each other. Therefore, traditional feature selection methods may lose information related to each other between data. This can lead to inaccurately generated pseudo-label information and may result in poor feature selection results. To find solutions to this issue, this paper proposes robust feature selection via central point link information and sparse latent representation (CPSLR). Firstly, structure a link graph by calculating the center matrix to store the distance information from the sample to the center point. If two samples have similar distances to the center point, it can be determined that they belong to the same class. Therefore, the similarity between samples is preserved, and more accurate pseudo-label information is obtained. Secondly, CPSLR uses data graph and link graph to form a dual graph structure. It can not only retain the link information between samples but also retain the manifold structures of the samples. Then, CPSLR saves the interconnection contents between samples by sparse latent representation. That is, the constraint l2,1-norm is exerted on the expression of latent representation, and sparse non-redundant interconnection information is preserved. And by combining central point link information with sparse latent representation makes the interconnections between data reserved more comprehensive. That is to say, the pseudo-labels obtained are more like the real labels of the classes. Finally, CPSLR constrains the feature transformation matrix by l2,1/2-norm constraint so as to select robust and sparse features. CPSLR uses l2,1/2-norm constraint to assure that the feature transformation matrix is sparse, selecting more discriminative features, thereby obtaining the feature selection that can improve its efficiency. The experiments demonstrate that the clustering result of CPSLR outperform six classical or latest compared algorithms on eight datasets.
Read full abstract