Abstract
Clustering is a commonly used tool that has been applied in machine learning, data mining, and other fields, and it has been studied extensively. However, real data usually contain noise and outliers, which can introduce significant errors into the clustering results. In this paper, a robust clustering model with adaptive graph regularization (RCAG) is proposed. In RCAG, a sparse error matrix is introduced to represent sparse noise, such as impulse noise, dead lines, and stripes, and the $\ell_{1}$ norm is imposed on it to suppress this noise. In addition, the $\ell_{2,1}$ norm, which has the rotation invariance property, is adopted to mitigate the effects of outliers. As a result, RCAG is insensitive to noise and outliers in the data. More importantly, adaptive graph regularization is incorporated into RCAG to improve clustering performance. To solve the resulting optimization problem, we derive an iterative algorithm based on the Augmented Lagrangian Method (ALM) that updates each optimization variable in turn. The convergence of RCAG is also proved theoretically, and its time complexity is analyzed. Finally, experimental results on fourteen datasets from four application scenarios, including face images, handwriting recognition, and UCI datasets, demonstrate the superiority of the proposed method over seven existing classical clustering methods. In particular, our approach achieves better clustering performance in terms of ACC and Purity, although its advantage on the other metrics is less pronounced.
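To make the two robustness terms concrete, the following is a minimal NumPy sketch of the $\ell_{1}$ and $\ell_{2,1}$ norms and of their proximal operators (entry-wise soft-thresholding and column-wise shrinkage), which are the closed-form steps that ALM-type solvers commonly use to update a sparse error matrix. It assumes that each column of the error matrix `E` is one sample; the function names are illustrative, and this is not a transcription of RCAG's actual update rules.

```python
import numpy as np


def l1_norm(E):
    # Sum of absolute values of all entries; penalizes scattered (sparse) noise.
    return np.abs(E).sum()


def l21_norm(E):
    # Sum of the Euclidean norms of the columns (assumed: one column per sample);
    # penalizes whole corrupted samples, i.e. outliers.
    return np.linalg.norm(E, axis=0).sum()


def prox_l1(A, tau):
    # Entry-wise soft-thresholding: argmin_E tau*||E||_1 + 0.5*||E - A||_F^2.
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)


def prox_l21(A, tau):
    # Column-wise shrinkage: argmin_E tau*||E||_{2,1} + 0.5*||E - A||_F^2.
    # Columns whose norm is at most tau are set exactly to zero.
    norms = np.linalg.norm(A, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return A * scale  # scale has one entry per column and broadcasts over rows


rng = np.random.default_rng(0)
E = rng.standard_normal((5, 8))  # 5 features x 8 samples

# Rotating the feature space with an orthogonal matrix Q keeps every column
# norm, so the l2,1 norm is unchanged; the l1 norm generally changes.
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
print(np.isclose(l21_norm(Q @ E), l21_norm(E)))  # True
print(l1_norm(E), l1_norm(Q @ E))
```

In an ALM scheme, steps of this shape typically appear in the error-matrix update, with `tau` playing the role of the regularization weight divided by the penalty parameter; that correspondence describes the general technique and is an assumption about, not a statement of, the paper's derivation.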
Highlights
Clustering is the process of dividing a set of objects into multiple classes, each composed of similar objects
We evaluate the clustering quality of RCAG on fourteen datasets of four types, using clustering accuracy (ACC), normalized mutual information (NMI), and Purity
In this paper, a low-rank matrix factorization model based on adaptive graph regularization, which explicitly models noise and outliers, is proposed
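ACC, NMI, and Purity are the evaluation metrics named in the highlights. A minimal sketch of how they are usually computed from ground-truth and predicted labels follows, assuming NumPy and SciPy are available. ACC uses the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`) to find the best one-to-one matching between clusters and classes, while NMI here is normalized by the arithmetic mean of the two entropies, which is one of several conventions. The helper names are illustrative, and the paper's own evaluation code may differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def contingency(y_true, y_pred):
    # Rows: predicted clusters, columns: ground-truth classes.
    _, t = np.unique(y_true, return_inverse=True)
    _, p = np.unique(y_pred, return_inverse=True)
    C = np.zeros((p.max() + 1, t.max() + 1), dtype=np.int64)
    np.add.at(C, (p, t), 1)
    return C


def clustering_acc(y_true, y_pred):
    # Best one-to-one mapping of clusters to classes (Hungarian algorithm).
    C = contingency(y_true, y_pred)
    rows, cols = linear_sum_assignment(C, maximize=True)
    return C[rows, cols].sum() / C.sum()


def purity(y_true, y_pred):
    # Each cluster is credited with its majority class (many-to-one mapping).
    C = contingency(y_true, y_pred)
    return C.max(axis=1).sum() / C.sum()


def entropy(q):
    q = q[q > 0]
    return -np.sum(q * np.log(q))


def nmi(y_true, y_pred):
    # Mutual information normalized by the arithmetic mean of the entropies.
    C = contingency(y_true, y_pred)
    P = C / C.sum()
    pp, pt = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = np.sum(P[nz] * np.log(P[nz] / np.outer(pp, pt)[nz]))
    denom = (entropy(pp) + entropy(pt)) / 2
    return 1.0 if denom == 0 else mi / denom


# Over-segmentation: class 0 is split into two clusters.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 1, 2, 2]
print(clustering_acc(y_true, y_pred), purity(y_true, y_pred))  # ACC = 4/6, Purity = 1.0
```

The example illustrates why the metrics can disagree: Purity allows several clusters to share one class and therefore rewards splitting a class, whereas ACC enforces a one-to-one matching.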
Summary
Clustering is the process of dividing a set of objects into multiple classes, each composed of similar objects. Reference [23] proposed sparse dual graph-regularized nonnegative matrix factorization (NMF), which reveals the inherent geometric and discriminative structures of both the data space and the feature space. To improve the performance of NMF, many variants with various regularization terms have been proposed, along with various methods for handling noise and outliers [26]. A Huber loss has been used to handle non-Gaussian noise and outliers, with sparse terms and regularization terms introduced to enhance the sparsity of the matrix and capture the manifold structure of the data [27]. However, NMF-based clustering still has the following problems: (1) traditional matrix factorization clustering methods are easily dominated by noise and outliers, which leads to large errors. To address these problems, we propose a robust clustering model with adaptive graph regularization (RCAG).
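As background for the graph-regularized NMF family discussed above, the following is a minimal sketch of the widely used multiplicative updates for single-graph regularized NMF (GNMF), which minimizes $\|X - UV^{\top}\|_F^2 + \lambda\,\mathrm{Tr}(V^{\top} L V)$ with $L = D - W$. The dual-graph method of [23] additionally regularizes with a feature-space graph and sparsity, and RCAG adds error terms and an adaptive graph, so this is neither of those methods. The 0/1 k-nearest-neighbour graph, the parameter values, the synthetic data, and the `argmax` labelling (k-means on `V` is often used instead) are all illustrative assumptions.

```python
import numpy as np


def knn_graph(X, k=5):
    # Symmetric 0/1 k-nearest-neighbour graph over the columns (samples) of X.
    n = X.shape[1]
    sq = np.sum(X**2, axis=0)
    D2 = sq[:, None] + sq[None, :] - 2 * X.T @ X  # squared Euclidean distances
    np.fill_diagonal(D2, np.inf)  # exclude self-neighbours
    idx = np.argsort(D2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(W, W.T)


def gnmf(X, r, lam=1.0, k=5, iters=200, eps=1e-10, seed=0):
    # Multiplicative updates for ||X - U V^T||_F^2 + lam * Tr(V^T (D - W) V);
    # both factors stay nonnegative because every update is a ratio of
    # nonnegative terms.
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = knn_graph(X, k)
    Dg = np.diag(W.sum(axis=1))
    U = rng.random((m, r))
    V = rng.random((n, r))
    for _ in range(iters):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * W @ V) / (V @ (U.T @ U) + lam * Dg @ V + eps)
    return U, V


# Synthetic nonnegative data: two groups of 20 samples in 10 dimensions.
rng = np.random.default_rng(1)
A = np.vstack([rng.random((5, 20)) + 2, rng.random((5, 20))])
B = np.vstack([rng.random((5, 20)), rng.random((5, 20)) + 2])
X = np.hstack([A, B])  # 10 features x 40 samples

U, V = gnmf(X, r=2)
labels = V.argmax(axis=1)  # one cluster label per sample
```

Methods that target noise and outliers, such as the one summarized here, keep this kind of factorization-plus-graph structure but change the loss or add error terms, which is why the plain squared loss above is sensitive to corrupted samples.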