Abstract

Data clustering is an important research topic in the data mining and signal processing communities. Among data clustering methods, subspace spectral clustering methods based on the self-expression model, e.g., Sparse Subspace Clustering (SSC) and Low Rank Representation (LRR), have attracted considerable attention and shown good performance. The key step of SSC and LRR is to construct a proper affinity (similarity) matrix of the data for spectral clustering. Recently, Laplacian graph constraints were introduced into the basic SSC and LRR models and yielded considerable improvements. However, current graph construction methods do not fully exploit or reveal the non-linear properties of the data to be clustered, which are common in high-dimensional data. In this paper, we introduce a classic manifold learning method, Local Linear Embedding (LLE), to learn the non-linear structure underlying the data. We then use the learned local geometry of the manifold as a regularization for SSC and LRR, which results in the proposed LLE-SSC and LLE-LRR clustering methods. Additionally, we propose an efficient algorithm to solve the complex optimization problems involved in the proposed models. We test the proposed data clustering methods on several types of public databases. The experimental results show that our methods outperform typical subspace clustering methods with Laplacian graph constraints.
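For orientation, the standard self-expression objectives from the SSC and LRR literature are written out below. In these objectives, X ∈ R^{d×n} holds one data point per column, and z_i denotes the i-th column of the coefficient matrix Z. The graph term is the usual Laplacian regularizer, with affinity S, degree matrix D (D_ii = Σ_j S_ij) and L = D − S. This summary does not state the exact LLE-SSC/LLE-LRR objectives, so the LLE term is an assumption. It shows one natural way to use LLE reconstruction weights W (each row summing to one) in place of L. The trade-off weights λ and β are illustrative.

```latex
\begin{aligned}
\text{SSC (noise-free):}\quad & \min_{C}\ \|C\|_1 \quad \text{s.t. } X = XC,\ \operatorname{diag}(C) = 0 \\
\text{LRR:}\quad & \min_{Z,E}\ \|Z\|_* + \lambda \|E\|_{2,1} \quad \text{s.t. } X = XZ + E \\
\text{Laplacian graph term:}\quad & \beta\,\operatorname{tr}\!\big(Z L Z^\top\big) = \tfrac{\beta}{2}\sum_{i,j} S_{ij}\,\|z_i - z_j\|_2^2, \qquad L = D - S \\
\text{LLE term (assumed form):}\quad & \beta\,\big\|Z (I - W)^\top\big\|_F^2
  = \beta\,\operatorname{tr}\!\big(Z (I - W)^\top (I - W) Z^\top\big)
  = \beta \sum_{i} \Big\| z_i - \sum_{j} W_{ij}\, z_j \Big\|_2^2 \\
\text{affinity for spectral clustering:}\quad & A = \tfrac{1}{2}\big(|Z| + |Z|^\top\big)
\end{aligned}
```

The Laplacian term pulls together the codes of points that are close in the raw Euclidean space. The assumed LLE term instead asks each code z_i to be rebuilt from its neighbours' codes with the same weights that rebuild x_i from its neighbours. In this way it carries over the local manifold geometry, not just pairwise distances. The symmetrized absolute-value affinity A is a common choice in SSC/LRR pipelines.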

Highlights

  • Data clustering or segmentation is an active research topic in data mining, signal processing and unsupervised learning [1,2]

  • We propose a new data clustering framework based on manifold learning and subspace clustering

  • Local Linear Embedding (LLE) is used to learn the manifold structure hidden in the data
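LLE learns local manifold structure in its first step. In that step, each point is reconstructed from its k nearest neighbours using weights that sum to one. Below is a minimal pure-Python sketch of that textbook step. It assumes Euclidean neighbours and the trace-scaled Tikhonov regularizer on the local Gram matrix that common LLE implementations use. The function names and the parameters `k` and `reg` are hypothetical. This summary does not say how the authors compute or tune these weights.

```python
def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i] (squared Euclidean), excluding i."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(points[i], p)), j)
        for j, p in enumerate(points) if j != i
    )
    return [j for _, j in dists[:k]]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for cc in range(c, n + 1):
                M[r][cc] -= f * M[c][cc]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][cc] * x[cc] for cc in range(r + 1, n))) / M[r][r]
    return x


def lle_weights(points, k, reg=1e-3):
    """Return an n x n matrix W with x_i ~ sum_j W[i][j] * x_j and each row summing to 1."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = knn_indices(points, i, k)
        # Neighbours expressed relative to x_i
        diffs = [[a - b for a, b in zip(points[j], points[i])] for j in nbrs]
        # Local Gram matrix G_jl = (x_j - x_i) . (x_l - x_i)
        G = [[sum(a * b for a, b in zip(u, v)) for v in diffs] for u in diffs]
        # Assumption: trace-scaled regularization keeps G invertible when k > dimension
        trace = sum(G[t][t] for t in range(k))
        shift = reg * trace if trace > 0 else reg
        for t in range(k):
            G[t][t] += shift
        # Minimizing ||x_i - sum_j w_j x_j||^2 subject to sum_j w_j = 1 gives G w = 1, then rescale
        w = solve(G, [1.0] * k)
        s = sum(w)
        for j, wj in zip(nbrs, w):
            W[i][j] = wj / s
    return W
```

For example, `lle_weights([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]], k=2)` assigns each interior point a weight of 0.5 on each of its two neighbours. This matches the intuition that a point midway between two others is their average.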


Summary

Introduction

Data clustering or segmentation is an active research topic in data mining, signal processing and unsupervised learning [1,2]. Current methods have achieved improvements by exploiting the non-linear properties hidden in high-dimensional data. However, most of them represent these properties only through a Laplacian graph constructed directly from the Euclidean similarity of the raw data, which can hardly model the intrinsic manifold structure. As a result, these methods are biased by the noise and outliers present in the raw data.
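A Laplacian graph built directly from Euclidean similarity of the raw data is typically a k-nearest-neighbour graph. Its edges carry heat-kernel weights S_ij = exp(−‖x_i − x_j‖² / 2σ²), and its Laplacian is L = D − S. The pure-Python sketch below assumes that common construction. The function name and the values of `k` and `sigma` are hypothetical, since this summary does not give the baselines' settings. Every edge weight comes straight from raw coordinates, which is why noise and outliers affect the graph directly.

```python
import math


def knn_heat_kernel_laplacian(points, k, sigma=1.0):
    """Build a symmetric kNN heat-kernel affinity S and its graph Laplacian L = D - S."""
    n = len(points)

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # k nearest neighbours of point i, measured on the raw (possibly noisy) coordinates
        nbrs = sorted((sqdist(points[i], points[j]), j) for j in range(n) if j != i)[:k]
        for d2, j in nbrs:
            w = math.exp(-d2 / (2.0 * sigma ** 2))
            # "OR" symmetrization: keep the edge if either point lists the other as a neighbour
            S[i][j] = S[j][i] = w
    # Degree matrix D is diagonal with D_ii = sum_j S_ij; since S_ii = 0, L_ii = D_ii
    L = [[(sum(S[i]) if i == j else 0.0) - S[i][j] for j in range(n)] for i in range(n)]
    return S, L
```

Take the points (0, 0), (1, 0) and (5, 0) with `k=1`. The distant point (5, 0) still receives an edge of weight exp(−8) to (1, 0). The graph forces every point into a neighbourhood on raw distances alone, whether or not that point is an outlier.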

Related Works
The Proposed LLE-SSC and LLE-LRR Models
Solutions to LLE-SSC and LLE-LRR
Experiments
Methods for Constructing the Laplacian Matrix
Synthetic Experiment
Handwritten Digit Clustering on USPS
Motion Segmentation on Hopkins155
Object Image Clustering on COIL20
Findings
Conclusions