Abstract

Spectral clustering makes use of the spectrum of an input affinity matrix to segment data into disjoint clusters. The performance of spectral clustering depends heavily on the quality of the affinity matrix. Commonly used affinity matrices are constructed by either the Gaussian kernel or the self-expressive model with sparse or low-rank constraints. A technique called diffusion, which acts as a post-process, has recently been shown to improve the quality of the affinity matrix significantly by taking advantage of contextual information. In this paper, we propose a variant of the diffusion process, named Self-Supervised Diffusion, which incorporates the clustering result as feedback to provide supervisory signals for the diffusion process. The proposed method contains two stages, namely affinity learning with diffusion and spectral clustering. It works in an iterative fashion, where in each iteration the clustering result is used to compute a pseudo-label similarity that guides the affinity learning stage in the next iteration. Extensive experiments on both synthetic and real-world data demonstrate that the proposed method learns accurate and robust affinities, and thus achieves superior clustering performance.
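As a rough illustration of this loop, the sketch below alternates a generic graph-diffusion step with spectral clustering and blends a binary co-cluster (pseudo-label) similarity back into the affinity. The diffusion update, the blending rule, and the weight `alpha` are all assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def diffuse(A, steps=3):
    """Generic context-aware diffusion step (an assumption here,
    in the spirit of tensor-product-graph diffusion)."""
    S = A / (A.sum(axis=1, keepdims=True) + 1e-12)  # row-stochastic transition matrix
    for _ in range(steps):
        A = S @ A @ S.T + np.eye(len(A))  # propagate affinities through the graph
    return A

def ssd_sketch(W, n_clusters, n_iters=5, alpha=0.5):
    """Minimal sketch of the two-stage iterative scheme described above.
    Update rules are illustrative assumptions, not the paper's method."""
    A = W.copy()
    labels = None
    for _ in range(n_iters):
        A = diffuse(A)  # stage 1: affinity learning with diffusion
        labels = SpectralClustering(n_clusters=n_clusters,
                                    affinity='precomputed').fit_predict(A)  # stage 2
        Y = (labels[:, None] == labels[None, :]).astype(float)  # pseudo-label similarity
        A = A / (A.max() + 1e-12)          # rescale before blending with the 0/1 matrix
        A = alpha * A + (1.0 - alpha) * Y  # feedback for the next iteration
    return A, labels
```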

Highlights

  • Clustering refers to the problem of dividing a set of data points into several clusters so that data points in the same cluster are more similar than data points in different clusters

  • We perform the same spectral clustering (Algorithm 1) with affinity matrices obtained by different affinity learning methods, and use clustering performance to measure the quality of each affinity matrix; a generic stand-in for Algorithm 1 is sketched after this list

  • In this paper, we proposed a novel affinity learning method, namely Self-Supervised Diffusion (SSD), which is based on a diffusion process that makes use of label feedback
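Algorithm 1 itself is not reproduced on this page. As a stand-in, the following is a standard normalized spectral clustering routine (normalized affinity, top eigenvectors, k-means); this is the usual shape of such an algorithm, and its details here are an assumption rather than the paper's exact steps.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(W, n_clusters):
    """Standard normalized spectral clustering, used as a generic
    stand-in for the paper's Algorithm 1."""
    n = W.shape[0]
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^{-1/2} W D^{-1/2}
    # Eigenvectors of the n_clusters largest eigenvalues form the embedding.
    _, U = eigh(M, subset_by_index=[n - n_clusters, n - 1])
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)  # row-normalize
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(U)
```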


Summary

INTRODUCTION

Clustering refers to the problem of dividing a set of data points into several clusters so that data points in the same cluster are more similar than data points in different clusters. A commonly used affinity matrix is constructed with the Gaussian kernel; a popular refinement is to use a different σ for each data point, e.g., setting σ_i to the mean distance from x_i to its k nearest neighbors. Another commonly used affinity matrix is constructed by the self-expressive model, whose essential idea is to represent a data point as a linear combination of all other data points. Applying sparse constraints on the coefficients encourages data points from the same cluster to be selected for reconstruction, and the resulting nonzero coefficients between these points can be treated as affinities. There are other affinity matrix construction methods as well, such as the decision tree based approach [19], the fuzzy logic based approach [20], the probabilistic geodesic distance based approach [21], the nonnegative matrix factorization based approach [22], the affinity and penalty constrained approach [23], one-step approaches that combine affinity learning and subspace learning [24], [25], the auto-encoder based approach [26], etc. These affinity learning methods focus on deriving pairwise relationships in the feature space, but the contextual relationships of neighboring data pairs are not fully exploited.
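For concreteness, a minimal sketch of such an adaptive Gaussian-kernel affinity follows. The symmetric form W_ij = exp(-||x_i - x_j||² / (σ_i σ_j)) is one common adaptive variant and an assumption here, not necessarily the exact formula used in the paper.

```python
import numpy as np

def adaptive_gaussian_affinity(X, k=7):
    """Gaussian-kernel affinity with a per-point bandwidth, where
    sigma_i is the mean distance from x_i to its k nearest neighbors."""
    # Pairwise squared Euclidean distances.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    d = np.sqrt(d2)

    # sigma_i = mean distance to the k nearest neighbors (excluding self).
    knn = np.sort(d, axis=1)[:, 1:k + 1]
    sigma = knn.mean(axis=1)

    W = np.exp(-d2 / (sigma[:, None] * sigma[None, :] + 1e-12))
    np.fill_diagonal(W, 0.0)  # no self-affinity
    return W
```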

DIFFUSION PROCESS
SOLVING A WHEN Y IS KNOWN
EXPERIMENTS
Findings
CONCLUSION