Abstract

Effective data similarity measures are essential in data clustering. This paper proposes a novel deep metric clustering method that performs non-linear similarity learning and clustering simultaneously. Unlike pre-defined similarity measures, the learned deep metric enables more effective clustering of high-dimensional data with various non-linear similarities. In the proposed method, a similarity function is first approximated by a deep metric network. The graph Laplacian matrix is then introduced to make cluster assignments. A stochastic optimization is proposed to efficiently construct the optimal deep metric network and to calculate data similarities and cluster assignments on large-scale data sets. For \(N\) data samples, the proposed optimization reduces the computation from \(N^2\) pairs of data to \(M^2\) pairs (\(M \ll N\)) at each step of the approximation. A co-training method is further introduced to optimize the deep metric network on a portion of semi-supervised data for clustering with targeted purposes. Finally, this paper shows theoretical connections between the proposed method and spectral clustering in subspace learning. The method achieves ∼20% higher accuracy than the best existing multi-view and subspace clustering methods on the Caltech and MSRCV1 object recognition data sets. Further results on benchmark and real-world visual data show that the proposed method performs competitively against the deep subspace clustering network and many related state-of-the-art methods.
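
The pair-count reduction described above can be illustrated with a minimal sketch. It is not the paper's implementation: a fixed Gaussian kernel stands in for the learned deep metric network, the unnormalized Laplacian \(L = D - W\) is an assumed choice, and all function names and the values of `N`, `M`, and `dim` are hypothetical.

```python
import math
import random

def sample_minibatch(n_samples, batch_size, rng):
    """Pick M distinct indices out of N without replacement."""
    return rng.sample(range(n_samples), batch_size)

def gaussian_similarity(x, y, sigma=1.0):
    """Placeholder similarity (assumption); the paper learns this with a deep metric network."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def similarity_matrix(points):
    """Evaluate the similarity on all M^2 ordered pairs of the mini-batch."""
    return [[gaussian_similarity(p, q) for q in points] for p in points]

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W, where D is the diagonal degree matrix."""
    m = len(W)
    degrees = [sum(row) for row in W]
    return [[(degrees[i] if i == j else 0.0) - W[i][j] for j in range(m)]
            for i in range(m)]

rng = random.Random(0)
N, M, dim = 1000, 32, 5  # hypothetical sizes with M much smaller than N
data = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(N)]

# One stochastic step: only the sampled mini-batch enters the similarity graph.
batch = [data[i] for i in sample_minibatch(N, M, rng)]
W = similarity_matrix(batch)
L = graph_laplacian(W)

pairs_full = N * N   # pairs needed for the full similarity matrix
pairs_batch = M * M  # pairs evaluated in this step
```

With these illustrative sizes, each step evaluates `M * M` similarities instead of `N * N`, and every row of `L` sums to zero, as expected for a graph Laplacian.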
