Abstract
Spectral clustering (SC) has attracted increasing attention due to its effectiveness in machine learning. However, most traditional spectral clustering methods are still difficult to apply to large-scale problems, mainly because of their high computational complexity of O(n^3), where n is the number of samples. To achieve fast spectral clustering, we propose a novel approach, called representative point-based spectral clustering (RPSC), that deals efficiently with the large-scale spectral clustering problem. The proposed method first generates two layers of representative points successively by BKHK (balanced k-means-based hierarchical k-means). It then constructs the hierarchical bipartite graph and performs spectral analysis on the graph. Specifically, we construct the similarity matrix using the parameter-free neighbor assignment method, which avoids the need to tune extra parameters. Furthermore, we perform co-clustering on the final similarity matrix. The co-clustering mechanism takes advantage of the co-occurring cluster structure among the representative points and the original data to strengthen the clustering performance. As a result, the computational complexity can be significantly reduced and the clustering accuracy can be improved. Extensive experiments on several large-scale data sets show the effectiveness, efficiency, and stability of the proposed method.
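To make the parameter-free neighbor assignment step more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a closed form that is commonly used for this kind of assignment: for a sample x_i with sorted squared distances d_i1 <= d_i2 <= ... to the representative points, the similarity to its j-th nearest representative is z_ij = (d_i,k+1 - d_ij) / (k * d_i,k+1 - sum of the k smallest distances). The abstract does not state the formula, so this choice, the function name neighbor_assignment, and the neighbor count k are assumptions made for illustration.

```python
import numpy as np

def neighbor_assignment(X, anchors, k=5):
    """Sparse similarity between each row of X and its k nearest anchors.

    Assumed closed form (not taken from the source text):
        z_ij = (d_{i,k+1} - d_ij) / (k * d_{i,k+1} - sum_{h<=k} d_ih)
    where d_i1 <= d_i2 <= ... are squared Euclidean distances.
    """
    # Squared Euclidean distances between samples and anchors, shape (n, m).
    d = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    n, m = d.shape
    if not 0 < k < m:
        raise ValueError("k must satisfy 0 < k < number of anchors")

    # Indices and distances of the k + 1 nearest anchors of every sample.
    idx = np.argsort(d, axis=1)[:, :k + 1]
    d_sorted = np.take_along_axis(d, idx, axis=1)

    d_k1 = d_sorted[:, k]                        # (k+1)-th smallest distance
    numer = d_k1[:, None] - d_sorted[:, :k]      # non-negative by construction
    denom = k * d_k1 - d_sorted[:, :k].sum(axis=1)
    denom = np.maximum(denom, 1e-12)             # guard against exact ties

    # Scatter the k weights of each row into an otherwise zero matrix.
    Z = np.zeros((n, m))
    np.put_along_axis(Z, idx[:, :k], numer / denom[:, None], axis=1)
    return Z

# Hypothetical usage on random data.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
anchors_demo = rng.normal(size=(16, 4))
Z_demo = neighbor_assignment(X_demo, anchors_demo, k=5)   # shape (200, 16)
```

Apart from the neighbor count k, this form has no kernel bandwidth or other parameter to tune. Each row has at most k nonzero entries and sums to one whenever the k + 1 nearest distances are not all equal. The dense distance matrix is only practical for small inputs; a large-scale version would compute the distances in batches.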
Highlights
Clustering is one of the fundamental topics in unsupervised learning
Zhao et al. proposed spectral clustering based on iterative optimization (SCIO), which solves the spectral decomposition problem for large-scale and high-dimensional data sets and performs well on multitask clustering [19]. The nonnegative matrix factorization (NMF) has been proposed as a relaxation technique for clustering with excellent performance [20, 21]
We proposed a novel representative point-based spectral clustering approach, named RPSC, based on the two-layer bipartite graph
Summary
Clustering is one of the fundamental topics in unsupervised learning. It has been widely and successfully applied in data mining, pattern recognition, and many other fields. The traditional spectral clustering needs two independent steps: constructing the similarity graph and performing spectral analysis [12]. Both steps are computationally expensive for large-scale data, with computational complexity of O(n^2) and O(n^3), respectively. Zhao et al. proposed spectral clustering based on iterative optimization (SCIO), which solves the spectral decomposition problem for large-scale and high-dimensional data sets and performs well on multitask clustering [19]. Liu et al. [28] proposed an efficient clustering algorithm for large-scale graph data using spectral methods. The methods mentioned above adopt a representative point-based strategy to construct the similarity graph and thereby accelerate spectral clustering. A novel and efficient representative point-based spectral clustering method is proposed to deal with large-scale data sets. We can obtain the first-layer representative points by performing the above process iteratively. Then the procedure is repeated on the first-layer representative points to generate the second-layer representative points.
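The two-layer generation of representative points can be illustrated with a simplified sketch. It is an approximation written under stated assumptions, not the BKHK procedure of the paper: each group is split into two halves of (almost) equal size by sorting the samples on the difference of their distances to two centers, the split is applied recursively, and the mean of each final group is taken as a representative point. The function names, the depth parameters, and the balancing rule are hypothetical choices made for this example.

```python
import numpy as np

def balanced_two_means(X, n_iter=10, seed=0):
    """Split the rows of X into two groups whose sizes differ by at most one."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    half = n // 2
    centers = X[rng.choice(n, size=2, replace=False)]
    for _ in range(max(1, n_iter)):
        d0 = ((X - centers[0]) ** 2).sum(axis=1)
        d1 = ((X - centers[1]) ** 2).sum(axis=1)
        # Samples with the smallest d0 - d1 prefer center 0; the first half
        # of this ordering goes left, which enforces the balance.
        order = np.argsort(d0 - d1)
        left, right = order[:half], order[half:]
        centers = np.stack([X[left].mean(axis=0), X[right].mean(axis=0)])
    return left, right

def hierarchical_representatives(X, depth, seed=0):
    """Return 2**depth representative points (means of balanced groups)."""
    if X.shape[0] < 2 ** depth:
        raise ValueError("need at least 2**depth samples")
    groups = [np.arange(X.shape[0])]
    for level in range(depth):
        next_groups = []
        for g in groups:
            left, right = balanced_two_means(X[g], seed=seed + level)
            next_groups.extend([g[left], g[right]])
        groups = next_groups
    return np.stack([X[g].mean(axis=0) for g in groups])

# Hypothetical usage: 1000 samples -> 64 first-layer points -> 8 second-layer points.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 5))
layer1 = hierarchical_representatives(X_demo, depth=6)   # shape (64, 5)
layer2 = hierarchical_representatives(layer1, depth=3)   # shape (8, 5)
```

Because every split halves the group, the number of representative points is a power of two and every representative summarizes roughly the same number of samples. The second layer is obtained by running the same routine on the first-layer points, as the text describes.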