Constrains optimal propagation-based modified semi-supervised spectral clustering for large-scale data

Hailin Feng, Xuyao Zhang, Dayu Xu, Jiaqi Huang

doi:10.1504/ijads.2018.10010706

Abstract

We address the high computational complexity of the traditional spectral clustering algorithm, which prevents it from meeting the requirements of current large-scale data clustering applications. In this article, we establish a semi-supervised clustering model for large-scale data based on constrained optimal propagation. In this model, a micro similarity matrix is first constructed from prior pairwise constraint information. The Gabow algorithm is then used to extract each strongly connected component from the graph that represents the micro similarity matrix. Next, a new constrained optimisation propagation algorithm is proposed for each strongly connected component to calculate the similarity of the whole dataset. Finally, we employ singular value decomposition and the accelerated k-means algorithm to obtain the clustering results for large-scale data. Experiments on multiple standard test datasets show that, compared with previous results in this field, the proposed clustering model achieves higher clustering accuracy and lower computational complexity, and is better suited to large-scale data clustering applications.
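The first two stages described above (building a small similarity matrix from constraints, then extracting strongly connected components) can be illustrated with a minimal sketch. It is not the authors' implementation. The encoding of constraints as must-link and cannot-link index pairs, the 1.0/0.0/None matrix values, and the names `build_micro_similarity`, `constraint_graph`, and `gabow_scc` are assumptions made for illustration only. The propagation, singular value decomposition, and k-means stages are omitted because the abstract does not specify them.

```python
def build_micro_similarity(n, must_link, cannot_link):
    """Assumed encoding: 1.0 for must-link pairs, 0.0 for cannot-link
    pairs, None where no prior information is available."""
    sim = [[None] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0
    for i, j in must_link:
        sim[i][j] = sim[j][i] = 1.0
    for i, j in cannot_link:
        sim[i][j] = sim[j][i] = 0.0
    return sim


def constraint_graph(sim):
    """Adjacency lists: an edge i -> j wherever the similarity is 1.0 (assumption)."""
    n = len(sim)
    return [[j for j in range(n) if j != i and sim[i][j] == 1.0] for i in range(n)]


def gabow_scc(adj):
    """Gabow's path-based strong component algorithm (two stacks).
    Returns a list mapping each vertex to a component id."""
    n = len(adj)
    preorder = [-1] * n
    component = [-1] * n
    open_stack = []   # vertices not yet assigned to a component
    path_stack = []   # candidate roots along the current DFS path
    counter = 0
    n_comp = 0

    def visit(v):
        nonlocal counter, n_comp
        preorder[v] = counter
        counter += 1
        open_stack.append(v)
        path_stack.append(v)
        for w in adj[v]:
            if preorder[w] == -1:
                visit(w)
            elif component[w] == -1:
                # w is on the current path: contract the cycle it closes
                while preorder[path_stack[-1]] > preorder[w]:
                    path_stack.pop()
        if path_stack[-1] == v:
            # v is the root of a strongly connected component
            path_stack.pop()
            while True:
                w = open_stack.pop()
                component[w] = n_comp
                if w == v:
                    break
            n_comp += 1

    for v in range(n):
        if preorder[v] == -1:
            visit(v)
    return component


# Hypothetical toy constraints on six points
sim = build_micro_similarity(6, must_link=[(0, 1), (1, 2), (3, 4)], cannot_link=[(2, 3)])
labels = gabow_scc(constraint_graph(sim))
```

In this toy input, points 0 to 2 share one component, points 3 and 4 share another, and point 5 is isolated. The recursive traversal keeps the sketch short; an explicit-stack version would be needed at large scale to avoid Python's recursion limit.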
