Abstract

With a small scale of prior information, including pairwise constraints and partial labels, semi-supervised clustering algorithms can improve clustering performance by imposing certain restrictions on the clustering results, and have accordingly attracted a lot of attention in recent years. Moreover, a large number of graph-based semi-supervised clustering algorithms have been proposed, since graphs are naturally suited to pairwise constraints. However, most graph-based approaches involve a two-step process and are not scalable to large-scale datasets due to their high time complexity. Additionally, they also encounter a serious problem of constraint violation. To deal with this issue, we propose a new non-parametric model to directly obtain the indicator matrix by minimizing the cut of the graph with a regularization term that corresponds to the balance of the clusters. To simultaneously support the two forms of prior information, our approach restricts the indicator matrix during optimization, while also ensuring that the clusters obtained by our approach do not violate any constraints. The computational complexity of the proposed approach is nearly linear to the data size n. Experiments show that the algorithm proposed in this paper achieves significant improvements in both clustering performance and efficiency.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call