Abstract

In many applications in data mining and machine learning, we face explicit or latent size constraints on each cluster, which leads to the "balanced clustering" problem. Many existing clustering algorithms partition data well but fail to produce balanced clusters or to preserve the naturally balanced structure of some data. In this paper, we propose a novel balanced clustering framework that flexibly utilizes both local and global information of the data. First, we propose global balanced clustering (GBC), in which a global discriminative partitioning model is combined with the minimization of the distribution entropy of data. Then, we show that the proposed GBC can be used to globally regularize some widely used local clustering models, transforming them into balanced clustering models that simultaneously capture local and global information of the data. We apply our global balanced regularization to spectral clustering (SC) and to local learning (LL)-based clustering, and propose two further balanced clustering models: the local and global balanced SC (LGB-SC) and LGB-LL. Because finding the optimal balanced partition is nondeterministic polynomial-time (NP)-hard in general, we adopt the method of augmented Lagrange multipliers to help optimize our models. Comprehensive experiments on several real-world benchmarks demonstrate that our framework yields balanced clusters while preserving good clustering quality. Our proposed LGB-SC and LGB-LL also outperform SC and LL as well as other classical clustering methods.
