Abstract

<abstract><p>The $ k $-means algorithm aims at minimizing the variance within clusters without considering the balance of cluster sizes. Balanced $ k $-means defines the partition as a pairing problem that enforces the cluster sizes to be strictly balanced, but the resulting algorithm is impractically slow $ \mathcal{O}(n^3) $. Regularized $ k $-means addresses the problem using a regularization term including a balance parameter. It works reasonably well when the balance of the cluster sizes is a mandatory requirement but does not generalize well for soft balance requirements. In this paper, we revisit the $ k $-means algorithm as a two-objective optimization problem with two goals contradicting each other: to minimize the variance within clusters and to minimize the difference in cluster sizes. The proposed algorithm implements a balance-driven variant of $ k $-means which initially only focuses on minimizing the variance but adds more weight to the balance constraint in each iteration. The resulting balance degree is not determined by a control parameter that has to be tuned, but by the point of termination which can be precisely specified by a balance criterion.</p></abstract>

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call