A Binary Linear Programming-Based K-Means Algorithm For Clustering with Must-Link and Cannot-Link Constraints

Philipp Baumann

doi:10.1109/ieem45057.2020.9309775

Abstract

Clustering is probably the most extensively studied problem in unsupervised learning. Traditional clustering algorithms assign objects to clusters exclusively based on features of the objects. Constrained clustering is a generalization of traditional clustering where additional information about a dataset is given in the form of constraints. It has been shown that the clustering accuracy can be improved substantially by accounting for these constraints. We consider the constrained clustering problem where additional information is given in the form of must-link and cannot-link constraints for some pairs of objects. Various algorithms have been developed for this specific clustering problem. We propose a binary linear programming-based k-means approach that can consider must-link and cannot-link constraints. In a computational experiment, we compare the proposed algorithm to the DILS <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">CC</sub> algorithm, which represents the state-of-the-art. Our results on 75 problem instances indicate that the proposed algorithm delivers better clusterings than the DILS <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">CC</sub> algorithm in much shorter running time.

Full Text