Abstract

Clustering ensembles have emerged as an important tool for data analysis because they can generate a more robust and accurate consensus clustering. When forming the ensembles, empirical studies have suggested that better ensembles can be obtained by simultaneously considering the quality of the ensembles and the diversity among ensemble members. However, little research effort has been devoted to incorporating prior background knowledge. In this paper, we first provide a theoretical analysis of the effect of the diversity and quality of the ensemble members. We then propose a unified framework for the constraint-based clustering ensemble selection problem, in which instance-level must-link and cannot-link constraints are given as prior knowledge or background information. We formalize this problem as a combinatorial optimization problem in terms of consistency with the constraints, the diversity among ensemble members, and the overall quality of the ensembles. Our framework brings together two distinct yet interrelated themes from clustering: ensemble clustering and semi-supervised clustering. We study different techniques for searching for high-quality solutions. Experiments on benchmark datasets demonstrate the effectiveness of our framework.
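To make the three ingredients of such an objective concrete, the sketch below scores a candidate subset of clusterings by constraint consistency, pairwise diversity and quality, then searches for a subset greedily. It is an illustration under stated assumptions, not the authors' formulation: the weighted sum, the use of 1 minus the Rand index as the diversity measure, the "mean agreement with the whole library" quality proxy, and the greedy forward search are all hypothetical choices, and every function name here is invented for the example.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Labels = Sequence[int]   # one cluster label per object
Pair = Tuple[int, int]   # indices of two constrained objects


def rand_index(a: Labels, b: Labels) -> float:
    """Fraction of object pairs on which two clusterings agree (same vs. different cluster)."""
    agree = total = 0
    for i, j in combinations(range(len(a)), 2):
        agree += (a[i] == a[j]) == (b[i] == b[j])
        total += 1
    return agree / total if total else 1.0


def consistency(labels: Labels, must_link: List[Pair], cannot_link: List[Pair]) -> float:
    """Fraction of must-link and cannot-link constraints satisfied by one clustering."""
    satisfied = sum(labels[i] == labels[j] for i, j in must_link)
    satisfied += sum(labels[i] != labels[j] for i, j in cannot_link)
    total = len(must_link) + len(cannot_link)
    return satisfied / total if total else 1.0


def quality(labels: Labels, library: List[Labels]) -> float:
    """Hypothetical quality proxy: mean Rand agreement with every clustering in the library."""
    return sum(rand_index(labels, other) for other in library) / len(library)


def objective(subset: List[int], library: List[Labels], must_link: List[Pair],
              cannot_link: List[Pair], weights=(1.0, 1.0, 1.0)) -> float:
    """Hypothetical weighted sum of mean consistency, mean pairwise diversity and mean quality."""
    w_con, w_div, w_qual = weights
    members = [library[i] for i in subset]
    con = sum(consistency(m, must_link, cannot_link) for m in members) / len(members)
    qual = sum(quality(m, library) for m in members) / len(members)
    pairs = list(combinations(members, 2))
    # Diversity assumed to be 1 - Rand index, averaged over all member pairs.
    div = sum(1.0 - rand_index(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return w_con * con + w_div * div + w_qual * qual


def greedy_select(library: List[Labels], k: int, must_link: List[Pair],
                  cannot_link: List[Pair], weights=(1.0, 1.0, 1.0)) -> List[int]:
    """Greedy forward search: repeatedly add the library member that maximizes the objective."""
    selected: List[int] = []
    remaining = set(range(len(library)))
    while len(selected) < k and remaining:
        # Iterate in sorted order so ties resolve to the lowest index.
        best = max(sorted(remaining),
                   key=lambda i: objective(selected + [i], library,
                                           must_link, cannot_link, weights))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy library of four clusterings over four objects (made-up data for illustration).
library = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],  # duplicate of the first partition
    [0, 1, 0, 1],  # a very different partition
    [1, 1, 0, 0],  # same partition as the first, relabeled
]
must_link = [(0, 1)]
cannot_link = [(1, 2)]

print(greedy_select(library, 2, must_link, cannot_link))  # → [0, 2]
```

In this toy run the second pick is the dissimilar partition rather than a duplicate of the first, even though it violates the must-link constraint and has lower quality: the diversity term outweighs those losses under equal weights. Changing `weights` shifts this trade-off, which is the kind of balance among consistency, diversity and quality that the abstract describes.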
