Abstract

Semi-supervised cluster ensemble methods usually introduce a small amount of supervision in the first stage of the cluster ensemble, i.e., ensemble generation, by performing many runs of semi-supervised clustering algorithms. However, this approach is neither computationally efficient nor flexible in a dynamic learning environment where the limited supervision changes over time. In this article, we propose a new framework that generates base partitions in an unsupervised manner and assigns a different weight to each cluster of the base partitions. The weighting scheme considers both internal validation measures of clustering and the degree to which pairwise constraints are satisfied. A consensus approach based on a weighted co-association matrix is then applied to obtain the final partition. To handle high-dimensional data, we generate base partitions using k-means with both random sampling and random subspace techniques. The new framework retains high accuracy and is efficient, since it avoids performing semi-supervised clustering in ensemble generation and the complexity of the weighting scheme is independent of the number of instances in a dynamic environment. It is more adaptive than the traditional approach because it does not require rerunning semi-supervised clustering algorithms when the limited supervision changes. Empirical results on 12 datasets demonstrate that it is also more robust to noisy constraints.
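
To make the idea of a cluster-weighted co-association matrix concrete, the following is a minimal sketch and not the article's actual method. It assumes a simplified weight for each cluster: the fraction of pairwise constraints touching that cluster that the partition satisfies, with a neutral 0.5 for clusters that no constraint touches. The internal validation term mentioned above is omitted. The function names, the neutral weight, the threshold, and the connected-components consensus step are all illustrative assumptions. The base partitions are hard-coded here, whereas in practice they would come from unsupervised runs such as k-means.

```python
def cluster_weights(labels, must_link, cannot_link):
    """Hypothetical weight per cluster: share of touching constraints it satisfies."""
    touched = {c: 0 for c in set(labels)}
    satisfied = {c: 0 for c in set(labels)}
    for pairs, want_same in ((must_link, True), (cannot_link, False)):
        for i, j in pairs:
            ok = (labels[i] == labels[j]) == want_same
            # A constraint "touches" every cluster holding one of its endpoints.
            for c in {labels[i], labels[j]}:
                touched[c] += 1
                satisfied[c] += ok
    # Assumption: clusters with no constraints get a neutral weight of 0.5.
    return {c: (satisfied[c] / touched[c] if touched[c] else 0.5) for c in touched}


def weighted_coassociation(partitions, weights):
    """Average, over base partitions, of the weight of the cluster shared by i and j."""
    n = len(partitions[0])
    co = [[0.0] * n for _ in range(n)]
    for labels, w in zip(partitions, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += w[labels[i]]
    m = len(partitions)
    return [[v / m for v in row] for row in co]


def consensus(co, threshold):
    """Illustrative consensus: connected components of the graph co[i][j] >= threshold."""
    n = len(co)
    final = [-1] * n
    next_label = 0
    for start in range(n):
        if final[start] != -1:
            continue
        final[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if final[j] == -1 and co[i][j] >= threshold:
                    final[j] = next_label
                    stack.append(j)
        next_label += 1
    return final


# Toy data: six instances, three base partitions, two pairwise constraints.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],  # instance 2 misplaced
    [0, 0, 0, 0, 1, 1],  # instance 3 misplaced
]
must_link = [(0, 2)]
cannot_link = [(2, 3)]

weights = [cluster_weights(p, must_link, cannot_link) for p in partitions]
co = weighted_coassociation(partitions, weights)
final = consensus(co, threshold=0.3)
print(final)  # [0, 0, 0, 1, 1, 1]
```

In this sketch, changing the constraints only requires recomputing `weights` and the co-association matrix. The base partitions are reused as they are, which mirrors the adaptivity argument made above.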
