Abstract

Pairwise constraints, known as must-link and cannot-link constraints, are frequently used in semi-supervised clustering. In this paper, we propose a novel use of cannot-link constraints and develop a method called Mid-Perpendicular Hyperplane Similarity (MPHS) for semi-supervised clustering. Since a cannot-link constraint states that the two objects it links are not in the same class, the mid-perpendicular hyperplane of the segment joining them separates the two objects. For each cannot-link constraint, we first compute the corresponding mid-perpendicular hyperplane and then use the distances of objects to this hyperplane to learn a new data representation and similarity matrix. Finally, we combine the similarity matrices from all cannot-link constraints into a single similarity matrix and perform kernel k-means on it to obtain the partition. We implement MPHS for two cases: a simple one performed in the original input space when the data set is nearly linearly separable, and an advanced one performed in a kernel-induced feature space when the data set is complex and not linearly separable. Experimental results on several UCI data sets and some image data sets show the effectiveness of our method.
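The input-space case described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the following are assumptions made only for illustration: the new representation is taken to be the signed distance of each object to the hyperplane, the per-constraint similarity is a Gaussian of the difference between signed distances (with a hypothetical bandwidth `sigma`), and the matrices are combined by a plain average. All function names are hypothetical, and the final kernel k-means step is omitted.

```python
import numpy as np

def signed_distance_to_mid_hyperplane(X, a, b):
    """Signed distance of each row of X to the mid-perpendicular
    hyperplane of the segment joining a and b (normal w = a - b)."""
    w = a - b
    norm = np.linalg.norm(w)
    if norm == 0:
        raise ValueError("a cannot-link pair must consist of distinct points")
    midpoint = (a + b) / 2.0
    return (X - midpoint) @ w / norm

def similarity_from_constraint(X, a, b, sigma=1.0):
    """Assumed similarity: Gaussian of the difference in signed distance."""
    d = signed_distance_to_mid_hyperplane(X, a, b)
    diff = d[:, None] - d[None, :]
    return np.exp(-diff ** 2 / (2.0 * sigma ** 2))

def combined_similarity(X, cannot_links, sigma=1.0):
    """Assumed combination rule: average over all cannot-link constraints."""
    mats = [similarity_from_constraint(X, X[i], X[j], sigma)
            for i, j in cannot_links]
    return np.mean(mats, axis=0)

# Toy data: two objects near the origin, two near (4, 0).
X = np.array([[0.0, 0.0], [0.2, 0.1], [4.0, 0.0], [3.9, 0.2]])
d = signed_distance_to_mid_hyperplane(X, X[0], X[2])
S = combined_similarity(X, cannot_links=[(0, 2)])
```

In this toy example the two objects in the cannot-link pair fall on opposite sides of the hyperplane, so their similarity is close to zero, while objects on the same side remain similar. The combined matrix could then be passed as a precomputed kernel to a kernel k-means implementation.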
