Learning Kernels for Semi-Supervised Clustering

Bojun Yan

doi:10.4018/978-1-60566-010-3.ch177

Abstract

As a recently emerging technique, semi-supervised clustering has attracted significant research interest. Compared to traditional clustering algorithms, which use only unlabeled data, semi-supervised clustering employs both unlabeled and supervised data to obtain a partitioning that conforms more closely to the user’s preferences. Several recent papers have discussed this problem (Cohn, Caruana, & McCallum, 2003; Bar-Hillel, Hertz, Shental, & Weinshall, 2003; Xing, Ng, Jordan, & Russell, 2003; Basu, Bilenko, & Mooney, 2004; Kulis, Dhillon, & Mooney, 2005). In semi-supervised clustering, limited supervision is provided as input. The supervision can take the form of labeled data or pairwise constraints. In many applications it is natural to assume that pairwise constraints are available (Bar-Hillel, Hertz, Shental, & Weinshall, 2003; Wagstaff, Cardie, Rogers, & Schroedl, 2001). For example, in protein interaction and gene expression data (Segal, Wang, & Koller, 2003), pairwise constraints can be derived from background domain knowledge. Similarly, in information and image retrieval, it is easy for the user to provide feedback on the qualitative similarity or dissimilarity of pairs of objects. Thus, in these cases, although class labels may be unknown, a user can still specify whether pairs of points belong to the same cluster (Must-Link) or to different ones (Cannot-Link). Furthermore, a set of classified points implies an equivalent set of pairwise constraints, but not vice versa. Recently, a kernel method for semi-supervised clustering has been introduced (Kulis, Dhillon, & Mooney, 2005). This technique extends semi-supervised clustering to a kernel space, thus enabling the discovery of clusters with non-linear boundaries in input space. Although the technique is powerful, the applicability of kernel-based semi-supervised clustering is limited in practice because its results depend critically on the settings of the kernel’s parameters. In fact, the chosen parameter values can strongly affect the quality of the results. While solutions have been proposed in supervised learning to estimate the optimal kernel parameters, the problem remains an open challenge when no labeled data are provided and all that is available is a set of pairwise constraints.
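Two points in the abstract can be made concrete with a small sketch: class labels imply Must-Link and Cannot-Link constraints, and the kernel parameter changes how well those constraints are separated in kernel space. The sketch below is illustrative only and is not the chapter's method. The Gaussian kernel, the toy points, and the function names (`constraints_from_labels`, `gaussian_kernel`, `kernel_sq_distance`) are all assumptions introduced here.

```python
from itertools import combinations
import math


def constraints_from_labels(labels):
    """Derive Must-Link / Cannot-Link index pairs from class labels."""
    must_link, cannot_link = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            must_link.append((i, j))
        else:
            cannot_link.append((i, j))
    return must_link, cannot_link


def gaussian_kernel(x, y, sigma):
    """Gaussian kernel with width sigma (an assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_sq_distance(x, y, sigma):
    """Squared distance in kernel space: k(x,x) + k(y,y) - 2 k(x,y)."""
    return 2.0 - 2.0 * gaussian_kernel(x, y, sigma)


# Toy data (hypothetical): two points of class "a", one of class "b".
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0)]
labels = ["a", "a", "b"]
must_link, cannot_link = constraints_from_labels(labels)

# Gap between a Cannot-Link pair and a Must-Link pair for several widths.
gaps = {}
for sigma in (0.1, 1.0, 10.0):
    ml = kernel_sq_distance(points[0], points[1], sigma)
    cl = kernel_sq_distance(points[0], points[2], sigma)
    gaps[sigma] = cl - ml
    print(f"sigma={sigma}: must-link={ml:.4f} cannot-link={cl:.4f}")
```

On this toy data, a very small width pushes every pair to nearly the same kernel-space distance, and a very large width pulls every pair close together, so the constraints are only well separated at an intermediate width. The reverse derivation is not shown because pairwise constraints carry no class names, so labels cannot in general be recovered from them.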

Similar Papers

Semi-supervised graph clustering: a kernel approach
Brian Kulis ... Sugato Basu
Machine Learning | VOL. 74
24 Sep 2008

Semi-supervised graph clustering
Brian Kulis ... Raymond Mooney
01 Jan 2004

A review on semi-supervised clustering
Jianghui Cai ... Yuqing Yang
Information Sciences | VOL. 632
05 Mar 2023

Semi-supervised Fuzzy c-Means Clustering Using Clusterwise Tolerance Based Pairwise Constraints
Yukihiro Hamasuna ... Sadaaki Miyamoto
01 Aug 2010
