Semi-supervised clustering of large data sets with kernel methods

Stefan Faußer, Friedhelm Schwenker

doi:10.1016/j.patrec.2013.01.007

Abstract

Labelling real-world data sets is a difficult problem. Often the human expert is unsure about the class label of a specific sample point, or, in the case of very large data sets, labelling all points manually is impractical. In semi-supervised clustering, the available sample labels serve as external information that is used to find better-matching cluster partitions. Furthermore, kernel-based clustering methods can partition the data with nonlinear boundaries in feature space. While these methods improve the clustering results, their computation time is quadratic in the number of samples. In this paper, we propose a meta-algorithm that processes small subsets of a large data set, clusters them using the sample labels, and merges the points close to the resulting prototypes with the next points, until the whole data set has been processed. Its computation time is linear in the number of samples. The error function that this meta-algorithm minimizes is presented. Although we applied this meta-algorithm to Kernel Fuzzy C-Means, Relational Neural Gas and Kernel K-Means, it can be applied to a broad range of kernel-based clustering methods. The proposed method has been empirically evaluated on two real-world benchmark data sets.
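The chunk-wise idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses plain, unsupervised Kernel K-Means with a Gaussian kernel, ignores the sample labels and the paper's error function, and carries points forward without any weighting. The function names (`rbf_kernel`, `kernel_kmeans`, `chunked_kernel_kmeans`) and the parameters `chunk_size`, `keep_per_cluster` and `gamma` are assumptions made purely for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gram matrix of the Gaussian kernel exp(-gamma * ||x - y||^2).
    d2 = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_kmeans(K, k, n_iter=20, rng=None):
    # Plain kernel k-means on a Gram matrix K. Returns the labels and the
    # squared feature-space distances of every point to every prototype.
    rng = np.random.default_rng(rng)
    n = K.shape[0]
    labels = rng.integers(k, size=n)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for j in range(k):
            idx = np.flatnonzero(labels == j)
            if idx.size == 0:
                continue  # an empty cluster stays empty in this sketch
            # ||phi(x) - m_j||^2 = K_xx - 2 * mean(K_x,idx) + mean(K_idx,idx)
            dist[:, j] = (np.diag(K) - 2.0 * K[:, idx].mean(1)
                          + K[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, dist

def chunked_kernel_kmeans(X, k, chunk_size=200, keep_per_cluster=10,
                          gamma=1.0, seed=0):
    # Hypothetical chunk-wise wrapper: cluster one chunk at a time and carry
    # only the points nearest to each prototype into the next chunk.
    carried = np.empty((0, X.shape[1]))
    for start in range(0, X.shape[0], chunk_size):
        # Merge the carried-over points with the next chunk of the data set.
        block = np.vstack([carried, X[start:start + chunk_size]])
        K = rbf_kernel(block, block, gamma)
        labels, dist = kernel_kmeans(K, k, rng=seed)
        keep = []
        for j in range(k):
            idx = np.flatnonzero(labels == j)
            # Indices of the cluster members closest to prototype j.
            keep.append(idx[np.argsort(dist[idx, j])[:keep_per_cluster]])
        carried = block[np.concatenate(keep)]
    return carried  # summary points standing in for the final prototypes
```

Under these assumptions, each Gram matrix has at most `chunk_size + k * keep_per_cluster` rows, so the total cost grows with the number of chunks rather than with the square of the data set size.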

Journal: Pattern Recognition Letters
Publication Date: Jan 23, 2013
Citations: 19

Similar Papers

Semi-Supervised Kernel Clustering with Sample-to-Cluster Weights
Stefan Faußer ... Friedhelm Schwenker
01 Jan 2012

A Comparative Review of Incremental Clustering Methods for Large Dataset
International Journal of Advanced Trends in Computer Science and Engineering | VOL. 10
05 Apr 2021

New diagonal bundle method for clustering problems in large data sets
Napsu Karmitsa ... Sona Taheri
European Journal of Operational Research | VOL. 263
10 Jun 2017

Clustering in large data sets with the limited memory bundle method
Napsu Karmitsa ... Sona Taheri
Pattern Recognition | VOL. 83
31 May 2018
