Class discovery based on K-means clustering and perturbation analysis

Xiaohu Ru, Zheng Liu, Zhitao Huang, Wenli Jiang

doi:10.1109/cisp.2015.7408070

Abstract

Class discovery, which aims to identify the underlying category structure of a dataset, is an important problem in pattern recognition and knowledge discovery. Its key task is to estimate the number of classes. Classical estimation approaches usually suffer from low accuracy, high complexity, or the difficulty of choosing an appropriate penalty function. This paper proposes an effective class discovery method. The method first exploits the characteristics of the mean-square error (MSE) produced by k-means clustering to obtain a coarse estimate of the number of classes. It then computes the difference between the clustering results obtained from the original dataset and from a perturbed dataset to determine the actual number of classes. Experiments on simulated and real-world data show that the proposed method performs satisfactorily in a variety of situations. Moreover, the method depends only weakly on manually selected parameters, so it can be applied reliably in a wide range of applications.
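
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how the MSE curve is analyzed, how the data are perturbed, or how the clustering difference is measured, so every such detail below is an assumption made for illustration and not the authors' procedure: the coarse estimate is taken as the point of largest curvature of log-MSE, the perturbation is additive Gaussian noise, the difference is one minus the Rand index, and ties are broken in favor of the coarse estimate. The function names and the parameters `k_max`, `noise_scale`, and `n_trials` are hypothetical.

```python
import numpy as np

def kmeans(X, k, rng, n_init=5, n_iter=100):
    """Lloyd's k-means with k-means++ seeding; returns (labels, mse) of the best run."""
    best = None
    for _ in range(n_init):
        # k-means++ seeding: sample each new center proportionally to squared distance.
        centers = [X[rng.integers(len(X))]]
        for _ in range(k - 1):
            d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers)
        for _ in range(n_iter):
            labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        labels, mse = d2.argmin(axis=1), d2.min(axis=1).mean()
        if best is None or mse < best[1]:
            best = (labels, mse)
    return best

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same/different cluster)."""
    a, b = np.asarray(a), np.asarray(b)
    iu = np.triu_indices(len(a), k=1)
    same_a = (a[:, None] == a[None, :])[iu]
    same_b = (b[:, None] == b[None, :])[iu]
    return float(np.mean(same_a == same_b))

def coarse_estimate(X, k_max, rng):
    """Assumed rule: the k in 2..k_max-1 with the largest curvature of log-MSE."""
    log_mse = np.log([kmeans(X, k, rng)[1] for k in range(1, k_max + 1)])
    curvature = log_mse[:-2] - 2 * log_mse[1:-1] + log_mse[2:]
    return int(np.argmax(curvature)) + 2

def estimate_num_classes(X, k_max=6, noise_scale=0.05, n_trials=5, seed=0):
    """Coarse estimate from the MSE curve, then refinement by perturbation."""
    rng = np.random.default_rng(seed)
    coarse = coarse_estimate(X, k_max, rng)
    candidates = [k for k in (coarse - 1, coarse, coarse + 1) if 2 <= k <= k_max]
    sigma = noise_scale * X.std(axis=0)  # assumed perturbation: small Gaussian noise
    instability = {}
    for k in candidates:
        base_labels, _ = kmeans(X, k, rng)
        diffs = []
        for _ in range(n_trials):
            perturbed = X + rng.normal(0.0, sigma, size=X.shape)
            labels, _ = kmeans(perturbed, k, rng)
            diffs.append(1.0 - rand_index(base_labels, labels))
        instability[k] = float(np.mean(diffs))
    # Smallest clustering difference wins; ties go to the coarse estimate.
    best = min(candidates, key=lambda k: (instability[k], abs(k - coarse)))
    return best, coarse, instability

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    centers = np.array([[0.0, 0.0], [8.0, 0.0], [3.0, 9.0]])
    X = np.vstack([c + 0.5 * rng.normal(size=(50, 2)) for c in centers])
    print(estimate_num_classes(X))
```

The sketch uses only NumPy so that each step stays visible. Restricting the perturbation test to the neighbors of the coarse estimate is a design choice of this illustration: it keeps the second stage cheap and prevents very small values of k, which tend to be trivially stable, from dominating the decision.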

Full Text