Semi-Supervised Clustering Algorithm Based on Small Size of Labeled Data

Ming Wei Leng, Long Jie Li, Jian Jun Cheng, Xiao Yun Chen

doi:10.4028/www.scientific.net/amm.121-126.4675

Abstract

In many data mining domains, labeled data is very expensive to generate, so making the best use of labeled data to guide the clustering of unlabeled data is the core problem of semi-supervised clustering. Most semi-supervised clustering algorithms require a certain amount of labeled data and need the values of several parameters to be set, and different parameter values may produce different results. In view of this, we present a new algorithm, called the semi-supervised clustering algorithm based on a small size of labeled data. It starts from a small labeled dataset, expands it by labeling the k-nearest neighbors of the labeled points, and requires only one parameter. We evaluate the algorithm on three UCI datasets and compare it with SSDBSCAN [4] and KNN; the experimental results confirm that the accuracy of our clustering algorithm is close to that of the KNN classification algorithm.
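The label-expansion idea described in the abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes Euclidean distance, assumes the single parameter is the neighborhood size k, resolves conflicting labels by first-come breadth-first order, and adds a hypothetical kNN majority-vote fallback for points the expansion never reaches. The function names `knn_indices` and `expand_labels` are hypothetical.

```python
import math
from collections import Counter, deque


def knn_indices(points, i, k):
    """Indices of the k nearest neighbors of points[i] (Euclidean), excluding i."""
    dists = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    return [j for _, j in dists[:k]]


def expand_labels(points, seed_labels, k):
    """Hypothetical sketch: grow a small labeled set by labeling k-nearest neighbors.

    points: list of coordinate tuples
    seed_labels: {index: label} for the small labeled subset
    k: neighborhood size (assumed to be the single parameter)
    Returns one label per point.
    """
    labels = [None] * len(points)
    queue = deque()
    for i, lab in seed_labels.items():
        labels[i] = lab
        queue.append(i)

    # Breadth-first expansion: each labeled point passes its label to its
    # still-unlabeled k-nearest neighbors, which are then expanded in turn.
    while queue:
        i = queue.popleft()
        for j in knn_indices(points, i, k):
            if labels[j] is None:
                labels[j] = labels[i]
                queue.append(j)

    # Assumed fallback: any point never reached takes the majority label of
    # its k nearest labeled points (plain kNN classification).
    labeled = [j for j, lab in enumerate(labels) if lab is not None]
    for i, lab in enumerate(labels):
        if lab is None and labeled:
            nearest = sorted(labeled, key=lambda j: math.dist(points[i], points[j]))[:k]
            labels[i] = Counter(labels[j] for j in nearest).most_common(1)[0][0]
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    # One labeled point per cluster is enough to label both clusters here.
    print(expand_labels(pts, {0: "A", 4: "B"}, k=3))
```

With one seed per well-separated cluster, the expansion labels every point of each cluster from its seed. With a very small k (e.g. `k=1`) the breadth-first expansion can stall, and the fallback vote then assigns the remaining points.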