Clustering Text Documents Using Kernel Possibilistic C-Means

M B Revanasiddappa, S V Aruna Kumar, B S Harish

doi:10.1007/978-981-10-5146-3_13

Abstract

Text document clustering is a classic topic in text mining that groups text documents in an unsupervised way. Various clustering techniques are available for text documents. Fuzzy C-Means (FCM) is one of the most popular fuzzy clustering algorithms. Unfortunately, the Fuzzy C-Means algorithm is highly sensitive to noise. Possibilistic C-Means (PCM) overcomes this drawback by relaxing the probabilistic constraint on the membership function. In this paper, we propose a Kernel Possibilistic C-Means (KPCM) method for text document clustering. Unlike the classical Possibilistic C-Means algorithm, the proposed method employs a kernel distance metric to calculate the distance between a cluster center and a text document. We used the standard 20NewsGroups dataset for experimentation and compared the proposed method (KPCM) with Fuzzy C-Means, Kernel Fuzzy C-Means, and Possibilistic C-Means. The experimental results show that Kernel Possibilistic C-Means outperforms the other methods in terms of accuracy.
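As a rough illustration of the idea summarized above, the following minimal sketch computes a kernel-induced distance between a document vector and a cluster center and turns it into a possibilistic typicality value. The abstract does not state which kernel or parameters the method uses, so the Gaussian kernel, the bandwidth `sigma`, the scale `eta`, the fuzzifier `m`, and all function names are assumptions made for illustration. The typicality formula is the standard one from Possibilistic C-Means and is not necessarily the exact update used in the paper.

```python
import numpy as np


def gaussian_kernel(x, v, sigma=1.0):
    """Gaussian (RBF) kernel K(x, v) = exp(-||x - v||^2 / (2 * sigma^2)).

    The kernel choice and sigma are assumptions; the abstract names neither.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(v, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))


def kernel_distance_sq(x, v, sigma=1.0):
    """Squared distance in the kernel feature space.

    In general it is K(x, x) - 2 K(x, v) + K(v, v). For a Gaussian
    kernel K(x, x) = 1, so it reduces to 2 * (1 - K(x, v)).
    """
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))


def typicality(dist_sq, eta=1.0, m=2.0):
    """Standard PCM typicality: 1 / (1 + (d^2 / eta)^(1 / (m - 1))).

    eta (cluster scale) and m (fuzzifier) are hypothetical defaults.
    """
    return 1.0 / (1.0 + (dist_sq / eta) ** (1.0 / (m - 1.0)))


# Toy term-weight vectors standing in for documents (made-up values).
center = np.array([0.5, 0.5, 0.0])
near_doc = np.array([0.6, 0.4, 0.0])
far_doc = np.array([0.0, 0.0, 5.0])

t_near = typicality(kernel_distance_sq(near_doc, center))
t_far = typicality(kernel_distance_sq(far_doc, center))
```

In this sketch each typicality depends only on the distance to its own cluster center, so a document's values across clusters need not sum to one. That is the relaxed constraint the abstract refers to, and it is why a distant, noisy document can receive a low typicality in every cluster.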

Full Text