Abstract

Document clustering is an integral part of text mining. Clustering methods fall into two types: hard clustering and soft clustering. In hard clustering, each data item belongs to exactly one cluster, whereas in soft clustering a data point may fall into more than one cluster. Soft clustering thus leads to fuzzy clustering, in which each data point is associated with a membership function that expresses the degree to which it belongs to each cluster. Information retrieval demands accuracy, which fuzzy clustering can help achieve. In the work presented here, a fuzzy approach to text classification assigns documents to appropriate clusters using the Fuzzy C-Means (FCM) clustering algorithm. The Enron email dataset is used for the experiments: emails are grouped into clusters with FCM, and the results are compared with the output of the k-means clustering algorithm. The comparative study showed that the fuzzy clusters are more appropriate than the hard clusters.
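
To make the contrast between fuzzy and hard membership concrete, the following is a minimal sketch of the standard Fuzzy C-Means update rules, not the authors' implementation. The function name `fuzzy_c_means`, the fuzzifier `m = 2.0`, the stopping parameters, and the toy term-count matrix are all assumptions chosen for illustration; the abstract does not state the feature representation or the parameter values used on the Enron data.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=200, tol=1e-5, seed=0):
    """Illustrative Fuzzy C-Means. X has shape (n_samples, n_features).

    Returns (centers, U), where U[i, j] is the degree to which
    sample i belongs to cluster j and every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised so each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # Cluster centers are means weighted by memberships raised to m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]

        # Euclidean distance from every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero

        # Membership update: u_ij is proportional to d_ij^(-2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)

        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    return centers, U


# Hypothetical toy data: five "documents" described by three term counts.
X = np.array(
    [[5, 0, 1], [4, 0, 0], [0, 5, 1], [0, 4, 0], [2, 2, 1]],
    dtype=float,
)
centers, U = fuzzy_c_means(X, n_clusters=2)

# A hard assignment, as k-means would report, keeps only the largest degree.
hard_labels = U.argmax(axis=1)

print(np.round(U, 3))
print(hard_labels)
```

Each row of `U` holds the membership degrees of one document, so a document that mixes both vocabularies (the last row of the toy matrix) keeps a share in both clusters, while `hard_labels` forces it into a single one.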

