Abstract

Automatic text document clustering has become an important research topic as the volume of text documents distributed through digital media grows rapidly. Document clustering is a method of grouping documents based on their similarity. To group the documents, this research uses the Active Fuzzy Constrained Clustering (AFCC) algorithm, which combines fuzzy and semi-supervised clustering methods. Each text document is represented as a bag of words, and the weights of its meaningful words are calculated using the Vector Space Model (VSM). The AFCC algorithm is characterized by its use of pairwise constraints and cluster centroids. The input documents tested in this research are a collection of news articles from the BBC News Archives. Using three parameters (the maximum number of clusters, the maximum number of constraints per iteration, and the maximum number of iterations), the AFCC algorithm groups these news articles into clusters. Clustering performance is measured with a confusion matrix, which yields an average precision and recall value of 0.53 and an accuracy value of 0.52.
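As an illustration of how precision, recall, and accuracy can be derived from a confusion matrix, the following is a minimal sketch, not the authors' evaluation code. It assumes that each cluster has already been matched to one news category and that precision and recall are macro-averaged over the categories; the abstract does not state either choice. The function name `macro_metrics` and the toy matrix are hypothetical and are not taken from the paper's data.

```python
from typing import List, Tuple


def macro_metrics(cm: List[List[int]]) -> Tuple[float, float, float]:
    """Return (average precision, average recall, accuracy).

    cm[i][j] is the number of documents whose true category is i and
    that were assigned to cluster j. Assumption: cluster j has already
    been matched to category j, so the diagonal holds correct assignments.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    precisions, recalls = [], []
    for c in range(k):
        true_positive = cm[c][c]
        assigned = sum(cm[r][c] for r in range(k))  # column sum: documents placed in cluster c
        actual = sum(cm[c])                         # row sum: documents truly in category c
        precisions.append(true_positive / assigned if assigned else 0.0)
        recalls.append(true_positive / actual if actual else 0.0)
    correct = sum(cm[c][c] for c in range(k))
    accuracy = correct / total if total else 0.0
    return sum(precisions) / k, sum(recalls) / k, accuracy


# Toy two-category matrix (illustrative values only).
toy = [[3, 1],
       [1, 3]]
avg_precision, avg_recall, accuracy = macro_metrics(toy)
print(avg_precision, avg_recall, accuracy)
```

Macro-averaging gives every category equal weight regardless of its size; a micro-averaged or size-weighted variant would produce different values when the categories are unbalanced.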
