Abstract

Clustering is an important unsupervised data analysis technique that divides data objects into clusters based on similarity. It has been studied and applied in many fields, including pattern recognition, data mining, decision science, and statistics. Clustering algorithms are broadly classified into hierarchical and partitional approaches. Partitioning around medoids (PAM) is a partitional clustering algorithm that is less sensitive to outliers but is strongly affected by poor initialization of its medoids. In this paper, we augment PAM with the randomized seeding technique to overcome the problem of poor medoid initialization. The proposed approach (PAM++) is compared with other partitional clustering algorithms, such as K-means and K-means++, on text document clustering benchmarks and evaluated in terms of F-measure. The experimental results indicate that randomized seeding can improve the performance of the PAM algorithm on text document clustering.
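
The abstract does not spell out the seeding procedure, so the following is only a minimal sketch of what randomized seeding for medoids might look like, assuming the distance-squared weighting used by K-means++. The function name seed_medoids, its parameters, and the Euclidean distance in the usage lines are illustrative assumptions, not the authors' implementation; for text documents a cosine-based distance would be a more natural choice.

```python
import math
import random


def seed_medoids(points, k, dist, rng=None):
    """Pick k initial medoid indices by D^2-weighted sampling.

    Hypothetical helper: assumes K-means++-style seeding, where each new
    medoid is drawn with probability proportional to its squared distance
    from the nearest medoid chosen so far.
    """
    rng = rng or random.Random()
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(points)")

    # First medoid: chosen uniformly at random.
    medoids = [rng.randrange(n)]
    # d2[i] is the squared distance from point i to its nearest medoid.
    d2 = [dist(points[i], points[medoids[0]]) ** 2 for i in range(n)]

    while len(medoids) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen medoid, so fall back to
            # a uniform draw among the indices not yet selected.
            remaining = [i for i in range(n) if i not in medoids]
            nxt = rng.choice(remaining)
        else:
            # Already chosen medoids have weight 0 and cannot be redrawn.
            nxt = rng.choices(range(n), weights=d2, k=1)[0]
        medoids.append(nxt)
        # Update nearest-medoid distances with the new medoid.
        for i in range(n):
            d2[i] = min(d2[i], dist(points[i], points[nxt]) ** 2)
    return medoids


# Usage: seed two medoids for a toy data set with Euclidean distance.
pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
initial = seed_medoids(pts, 2, math.dist, random.Random(0))
```

The returned indices would then serve as the starting medoids for the usual PAM swap phase. Because the medoids are always actual data points, the sketch samples indices rather than computing centroids.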
