Abstract

Inspired by the success of the recently developed algorithm MVSC-IR, the authors embed the Multi-Viewpoint Based Similarity Measure for clustering (MVSC) into a hierarchical clustering method, average linkage clustering, to overcome the problem of initialization with random seeds. The resulting algorithm is referred to as MVSC-HAC. Its improved performance encouraged the authors to further explore the impact of metadata in document clustering. In this paper, after reviewing two existing algorithms, the authors describe the new algorithm and present experimental results on data sets of various sizes at two levels: one using the entire content of the documents and one using the documents' existing meta tags. The results show that MVSC-HAC excels at both levels. The authors analyze the results and discuss, based on further observations, the role of metadata in document clustering.
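The general idea of pairing a multi-viewpoint similarity with average linkage can be illustrated with a minimal sketch. This is not the authors' MVSC-HAC implementation: the abstract does not give the formulation, so the sketch assumes that the similarity between two documents is the mean of (d_i - d_h)·(d_j - d_h) over all other documents d_h taken as viewpoints, and that clusters are merged greedily by highest average pairwise similarity. The function names `multi_viewpoint_similarity` and `average_linkage` are hypothetical.

```python
import numpy as np

def multi_viewpoint_similarity(X):
    """Pairwise similarity averaged over viewpoints (assumed formulation).

    For i != j the entry is the mean over all h not in {i, j} of
    (x_i - x_h) . (x_j - x_h). Requires more than two documents.
    Only off-diagonal entries are meaningful for the linkage step.
    """
    n = X.shape[0]
    G = X @ X.T                      # Gram matrix of dot products
    xs = X @ X.sum(axis=0)           # x_i . (sum of all documents)
    # Sum over all h; the terms h = i and h = j are zero, so this equals
    # the sum over the n - 2 remaining viewpoints.
    total = n * G - xs[:, None] - xs[None, :] + np.trace(G)
    return total / (n - 2)

def average_linkage(S, k):
    """Naive agglomerative clustering on a similarity matrix S.

    Repeatedly merges the two clusters with the highest average pairwise
    similarity until k clusters remain. No random seeds are involved.
    """
    clusters = [[i] for i in range(S.shape[0])]
    while len(clusters) > k:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                avg = S[np.ix_(clusters[a], clusters[b])].mean()
                if avg > best:
                    best, pair = avg, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Toy data (made up for illustration): two pairs of near-parallel unit vectors.
X = np.array([[1.0, 0.1], [1.0, 0.2], [0.1, 1.0], [0.2, 1.0]])
X = X / np.linalg.norm(X, axis=1, keepdims=True)
clusters = average_linkage(multi_viewpoint_similarity(X), k=2)
print(clusters)
```

Because the procedure starts from singleton clusters and merges deterministically, repeated runs on the same input give the same grouping, which is the property the abstract contrasts with seed-based initialization.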
