Abstract

Extracting relevant documents from a large document corpus is a challenging task. Clustering groups together documents that share similar topics, and incorporating semantic features can improve the accuracy of document clustering methods. Topic detection deals with discovering meaningful and concise labels for the resulting clusters. In this paper, we propose a clustering algorithm, the correlation-based concept-oriented bisecting k-means algorithm, which uses a semantic-based similarity measure. The algorithm builds on our existing modified semantic-based model, in which related terms are extracted as concepts for concept-based document clustering and topic discovery. We compare the proposed algorithm with the existing term-based method and with our earlier concept-based algorithm. Additional experiments evaluate the proposed algorithm under different feature settings (terms only, synonyms and hyponyms, and correlated terms), using F-measure and purity as evaluation metrics. The experimental results show that the proposed algorithm improves clustering performance.
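
To make the two evaluation metrics concrete, the following is a minimal sketch of how purity and a clustering F-measure could be computed from gold class labels and cluster assignments. The abstract does not give the exact formulas, so the definitions here are assumptions: purity is the fraction of documents that belong to the majority class of their cluster, and the F-measure is the class-size-weighted best-match F1 commonly used for document clustering. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def purity(labels, clusters):
    """Fraction of documents that fall in the majority class of their cluster."""
    per_cluster = {}
    for label, cluster in zip(labels, clusters):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(labels)


def clustering_f_measure(labels, clusters):
    """Class-size-weighted best-match F1 (an assumed, commonly used definition)."""
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    overlap = Counter(zip(labels, clusters))
    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            shared = overlap[(cls, clu)]
            if shared == 0:
                continue
            precision = shared / n_clu
            recall = shared / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        # Weight each class by its share of the corpus.
        total += (n_cls / len(labels)) * best
    return total


# Toy example: six documents, three gold topics, two clusters.
gold = ["sports", "sports", "sports", "politics", "politics", "tech"]
assigned = [0, 0, 1, 1, 1, 1]
print(round(purity(gold, assigned), 4))
print(round(clustering_f_measure(gold, assigned), 4))
```

Both scores lie in the range 0 to 1 and equal 1 when every cluster contains exactly one class, so higher values indicate better agreement with the gold topics.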

