Abstract

Document clustering is an unsupervised method for grouping documents into clusters on the basis of their similarity. A document is assigned to a specific cluster on the basis of its membership score, which is calculated through a membership function. However, many traditional clustering algorithms are based only on the bag-of-words (BOW) model, which ignores the semantic similarity between a document and a cluster. In this research we consider the semantic association between the cluster and the text document when calculating the membership score of a document for a specific cluster. Several researchers are working on semantic aspects of document clustering to improve clustering performance. Many external knowledge bases, such as WordNet, Wikipedia, and Lucene, are utilized for this purpose. The proposed approach exploits WordNet to improve the cluster membership function. The experimental results show that clustering quality improves significantly with the proposed semantic framework.

Highlights

  • Nowadays, a search engine is a very useful and immediate tool for answering any query

  • Many traditional clustering algorithms are based only on BOW, which ignores the semantic similarity between a document and a cluster

  • Text documents are normally full of abstract concepts, which are difficult to represent using traditional text mining methods

Summary

Introduction

Nowadays, a search engine is a very useful and immediate tool for answering any query. The Internet is the fastest way to learn, understand, and solve a problem, or to retrieve information from a worldwide knowledge base. Search engines use document clustering to display query results in an organized and effective manner. Many traditional clustering algorithms are based only on BOW, which ignores the semantic similarity between a document and a cluster. Because of this limitation, traditional document clustering algorithms cannot represent semantic associations among words, which results in lower-quality output. An external knowledge base is very helpful for developing semantics-based approaches to document clustering. Using WordNet in clustering captures the relations between words and helps to identify the correct cluster for each document.
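The limitation described above can be illustrated with a minimal sketch. It is not the paper's membership function, which is not given in this summary. It assumes that a membership score can be approximated by cosine similarity between a document and a cluster description, and it replaces WordNet with a small hypothetical dictionary (`TOY_CONCEPTS`) that maps synonyms to a shared concept label. All names in the block are illustrative.

```python
import math
from collections import Counter

# Hypothetical stand-in for a WordNet lookup: synonyms share one concept label.
# A real system would query WordNet synsets instead of this toy dictionary.
TOY_CONCEPTS = {
    "car": "CONCEPT_VEHICLE",
    "automobile": "CONCEPT_VEHICLE",
    "engine": "CONCEPT_ENGINE",
    "motor": "CONCEPT_ENGINE",
    "doctor": "CONCEPT_DOCTOR",
    "physician": "CONCEPT_DOCTOR",
}


def bow_vector(text):
    """Plain bag-of-words term counts."""
    return Counter(text.lower().split())


def concept_vector(text):
    """Term counts after mapping each known word to its concept label."""
    return Counter(TOY_CONCEPTS.get(tok, tok) for tok in text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def membership_scores(document, clusters, vectorize):
    """Assumed membership function: cosine similarity to each cluster text."""
    doc_vec = vectorize(document)
    return {name: cosine(doc_vec, vectorize(text)) for name, text in clusters.items()}


# Toy cluster descriptions and a document that shares no surface words with them.
clusters = {
    "transport": "automobile motor road",
    "medicine": "physician hospital patient",
}
document = "car engine repair"

bow_scores = membership_scores(document, clusters, bow_vector)
semantic_scores = membership_scores(document, clusters, concept_vector)

print("BOW:     ", bow_scores)
print("Semantic:", semantic_scores)
```

With plain BOW the document has no words in common with either cluster, so both scores are zero and the document cannot be placed. After mapping words to concepts, "car" and "engine" match "automobile" and "motor", so the "transport" cluster receives a clearly higher score than "medicine".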

Related works
WordNet
Experimental evaluation
Conclusion and future work