Abstract
Document clustering is the application of cluster analysis to textual documents. It is a commonly used technique in data mining, information retrieval, knowledge discovery from data, pattern recognition, and related fields. In traditional document clustering, a document is treated as a bag of words, and the semantic meaning of each word is not taken into consideration. However, to achieve accurate document clustering, features such as word meanings are important. A semantic approach addresses this, because it takes the semantic relationships among words into account. This paper highlights the problems in both the traditional approach and the semantic approach. It identifies four major areas within semantic clustering and presents a survey of 23 papers covering the major significant work. In addition, it surveys tools for text processing, as well as clustering algorithms, that help in applying and evaluating document clustering. The survey informs the proposed work, which follows the same direction and uses word senses in a text clustering system. Lexical chains will be used as features; they are to be built using the identity/synonymy relation from the WordNet ontology as background knowledge. Clustering will then be performed on the lexical chains.
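The abstract describes the proposed pipeline only at a high level: group words into lexical chains via identity and synonymy, then cluster documents on those chains. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. `SYNSETS` is a hypothetical toy synonym table standing in for WordNet (a real system might query WordNet through a library such as NLTK), and the clustering step uses single-link clustering at a cosine-similarity threshold purely as a stand-in, since the abstract does not name a clustering algorithm. All function names are illustrative.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy synonym table standing in for WordNet synsets:
# words mapped to the same synset ID are treated as synonyms.
SYNSETS = {
    "car": "car.n.01", "automobile": "car.n.01", "auto": "car.n.01",
    "engine": "engine.n.01", "motor": "engine.n.01",
    "doctor": "doctor.n.01", "physician": "doctor.n.01",
    "hospital": "hospital.n.01", "clinic": "hospital.n.01",
}
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "was", "his"}

def build_lexical_chains(text):
    """Group tokens into chains using identity (same word) and synonymy (same synset)."""
    chains = defaultdict(list)
    for token in text.lower().split():
        token = token.strip(".,;:!?")
        if not token or token in STOPWORDS:
            continue
        # Words missing from the table fall back to the identity relation:
        # the word itself is its chain key.
        chains[SYNSETS.get(token, token)].append(token)
    return dict(chains)

def chain_vector(chains):
    """Feature vector with one dimension per chain, weighted by chain length."""
    return Counter({key: len(words) for key, words in chains.items()})

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.3):
    """Single-link clustering: documents whose chain vectors reach the cosine
    threshold, directly or transitively, end up in the same cluster."""
    vectors = [chain_vector(build_lexical_chains(d)) for d in docs]
    parent = list(range(len(docs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(len(docs)):
        groups[find(i)].append(i)
    return sorted(groups.values())

docs = [
    "The car has a powerful engine.",
    "His automobile needs a new motor.",
    "The doctor works at the hospital.",
    "A physician visited the clinic.",
]
print(cluster(docs))  # [[0, 1], [2, 3]]
```

This example shows why word meaning matters. Documents 0 and 1 share no content words, so a plain bag-of-words comparison gives them a cosine similarity of 0. Once "car"/"automobile" and "engine"/"motor" fall into the same chains, their similarity rises to 0.5 and they are grouped together.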