Abstract
Conventional document clustering techniques aim to group documents into semantic classes based on the cluster hypothesis. Most existing techniques rely either on single-term keyword frequency analysis or on a phrase-based approach using n-grams of the document. Accurate document clustering is infeasible because of the curse of dimensionality arising from the high-dimensional document space. This paper proposes a two-step process for clustering text documents: concept-based indexing, which uses a domain ontology as background knowledge for concept extraction, followed by clustering of the documents. The results of the proposed method are compared with those of a traditional indexing technique, Latent Semantic Indexing (LSI). To demonstrate the efficiency of the proposed technique, the biomedical domain is chosen, with the MeSH ontology as background knowledge. The experimental results show that the proposed method outperforms the traditional term-based method and LSI.
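The two-step idea can be illustrated with a minimal sketch. It is not the paper's implementation. The `TOY_ONTOLOGY` dictionary is a hypothetical stand-in for MeSH, with invented concept IDs. Concept extraction is naive substring matching. The clustering step is a simple single-pass threshold scheme chosen for brevity, since the abstract does not name the clustering algorithm used. The function names `concept_index`, `cosine`, and `cluster` are likewise assumptions made for illustration.

```python
from collections import Counter
import math

# Hypothetical toy ontology: surface term -> concept ID.
# A stand-in for MeSH; these IDs are invented for illustration.
TOY_ONTOLOGY = {
    "heart attack": "C_MYOCARDIAL_INFARCTION",
    "myocardial infarction": "C_MYOCARDIAL_INFARCTION",
    "cardiac arrest": "C_HEART_ARREST",
    "tumor": "C_NEOPLASM",
    "tumour": "C_NEOPLASM",
    "neoplasm": "C_NEOPLASM",
    "cancer": "C_NEOPLASM",
}


def concept_index(text):
    """Step 1: map a document to a bag of ontology concepts."""
    text = text.lower()
    counts = Counter()
    # Match longer terms first so multi-word terms are consumed whole.
    for term in sorted(TOY_ONTOLOGY, key=len, reverse=True):
        n = text.count(term)
        if n:
            counts[TOY_ONTOLOGY[term]] += n
            text = text.replace(term, " ")
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(docs, threshold=0.5):
    """Step 2: single-pass clustering over concept vectors (assumed scheme)."""
    vectors = [concept_index(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, v in enumerate(vectors):
        for members in clusters:
            # Compare against the cluster's first member as its representative.
            if cosine(v, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


docs = [
    "Patient admitted after a heart attack.",
    "Myocardial infarction risk factors in adults.",
    "The tumour was removed surgically.",
    "Cancer screening detects a neoplasm early.",
]
print(cluster(docs))
```

In this toy run, the first two documents share no surface terms, yet they land in the same cluster because both map to the same concept. Pure term-frequency indexing would miss that match.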