Abstract

Conventional document clustering techniques aim to group documents into semantic classes based on the cluster hypothesis. Most existing techniques rely either on single-term keywords and their frequencies or on phrase-based representations built with n-gram techniques. Accurate document clustering is often infeasible because of the curse of dimensionality that arises from the high-dimensional space in which documents are represented. To cluster text documents successfully, this paper proposes a two-step process: concept-based indexing, which uses a domain ontology as background knowledge for concept extraction, followed by clustering of the indexed documents. The results of the proposed method are compared with those of a traditional indexing technique, Latent Semantic Indexing (LSI). To evaluate the proposed technique, the biomedical domain is chosen, with the MeSH ontology as background knowledge. The experimental results show that the proposed method outperforms both the traditional term-based method and LSI.
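The two-step pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The ontology is a hypothetical eight-entry dictionary standing in for MeSH, and its concept IDs are invented. Concept extraction is a simple longest-match lookup. The clustering step uses spherical k-means, which the abstract does not name. All function names (extract_concepts, to_unit_vectors, spherical_kmeans) are illustrative only.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL toy ontology: surface terms -> concept IDs.
# These labels are illustrative and are NOT real MeSH descriptors.
TOY_ONTOLOGY = {
    "heart attack": "C:MyocardialInfarction",
    "myocardial infarction": "C:MyocardialInfarction",
    "heart": "C:Heart",
    "cardiac": "C:Heart",
    "tumor": "C:Neoplasms",
    "tumour": "C:Neoplasms",
    "cancer": "C:Neoplasms",
    "carcinoma": "C:Neoplasms",
}


def extract_concepts(text, ontology):
    """Step 1: concept-based indexing by longest-match lookup of ontology terms.

    Words that match no ontology term are dropped, so synonyms such as
    "heart attack" and "myocardial infarction" collapse into one feature.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    max_len = max(len(term.split()) for term in ontology)
    concepts = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in ontology:
                concepts.append(ontology[phrase])
                i += n
                break
        else:
            i += 1  # no ontology term starts at this token
    return Counter(concepts)


def to_unit_vectors(concept_counts):
    """Turn per-document concept counts into L2-normalised vectors."""
    vocab = sorted({c for counts in concept_counts for c in counts})
    vectors = []
    for counts in concept_counts:
        v = [float(counts.get(c, 0)) for c in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # all-zero docs stay zero
        vectors.append([x / norm for x in v])
    return vocab, vectors


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, k, iters=20):
    """Step 2 (ASSUMED algorithm): k-means with cosine similarity."""
    # Deterministic farthest-first seeding: start from document 0, then
    # repeatedly add the document least similar to every chosen seed.
    seeds = [0]
    while len(seeds) < k:
        seeds.append(min(
            (i for i in range(len(vectors)) if i not in seeds),
            key=lambda i: max(dot(vectors[i], vectors[s]) for s in seeds)))
    centroids = [list(vectors[s]) for s in seeds]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each document to its most similar centroid.
        labels = [max(range(k), key=lambda j: dot(v, centroids[j])) for v in vectors]
        # Recompute each centroid as the normalised sum of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                total = [sum(col) for col in zip(*members)]
                norm = math.sqrt(sum(x * x for x in total)) or 1.0
                centroids[j] = [x / norm for x in total]
    return labels


docs = [
    "Patients with myocardial infarction showed cardiac damage.",
    "A heart attack damages the heart muscle.",
    "The tumor was identified as a carcinoma.",
    "Cancer cells formed a tumour in the lung.",
]
counts = [extract_concepts(d, TOY_ONTOLOGY) for d in docs]
vocab, vectors = to_unit_vectors(counts)
labels = spherical_kmeans(vectors, k=2)
print(labels)  # [0, 0, 1, 1]
```

Because "heart attack" and "myocardial infarction" map to the same concept, the first two documents receive identical concept vectors even though they share no exact words. A plain term-based index would treat them as unrelated features. The concept space here also has only three dimensions rather than one per distinct word, which loosely illustrates the dimensionality reduction the abstract motivates.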

