Automatic acquisition of concepts from domain texts

J Punuru, Jianhua Chen

doi:10.1109/grc.2006.1635831

Abstract

Domain-specific concept extraction is a key component in ontology construction for Semantic Web applications. Manual concept extraction is costly in both time and labor. In this paper, we present several heuristic methods for automatic concept extraction from domain texts. These methods aim to improve precision and recall over word-frequency-based techniques. Precision is improved by eliminating irrelevant terms using word sense information. Recall is enhanced by adding new concepts formed by composing relevant words. Our methods are domain independent and can be applied in a fully automatic way to the concept extraction task. Experimental results on electronic voting domain texts (from the New York Times) are presented and show the promise of the proposed methods.

Index Terms: Concept extraction, ontology engineering, text processing, WordNet, WordNet senses.
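The pipeline outlined in the abstract (frequency-based candidates, sense-based pruning for precision, word composition for recall) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the thresholds, the stopword list, and the rule "keep words with few senses" are all assumptions made for illustration, and the `sense_counts` dictionary is a hypothetical stand-in for a WordNet sense lookup.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "are", "for", "on", "by", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def frequent_terms(tokens, min_count=2):
    """Baseline: keep non-stopword tokens that occur at least min_count times."""
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return {t for t, c in counts.items() if c >= min_count}


def filter_by_senses(terms, sense_counts, max_senses=3):
    """Precision step (assumed heuristic): drop words with many senses.

    sense_counts is a hypothetical word -> number-of-senses mapping;
    words missing from it are kept.
    """
    return {t for t in terms if sense_counts.get(t, 0) <= max_senses}


def compose_concepts(tokens, relevant, min_count=2):
    """Recall step (assumed heuristic): join adjacent relevant words
    into two-word concepts when the pair recurs at least min_count times."""
    pairs = Counter(
        (a, b) for a, b in zip(tokens, tokens[1:])
        if a in relevant and b in relevant
    )
    return {f"{a} {b}" for (a, b), c in pairs.items() if c >= min_count}


if __name__ == "__main__":
    text = (
        "The voting machine records votes. The voting machine is tested "
        "in a test run. A machine can fail in a run."
    )
    # Hypothetical sense counts; a real system would query WordNet.
    sense_counts = {"voting": 1, "machine": 3, "run": 40}

    tokens = tokenize(text)
    candidates = frequent_terms(tokens)
    relevant = filter_by_senses(candidates, sense_counts)
    print(sorted(relevant))
    print(sorted(compose_concepts(tokens, relevant)))
```

In this toy input, the highly ambiguous word "run" passes the frequency filter but is removed by the sense filter, while the recurring pair of surviving words is promoted to a compound concept.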

Full Text