Abstract

Word sense disambiguation (WSD), the discrimination of word senses, is a major problem in natural language processing (NLP) applications such as text classification and understanding. Determining the correct sense of lexical items in raw text is relevant to categorization, machine translation, information retrieval, and language engineering tasks in general. The difficulty stems from the pervasive ambiguity of words and of their use in texts. Moreover, senses are often specific to the knowledge domain in which words are used, which increases the complexity of the disambiguation task and limits the completeness of most online sources, such as dictionaries and general-purpose lexical resources. This article presents an integrated method that combines a well-known lexical knowledge base (WordNet) with corpus statistics in order to tune the sense classification to a specific sublanguage and to drive the contextual disambiguation of word senses. The method is implemented in a system (General Purpose Ontology Disambiguation and Tuning, GODoT) that aims to support a semantic bootstrapping process within specific application domains. The approach has been extensively tested on verb classification in two different corpora, although it can be applied to other syntactic categories as well. The resulting disambiguation framework is intended to serve several NLP tasks, such as lexical acquisition (in the definition of class-based language models) and information retrieval (in the characterization of indexes by means of their senses in context).
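
As a rough illustration of the two steps named in the abstract (tuning sense classes to a sublanguage from corpus statistics, then disambiguating in context), the following minimal sketch may help. It is not the GODoT implementation: the sense inventory `SENSES`, the functions `tune_class_weights` and `disambiguate`, the scoring rule, and the toy corpus are all assumptions made for illustration, and the class labels are hypothetical stand-ins for high-level WordNet verb classes.

```python
from collections import Counter

# Hypothetical sense inventory: verb -> candidate high-level classes.
# In a real setting these would come from a lexical knowledge base.
SENSES = {
    "charge": ["possession", "contact", "communication"],
    "pay": ["possession"],
    "bill": ["possession", "communication"],
    "announce": ["communication"],
}


def tune_class_weights(corpus_verbs, senses):
    """Estimate domain weights for classes from raw verb occurrences.

    Each occurrence spreads one unit of mass evenly over the verb's
    candidate classes; the totals are then normalised to sum to 1.
    """
    mass = Counter()
    for verb in corpus_verbs:
        classes = senses.get(verb, [])
        for cls in classes:
            mass[cls] += 1.0 / len(classes)
    total = sum(mass.values())
    return {cls: m / total for cls, m in mass.items()} if total else {}


def disambiguate(verb, context_verbs, senses, weights):
    """Pick the candidate class with the best weight and context support."""
    def score(cls):
        # Count context verbs that could also belong to this class.
        support = sum(1 for v in context_verbs if cls in senses.get(v, []))
        return weights.get(cls, 0.0) * (1 + support)

    return max(senses[verb], key=score)


# Toy "financial" sublanguage corpus, reduced to its verbs (assumed data).
corpus = ["pay", "pay", "bill", "charge", "pay", "announce", "charge"]
weights = tune_class_weights(corpus, SENSES)
best = disambiguate("charge", ["pay", "bill"], SENSES, weights)
print(best)
```

In this sketch the unambiguous verbs in the corpus pull weight toward the classes that dominate the domain, so an ambiguous verb is resolved toward the class favoured both by the domain and by its neighbours.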
