On the Use of Automatically Acquired Examples for All-Nouns Word Sense Disambiguation

D Martinez, O Lopez De Lacalle, E Agirre

doi:10.1613/jair.2395

Open Access

https://doi.org/10.1613/jair.2395

Journal: Journal of Artificial Intelligence Research
Publication Date: Sep 25, 2008
Citations: 71
License type: publisher-specific license

Abstract

This article focuses on Word Sense Disambiguation (WSD), a Natural Language Processing task that is thought to be important for many Language Technology applications, such as Information Retrieval, Information Extraction, and Machine Translation. One of the main issues preventing the deployment of WSD technology is the lack of training examples for Machine Learning systems, also known as the Knowledge Acquisition Bottleneck. A method that has been shown to work for small samples of words is the automatic acquisition of examples. We have previously shown that one of the most promising example acquisition methods scales up and produces a freely available database of 150 million examples from Web snippets for all polysemous nouns in WordNet. This paper focuses on the issues that arise when using those examples, alone or in addition to manually tagged examples, to train a supervised WSD system for all nouns. An extensive evaluation on both lexical-sample and all-words Senseval benchmarks shows that we are able to improve over commonly used baselines and to achieve top-rank performance. Good use of the prior distributions of the senses proved to be a crucial factor.

Full Text