Semi-supervised Word Sense Disambiguation Using the Web as Corpus

Rafael Guzmán-Cabrera, David Pinto-Avendaño, Manuel Montes-Y-Gómez, Luis Villaseñor-Pineda, Paolo Rosso

doi:10.1007/978-3-642-00382-0_21

Abstract

Like any other classification task, Word Sense Disambiguation requires a large number of training examples. Such examples, which are easy to obtain for most tasks, are particularly difficult to obtain in this case. Motivated by this, in this paper we investigate the possibility of using a Web-based approach to determine the correct sense of an ambiguous word based only on its surrounding context. In particular, we propose a semi-supervised method that is especially suited to working with just a few training examples. The method automatically extracts unlabeled examples from the Web and iteratively integrates them into the training set. The experimental results, obtained over a subset of ten nouns from the SemEval lexical sample task, are encouraging: they show that it is possible to improve the baseline accuracy of classifiers such as Naive Bayes and SVM using some unlabeled examples extracted from the Web.
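The iterative integration of Web examples described above follows the general self-training pattern. The following is a minimal sketch of that pattern, not the authors' exact procedure: it assumes a multinomial Naive Bayes classifier over bag-of-words contexts and a fixed confidence threshold for accepting a Web snippet into the training set. The function names (`train_nb`, `predict`, `self_train`), the threshold, the iteration cap, and the toy "bank" data are all hypothetical. Web retrieval itself is not shown; the snippets are assumed to be already tokenized.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Train a multinomial Naive Bayes model on (tokens, sense) pairs."""
    class_counts = Counter(sense for _, sense in examples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, sense in examples:
        word_counts[sense].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return (best_sense, posterior) using Laplace-smoothed likelihoods."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    v = len(vocab)
    log_scores = {}
    for sense, count in class_counts.items():
        denom = sum(word_counts[sense].values()) + v
        score = math.log(count / total)  # log prior
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][t] + 1) / denom)
        log_scores[sense] = score
    # Normalize log scores into a posterior for the winning sense.
    m = max(log_scores.values())
    z = sum(math.exp(s - m) for s in log_scores.values())
    best = max(log_scores, key=log_scores.get)
    return best, math.exp(log_scores[best] - m) / z


def self_train(labeled, unlabeled, threshold=0.75, max_iter=5):
    """Hypothetical self-training loop: label the Web snippets the current
    model is confident about, add them to the training set, and retrain."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_iter):
        model = train_nb(labeled)
        confident, remaining = [], []
        for tokens in pool:
            sense, p = predict(model, tokens)
            (confident if p >= threshold else remaining).append((tokens, sense))
        if not confident:  # nothing new to integrate; stop early
            break
        labeled.extend(confident)
        pool = [tokens for tokens, _ in remaining]
    return train_nb(labeled), labeled


if __name__ == "__main__":
    # Toy seed set: one labeled context per sense of "bank".
    seed = [
        ("river water shore".split(), "bank_river"),
        ("money loan account".split(), "bank_finance"),
    ]
    # Stand-ins for unlabeled snippets retrieved from the Web.
    web = [
        "fishing river shore".split(),
        "loan interest money".split(),
        "deposit account money".split(),
    ]
    model, training_set = self_train(seed, web)
    print(len(training_set), predict(model, "interest rate".split())[0])
```

In this toy run all three snippets pass the threshold in the first iteration, so the word "interest", absent from the seed set, becomes usable evidence for the financial sense afterwards. In practice, the acceptance criterion (a probability threshold here, perhaps a top-k selection per sense elsewhere) controls the trade-off between adding more data and adding mislabeled data.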
