This paper presents a system for automatic verb sense disambiguation in Korean that uses a small corpus and a Machine-Readable Dictionary (MRD). For each sense of a polysemous verb defined in the MRD, the system learns the typical uses given in that sense's MRD usage examples, using verb-object co-occurrences acquired from the corpus. The paper addresses the problem of data sparseness in two ways. First, we extend word similarity measures from direct co-occurrences to co-occurrences of co-occurring words. This lets us compute similarity between words that do not co-occur with the same words but do co-occur with the same clusters of words. Second, we acquire IS-A relations between nouns from the MRD definitions, which allows the nouns to be roughly clustered. With these methods, two words may be considered similar even if they do not share any co-occurring words. Experiments show that the method can learn from a very small training corpus and achieves over 86% correct disambiguation without any restriction on a word's senses.
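The first idea, judging two nouns similar through the similarity of the verbs they co-occur with even when they share no verb, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' measure. It assumes raw verb-object counts, cosine similarity for direct (first-order) co-occurrence, and a "soft cosine" that treats similar verbs as partial matches. The function names and the English toy pairs, which stand in for Korean verb-object pairs, are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def build_vectors(pairs):
    """Count verb-object co-occurrences in both directions (hypothetical helper)."""
    noun_vecs = defaultdict(lambda: defaultdict(int))  # noun -> {verb: count}
    verb_vecs = defaultdict(lambda: defaultdict(int))  # verb -> {noun: count}
    for verb, noun in pairs:
        noun_vecs[noun][verb] += 1
        verb_vecs[verb][noun] += 1
    return noun_vecs, verb_vecs


def cosine(a, b):
    """First-order similarity: only dimensions shared by a and b contribute."""
    dot = sum(count * b.get(key, 0) for key, count in a.items())
    norm_sq = sum(x * x for x in a.values()) * sum(x * x for x in b.values())
    return dot / sqrt(norm_sq) if norm_sq else 0.0


def second_order_similarity(n1, n2, noun_vecs, verb_vecs):
    """Soft cosine over verbs: similar verbs count as partial matches (assumed measure)."""
    x, y = noun_vecs.get(n1, {}), noun_vecs.get(n2, {})

    def bilinear(u, w):
        # u^T S w, where S[a][b] is the first-order similarity of verbs a and b,
        # computed from the nouns each verb takes as its object.
        return sum(cu * cw * cosine(verb_vecs[a], verb_vecs[b])
                   for a, cu in u.items() for b, cw in w.items())

    denom = bilinear(x, x) * bilinear(y, y)
    return bilinear(x, y) / sqrt(denom) if denom > 0 else 0.0


# Toy data: "beer" and "wine" never share a verb, but "drink" and "sip" share "water".
pairs = [("drink", "beer"), ("sip", "wine"),
         ("drink", "water"), ("sip", "water"), ("read", "book")]
noun_vecs, verb_vecs = build_vectors(pairs)
print(cosine(noun_vecs["beer"], noun_vecs["wine"]))                   # 0.0
print(second_order_similarity("beer", "wine", noun_vecs, verb_vecs))  # 0.5
print(second_order_similarity("beer", "book", noun_vecs, verb_vecs))  # 0.0
```

Direct co-occurrence gives "beer" and "wine" a similarity of zero, while the second-order measure links them through the verbs "drink" and "sip", which both take "water" as an object. The IS-A clustering described in the abstract could be layered on top, for example by merging nouns that share a hypernym before counting, but that step is not shown here.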