Practical Word-Sense Disambiguation Using Co-occurring Concept Codes

Youjin Chung, Jong-Hyeok Lee

doi:10.1007/s10590-005-2559-y

Abstract

Most previous corpus-based approaches to resolving word-sense ambiguity have collected lexical information from the context of the word to be disambiguated. However, these approaches suffer from data sparseness. To address this problem, this paper proposes a disambiguation method using co-occurring concept codes (CCCs). The use of concept-code features and concept-code generalization effectively alleviates the data sparseness problem and also reduces the number of features to a practical size without any loss in system performance. We demonstrate the effectiveness of the CCC features and the concept-code generalization through experimental evaluation. The proposed disambiguation method is applied to a Korean-to-Japanese MT system and tested with various machine-learning techniques. In a lexical sample evaluation, our CCC-based method achieved a precision of 82.00%, an 11.83% improvement over the baseline. It also achieved a precision of 83.51% in an experiment on real text, which indicates that the proposed method is useful for practical MT systems.

Full Text