Knowledge-based and knowledge-lean methods combined in unsupervised word sense disambiguation

Antonio Jimeno Yepes, Alan R Aronson

doi:10.1145/2110363.2110449

Abstract

Word sense disambiguation (WSD) is an intermediate task within information retrieval and information extraction that attempts to select the proper sense of an ambiguous word. For instance, the word "cold" could refer either to a low temperature or to a viral infection. Because training data is scarce, knowledge-based and knowledge-lean methods have received attention as disambiguation methods. Knowledge-based methods compare the context of the ambiguous word to the information available in a terminological resource, although word sense disambiguation is not the main purpose of such resources. Knowledge-lean unsupervised methods rely on term distributions instead of a resource enumerating the possible senses, but they might be inappropriate when there is a requirement to commit to a terminological resource as a catalog of candidate senses. We present preliminary results for a combination of knowledge-based and knowledge-lean unsupervised methods, which improves the performance of knowledge-based methods by between 3% and 8%. The evaluation is done on a new word sense disambiguation data set, which is available to the community.

Full Text