Abstract

The process of identifying the actual meanings of words in a given text fragment has a long history in the field of computational linguistics. Due to its importance in understanding the semantics of natural language, it is considered one of the most challenging problems facing this field. In this article we propose a new unsupervised similarity-based word sense disambiguation (WSD) algorithm that operates by computing the semantic similarity between glosses of the target word and a context vector. The sense of the target word is determined as that for which the similarity between gloss and context vector is greatest. Thus, whereas conventional unsupervised WSD methods are based on measuring pairwise similarity between words, our approach is based on measuring semantic similarity between sentences. This enables it to utilize a higher degree of semantic information, and is more consistent with the way that human beings disambiguate; that is, by considering the greater context in which the word appears. We also show how performance can be further improved by incorporating a preliminary step in which the relative importance of words within the original text fragment is estimated, thereby providing an ordering that can be used to determine the sequence in which words should be disambiguated. We provide empirical results that show that our method performs favorably against the state-of-the-art unsupervised word sense disambiguation methods, as evaluated on several benchmark datasets through different models of evaluation.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.