Abstract

Supervised word sense disambiguation (WSD) models suffer from the knowledge acquisition bottleneck: the semantic annotation of large text collections is time-consuming and requires considerable expert effort. In this article we address the lack of sense-annotated data for the WSD task in Russian. We present an approach that automatically generates text collections and annotates them with word senses. The method is based on substitution and exploits monosemous relatives (related unambiguous entries), which can be located at relatively long distances from the target ambiguous word. Moreover, we present a similarity-based ranking procedure that sorts and filters the monosemous relatives. Our experiments with WSD models that rely on the contextualized embeddings ELMo and BERT show that our method can boost overall performance. The proposed approach is knowledge-based and relies on the Russian thesaurus RuWordNet.
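The similarity-based ranking of monosemous relatives can be illustrated with a minimal sketch: score each candidate relative by cosine similarity between its embedding and a vector representing the target sense, then sort and filter by a threshold. This is only an assumed simplification with toy static vectors and hypothetical names (`rank_relatives`, `threshold`); the article itself works with contextualized ELMo and BERT embeddings and senses drawn from RuWordNet.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_relatives(sense_vec, relatives, threshold=0.5):
    """Sort candidate monosemous relatives by similarity to the
    target sense vector and drop candidates below the threshold.

    relatives: list of (word, embedding) pairs.
    Returns a list of (word, score) pairs, best first.
    """
    scored = [(word, cosine(sense_vec, vec)) for word, vec in relatives]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(word, score) for word, score in scored if score >= threshold]

# Toy usage: "a" and "c" are close to the sense vector, "b" is orthogonal.
ranked = rank_relatives(
    [1.0, 0.0],
    [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.9, 0.1])],
)
# "b" falls below the 0.5 threshold and is filtered out.
```

In the full pipeline, the retained relatives would be substituted into unannotated contexts to produce sense-labeled training examples.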
