Abstract

State-of-the-art supervised word sense disambiguation models require large sense-tagged training sets. However, many low-resource languages, including Russian, lack such data. To cope with the knowledge acquisition bottleneck in Russian, we first apply a method based on the concept of monosemous relatives to automatically generate a labelled training collection. We then introduce three weakly supervised models trained on this synthetic data. Our work builds upon the bootstrapping approach: starting from this seed of tagged instances, an ensemble of the classifiers is used to label samples from unannotated corpora. Alongside this method, different techniques are exploited to augment the new training examples. We show that the simple bootstrapping approach based on the ensemble of weakly supervised models can already produce an improvement over the initial word sense disambiguation models.
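The bootstrapping loop sketched in the abstract can be illustrated as follows. This is a minimal sketch under assumptions, not the paper's implementation: the `train_fn`/`predict_proba_fn` interfaces, the confidence threshold, and the fixed number of rounds are all illustrative.

```python
def bootstrap(seed_labelled, unlabelled, train_fn, predict_proba_fn,
              rounds=3, threshold=0.9):
    """Iteratively grow the training set: train on the current labelled data,
    then move confidently labelled samples from the unlabelled pool into it."""
    labelled = list(seed_labelled)   # (sentence, sense) pairs from the seed
    pool = list(unlabelled)          # raw, unannotated sentences
    for _ in range(rounds):
        model = train_fn(labelled)   # e.g. the ensemble of WSD models
        remaining = []
        for sentence in pool:
            probs = predict_proba_fn(model, sentence)  # {sense: probability}
            sense, p = max(probs.items(), key=lambda kv: kv[1])
            if p >= threshold:       # keep only confident predictions
                labelled.append((sentence, sense))
            else:
                remaining.append(sentence)
        pool = remaining
    return labelled
```

After the loop, `labelled` contains the seed plus every pool sentence the model labelled with confidence above the threshold; sentences that never cross the threshold are left unlabelled rather than added with noisy labels.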

Highlights

  • The task of Word Sense Disambiguation (WSD) consists in identifying the correct sense of a polysemous word in context

  • The labels obtained with the help of this approach were used to train three different weakly supervised WSD models: logistic regression with deep representations from the ELMo [1] language model as features, a fine-tuned BERT [2] model, and a BERT model trained on context-gloss pairs

  • We describe an algorithm based on a weighted probabilistic ensemble of the WSD models used to predict sense labels, and in Section 7 we demonstrate the results obtained by the three different models
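The weighted probabilistic ensemble mentioned in the last highlight can be sketched as a weighted average of the per-model sense distributions; the function name, weights, and dictionary-based interface below are assumptions for illustration, not the paper's exact scheme.

```python
def ensemble_predict(prob_dists, weights):
    """Combine per-model sense probability distributions into a single
    weighted average and return the highest-scoring sense.

    prob_dists: list of {sense: probability} dicts, one per model.
    weights: per-model weights (normalized inside the function).
    """
    total = sum(weights)
    combined = {}
    for dist, w in zip(prob_dists, weights):
        for sense, p in dist.items():
            combined[sense] = combined.get(sense, 0.0) + w * p / total
    return max(combined.items(), key=lambda kv: kv[1])

# Hypothetical outputs of the three models for one target word:
dists = [{"s1": 0.7, "s2": 0.3},   # logistic regression over ELMo features
         {"s1": 0.4, "s2": 0.6},   # fine-tuned BERT
         {"s1": 0.8, "s2": 0.2}]   # BERT on context-gloss pairs
sense, score = ensemble_predict(dists, weights=[1.0, 2.0, 1.0])
```

Normalizing by the weight sum keeps the combined scores a proper probability distribution, which matters when a confidence threshold is applied to the ensemble's output during bootstrapping.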


Summary

Introduction

The task of Word Sense Disambiguation (WSD) consists in identifying the correct sense of a polysemous word in context. Recent advances in WSD can be applied only to some languages, because obtaining hand-crafted sense-labelled training collections is very expensive in terms of time and human effort. In recent years, to address these challenges, practitioners have turned to weak supervision, which implies training models on data with imperfect labels that can be obtained with user-defined heuristics, external knowledge bases, other classifiers, etc. In our research we use a method to automatically generate and label training collections with the help of monosemous relatives, that is, sets of unambiguous words (or phrases) related to particular senses of a polysemous word. We propose an algorithm based on an ensemble of weakly supervised WSD models that can be used to label raw texts and reduce the human effort required for annotation.
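The idea of labelling via monosemous relatives can be sketched in a few lines: find corpus sentences containing an unambiguous relative of some sense, substitute the ambiguous target word, and label the result with that sense. The function name, the word-level matching, and the sense inventory format below are illustrative assumptions, not the paper's generation pipeline.

```python
def generate_training_data(target, sense_relatives, corpus):
    """Build pseudo-labelled examples for an ambiguous target word.

    target: the polysemous word to disambiguate, e.g. "bank".
    sense_relatives: {sense_id: [monosemous relative words]}.
    corpus: iterable of raw sentences.
    """
    examples = []
    for sentence in corpus:
        tokens = sentence.split()
        for sense, relatives in sense_relatives.items():
            for rel in relatives:
                if rel in tokens:
                    # Substitute the unambiguous relative with the target word;
                    # the relative's sense becomes the (noisy) label.
                    examples.append((sentence.replace(rel, target), sense))
    return examples
```

Because each relative is unambiguous, every match yields a usable training label, at the cost of some distributional mismatch between the relative's contexts and the target word's real contexts; that noise is what the downstream weakly supervised models must tolerate.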

Related work
Method of automatic labelling of training collections
Models
Experimental design
Results
Conclusion
