CluBERT: A Cluster-Based Approach for Learning Sense Distributions in Multiple Languages

Tommaso Pasini, Federico Scozzafava, Bianca Scarlini

doi:10.18653/v1/2020.acl-main.369

Abstract

Knowing the Most Frequent Sense (MFS) of a word has been proved to help Word Sense Disambiguation (WSD) models significantly. However, the scarcity of sense-annotated data makes it difficult to induce a reliable and high-coverage distribution of the meanings in a language vocabulary. To address this issue, in this paper we present CluBERT, an automatic and multilingual approach for inducing the distributions of word senses from a corpus of raw sentences. Our experiments show that CluBERT learns distributions over English senses that are of higher quality than those extracted by alternative approaches. When used to induce the MFS of a lemma, CluBERT attains state-of-the-art results on the English Word Sense Disambiguation tasks and helps to improve the disambiguation performance of two off-the-shelf WSD models. Moreover, our distributions also prove to be effective in other languages, beating all their alternatives for computing the MFS on the multilingual WSD tasks. We release our sense distributions in five different languages at https://github.com/SapienzaNLP/clubert.

Highlights

Word Sense Disambiguation (WSD) is the task of associating a word in context with a meaning from a given inventory of senses (Navigli, 2009).
We investigate the capabilities of CluBERT to scale over different languages by evaluating it on the multilingual Word Sense Disambiguation tasks of SemEval-2013 and SemEval-2015.
We assess the effectiveness of the CluBERT Most Frequent Sense (MFS) when used as a backoff strategy in two off-the-shelf WSD approaches, i.e., UKB and the BiLSTM with attention model presented by Raganato et al. (2017b).
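The idea of taking the MFS from a sense distribution and using it as a backoff can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the sense labels, and the probabilities below are hypothetical and chosen only for illustration, and the "model prediction" is simply a value that may be missing.

```python
from typing import Dict, Optional

# Hypothetical toy sense distribution for one lemma.
# Labels and probabilities are made up for illustration only.
TOY_DISTRIBUTIONS: Dict[str, Dict[str, float]] = {
    "pipe": {"pipe#tube": 0.55, "pipe#smoking_device": 0.35, "pipe#instrument": 0.10},
}


def most_frequent_sense(
    lemma: str, distributions: Dict[str, Dict[str, float]]
) -> Optional[str]:
    """Return the sense with the highest probability, or None if the lemma is unknown."""
    dist = distributions.get(lemma)
    if not dist:
        return None
    return max(dist, key=dist.get)


def disambiguate_with_backoff(
    lemma: str,
    model_prediction: Optional[str],
    distributions: Dict[str, Dict[str, float]],
) -> Optional[str]:
    """Keep the model's prediction when it has one; otherwise back off to the MFS."""
    if model_prediction is not None:
        return model_prediction
    return most_frequent_sense(lemma, distributions)
```

In this sketch the backoff only fires when the WSD model returns no answer, for instance for a lemma it did not see at training time; any prediction the model does make is left unchanged.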

Summary

Introduction

Word Sense Disambiguation (WSD) is the task of associating a word in context with a meaning from a given inventory of senses (Navigli, 2009). Current approaches to WSD can mainly be divided into supervised and knowledge-based methods. While the former leverage manually annotated data to train statistical models, the latter exploit the knowledge enclosed within a semantic network to identify the most appropriate meaning of a word in context. Since words and senses follow a Zipfian distribution (McCarthy et al., 2004a), information on rare words and meanings is scarce in both semantically annotated data and knowledge bases. This undermines the ability of supervised and knowledge-based approaches to deal with words that are unseen at training time or that have only a few connections within a semantic network. The WordNet most frequent sense for the noun pipe is its smoking device meaning; nowadays, one would expect the metal pipe sense to appear more often in general.



Publication Date: Jan 1, 2020
Citations: 44
License type: cc-by

