Abstract
ShotgunWSD is a recent unsupervised and knowledge-based algorithm for global word sense disambiguation (WSD). The algorithm is inspired by the Shotgun sequencing technique, a broadly used whole-genome sequencing approach. ShotgunWSD performs WSD at the document level in three phases. The first phase applies a brute-force WSD algorithm on short context windows selected from the document to generate a short list of likely sense configurations for each window. The second phase assembles the local sense configurations into longer composite configurations by prefix and suffix matching. In the third phase, the resulting configurations are ranked by their length, and the sense of each word is chosen by a majority voting scheme that considers only the top configurations in which the respective word appears. In this paper, we present an improved version (2.0) of ShotgunWSD, based on a different approach for computing the relatedness score between two word senses, a step that lies at the core of building better local sense configurations. For each sense, we collect all the words from the corresponding WordNet synset, its gloss, and its related synsets into a sense bag. We embed the words collected from all the sense bags in the entire document into a vector space using a common word embedding framework. The word vectors are then clustered using k-means to form clusters of semantically related words. At this stage, we consider clusters with fewer samples than a given threshold to be outliers and eliminate them altogether. Words from the eliminated clusters are also removed from every sense bag. Finally, we compute the median of all the remaining word embeddings in a given sense bag to obtain a sense embedding for the corresponding word sense. We compare the improved ShotgunWSD algorithm (version 2.0) with its previous version (1.0), as well as with several state-of-the-art unsupervised WSD algorithms, on six benchmarks: SemEval 2007, Senseval-2, Senseval-3, SemEval 2013, SemEval 2015, and a unified (overall) benchmark. We demonstrate that ShotgunWSD 2.0 yields better performance than ShotgunWSD 1.0 and some other recent unsupervised or knowledge-based approaches. We also perform paired McNemar's significance tests, showing that the improvements of ShotgunWSD 2.0 over ShotgunWSD 1.0 are in most cases statistically significant at a significance level of 0.01.
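To make the sense-embedding step concrete, below is a minimal Python sketch of the pipeline described above (sense bags from WordNet, word embeddings, k-means outlier filtering, median sense embeddings). It is an illustration under our own assumptions, not the authors' implementation: it relies on NLTK's WordNet interface, gensim's KeyedVectors, and scikit-learn's KMeans, and the helper names (sense_bag, sense_embeddings) as well as the parameter values (n_clusters=10, min_cluster_size=3) are hypothetical.

```python
# Minimal sketch of the sense-embedding computation (assumed libraries:
# NLTK WordNet, gensim KeyedVectors, scikit-learn KMeans; parameter values
# are illustrative, not taken from the paper).
import numpy as np
from nltk.corpus import wordnet as wn
from sklearn.cluster import KMeans
from gensim.models import KeyedVectors


def sense_bag(synset):
    """Collect words from a synset, its gloss, and a few related synsets."""
    words = set(synset.lemma_names())
    words.update(w.lower() for w in synset.definition().split())
    for related in synset.hypernyms() + synset.hyponyms() + synset.part_meronyms():
        words.update(related.lemma_names())
    return words


def sense_embeddings(document_words, embeddings, n_clusters=10, min_cluster_size=3):
    """Return a median sense embedding for every WordNet sense of every word,
    after discarding words that fall into small (outlier) k-means clusters."""
    # 1. Build a sense bag for each sense of each content word in the document.
    bags = {(w, s.name()): sense_bag(s) for w in document_words for s in wn.synsets(w)}

    # 2. Embed every word that occurs in any sense bag and is in the vocabulary.
    vocab = sorted({w for bag in bags.values() for w in bag if w in embeddings})
    vectors = np.array([embeddings[w] for w in vocab])

    # 3. Cluster the word vectors; clusters smaller than the threshold are outliers.
    k = min(n_clusters, len(vocab))
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vectors)
    counts = np.bincount(labels, minlength=k)
    kept = {w for w, lab in zip(vocab, labels) if counts[lab] >= min_cluster_size}

    # 4. Median of the surviving word embeddings in a sense bag -> sense embedding.
    sense_vectors = {}
    for key, bag in bags.items():
        rows = [embeddings[w] for w in bag if w in kept]
        if rows:
            sense_vectors[key] = np.median(np.array(rows), axis=0)
    return sense_vectors


# Usage sketch (the embedding file path is hypothetical):
# word_vectors = KeyedVectors.load_word2vec_format("embeddings.bin", binary=True)
# senses = sense_embeddings(["bank", "deposit", "river"], word_vectors)
```

In practice, the number of clusters and the outlier threshold would be tuned rather than fixed; the sketch hard-codes them only for readability.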
Highlights
Word Sense Disambiguation (WSD) is a core problem studied in the Natural Language Processing (NLP) community
We present an improved version of a recently introduced WSD algorithm [25], termed ShotgunWSD, which stems from the Shotgun genome sequencing technique [26], [27]
We propose a third approach which leads to an improved algorithm termed ShotgunWSD 2.0
Summary
Word Sense Disambiguation (WSD) is a core problem studied in the Natural Language Processing (NLP) community. WSD refers to the task of identifying which sense of a word is used in a given context. Existing WSD algorithms [7], [8] are usually divided into supervised, unsupervised, and knowledge-based techniques. Hybrid methods, e.g. combining unsupervised and knowledge-based techniques, have also been proposed in the literature [9]. Among these, supervised methods have reached the best disambiguation results [10], [11], but their main disadvantage is that they need large amounts of labeled examples for the supervised learning stage.