ShotgunWSD: An unsupervised algorithm for global word sense disambiguation inspired by DNA sequencing

Andrei Butnaru, Florentina Hristea, Radu Tudor Ionescu

doi:10.18653/v1/e17-1086

Abstract

In this paper, we present a novel unsupervised algorithm for word sense disambiguation (WSD) at the document level. Our algorithm is inspired by a widely used approach in the field of genetics for whole genome sequencing, known as the Shotgun sequencing technique. The proposed WSD algorithm is based on three main steps. First, a brute-force WSD algorithm is applied to short context windows (up to 10 words) selected from the document in order to generate a short list of likely sense configurations for each window. In the second step, these local sense configurations are assembled into longer composite configurations based on suffix and prefix matching. Finally, the resulting configurations are ranked by their length, and the sense of each word is chosen based on a voting scheme that considers only the top k configurations in which the word appears. We compare our algorithm with other state-of-the-art unsupervised WSD algorithms and demonstrate better performance, sometimes by a very large margin. We also show that our algorithm can yield better performance than the Most Common Sense (MCS) baseline on one data set. Moreover, our algorithm has a very small number of parameters, is robust to parameter tuning, and, unlike other bio-inspired methods, gives a deterministic solution (it does not involve random choices).
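The three steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the sense inventory SENSES, the relatedness function, the parameter names (n, c, k, min_overlap) and the use of the relatedness score to break ties between configurations of equal length are all assumptions made for illustration. A real system would draw senses from WordNet and use a proper relatedness measure.

```python
from collections import Counter
from itertools import combinations, product

# Hypothetical toy sense inventory (assumption): word -> candidate senses "word#domain".
SENSES = {
    "bank": ["bank#money", "bank#river"],
    "deposit": ["deposit#money", "deposit#river"],
    "interest": ["interest#money", "interest#mind"],
    "loan": ["loan#money"],
}

def relatedness(s1, s2):
    """Toy relatedness (assumption): 1 if two senses share a domain tag, else 0."""
    return 1.0 if s1.split("#")[1] == s2.split("#")[1] else 0.0

def score(senses):
    """Sum of pairwise relatedness over all senses in a configuration."""
    return sum(relatedness(a, b) for a, b in combinations(senses, 2))

def local_configs(doc, n=3, c=2):
    """Step 1: brute-force all sense assignments per n-word window, keep the c best."""
    n = min(n, len(doc))
    configs = []
    for start in range(len(doc) - n + 1):
        window = doc[start:start + n]
        candidates = list(product(*(SENSES[w] for w in window)))
        candidates.sort(key=lambda s: -score(s))  # stable: ties keep enumeration order
        configs.extend((start, senses) for senses in candidates[:c])
    return configs

def assemble(configs, min_overlap=1):
    """Step 2: merge A and B whenever a suffix of A equals a prefix of B."""
    pool = set(configs)
    changed = True
    while changed:
        changed = False
        # sorted() takes a snapshot, so adding to pool inside the loop is safe
        for (a, sa), (b, sb) in product(sorted(pool), repeat=2):
            overlap = a + len(sa) - b
            if a < b and min_overlap <= overlap < len(sb) and sa[b - a:] == sb[:overlap]:
                merged = (a, sa + sb[overlap:])
                if merged not in pool:
                    pool.add(merged)
                    changed = True
    return pool

def disambiguate(doc, n=3, c=2, k=3):
    """Step 3: rank by length, then let the top k configurations covering each word vote."""
    ranked = sorted(
        assemble(local_configs(doc, n, c)),
        key=lambda cfg: (-len(cfg[1]), -score(cfg[1]), cfg),  # score tie-break is an assumption
    )
    result = []
    for i in range(len(doc)):
        covering = [cfg for cfg in ranked if cfg[0] <= i < cfg[0] + len(cfg[1])][:k]
        votes = Counter(senses[i - start] for start, senses in covering)
        result.append(votes.most_common(1)[0][0])
    return result

if __name__ == "__main__":
    print(disambiguate(["bank", "deposit", "interest", "loan"]))
```

Every ordering in the sketch is fixed by explicit sort keys, so repeated runs on the same document return the same senses, mirroring the determinism claimed in the abstract. The brute-force step is exponential in the window length, which is why the windows are kept short.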

Highlights

Word Sense Disambiguation (WSD), the task of identifying which sense of a word is used in a given context, is a core NLP problem, having the potential to improve many applications such as machine translation (Carpuat and Wu, 2007), text summarization (Plaza et al., 2011), information retrieval (Chifu and Ionescu, 2012; Chifu et al., 2014) or sentiment analysis (Sumanth and Inkpen, 2015).
We compare them with the Most Common Sense (MCS) baseline, which is based on human annotations.
By using sense embeddings in a completely different way than Bhingardive et al. (2015), we are able to report an F1 score of 59.82%, which is much closer to the MCS baseline (62.30%).

Summary

Introduction

Word Sense Disambiguation (WSD), the task of identifying which sense of a word is used in a given context, is a core NLP problem with the potential to improve many applications, such as machine translation (Carpuat and Wu, 2007), text summarization (Plaza et al., 2011), information retrieval (Chifu and Ionescu, 2012; Chifu et al., 2014) or sentiment analysis (Sumanth and Inkpen, 2015). Most of the existing WSD algorithms (Agirre and Edmonds, 2006; Navigli, 2009) are commonly classified into supervised, unsupervised, and knowledge-based techniques, but hybrid approaches have also been proposed in the literature (Hristea et al., 2008). The main disadvantage of supervised methods, which have led to the best disambiguation results, is that they require a large amount of annotated data, which is difficult to obtain. We introduce a novel WSD algorithm, termed ShotgunWSD, that stems from the Shotgun genome sequencing technique (Anderson, 1981; Istrail et al., 2004). Our WSD algorithm is unsupervised, but it also requires knowledge in the form of WordNet (Miller, 1995; Fellbaum, 1998) synsets and relations. It can therefore be regarded as a hybrid approach.

Publication Date: Jan 1, 2017
Citations: 43
License type: cc-by

Similar Papers

ShotgunWSD 2.0: An Improved Algorithm for Global Word Sense Disambiguation
Andrei M Butnaru ... Radu Tudor Ionescu
IEEE Access | VOL. 7
01 Jan 2019

Global Word Sense Disambiguation of Polysemous Words in Telugu Language
Suneetha Eluri ... Vasu Kumar Pilli
International Journal of Engineering and Advanced Technology | VOL. 10
30 Oct 2020

Word Sense Disambiguation
Pushpak Bhattacharyya ... Mitesh Khapra
01 Jan 2013

Diffused Label Propagation based Transductive Classification Algorithm for Word Sense Disambiguation
Gokhan Kocaman ... Bilge Sipal
01 Jul 2019
