Abstract

The goal of Word Sense Disambiguation (WSD) is to identify the correct meaning of a word in a particular context. Traditional supervised methods use only labeled data (contexts) and miss rich lexical knowledge, such as the gloss that defines the meaning of a word sense. Recent studies have shown that incorporating glosses into neural networks for WSD yields significant improvements. However, previous models usually build the context representation and the gloss representation separately. In this paper, we find that learning the context and gloss representations can benefit from each other: the gloss can help to highlight the important words in the context, thus building a better context representation, and the context can help to locate the key words in the gloss of the correct word sense. Therefore, we introduce a co-attention mechanism to generate co-dependent representations for the context and gloss. Furthermore, in order to capture both word-level and sentence-level information, we extend the attention mechanism in a hierarchical fashion. Experimental results show that our model achieves state-of-the-art results on several standard English all-words WSD test datasets.
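
As a rough illustration of the co-attention idea described above, here is a minimal sketch in PyTorch. This is not the authors' exact formulation: the tensor names C and G, the dot-product affinity matrix, and the max-pooling over that matrix are all assumptions made for the example.

    import torch
    import torch.nn.functional as F

    def co_attention(C, G):
        # C: (batch, m, d) contextual states of the m context words (assumed shapes)
        # G: (batch, n, d) contextual states of the n gloss words
        A = torch.bmm(C, G.transpose(1, 2))             # pairwise affinity scores (batch, m, n)
        attn_c = F.softmax(A.max(dim=2).values, dim=1)  # context word weights from best gloss match
        attn_g = F.softmax(A.max(dim=1).values, dim=1)  # gloss word weights from best context match
        c_vec = torch.bmm(attn_c.unsqueeze(1), C).squeeze(1)  # attended context vector (batch, d)
        g_vec = torch.bmm(attn_g.unsqueeze(1), G).squeeze(1)  # attended gloss vector (batch, d)
        return c_vec, g_vec

    # Toy usage: batch of 2, a 6-word context, a 4-word gloss, hidden size 8
    c_vec, g_vec = co_attention(torch.randn(2, 6, 8), torch.randn(2, 4, 8))
    print(c_vec.shape, g_vec.shape)  # torch.Size([2, 8]) torch.Size([2, 8])

Each side's attention weights are derived from the other side's words, so the two summary vectors are co-dependent. A hierarchical variant, as the abstract notes, would additionally apply such attention at the sentence level, for example over multiple gloss sentences.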

Highlights

  • Word Sense Disambiguation (WSD) is a crucial task and long-standing problem in Natural Language Processing (NLP)

  • Supervised feature-based methods (Zhong and Ng, 2010; Iacobacci et al., 2016) and neural-based methods (Kågebäck and Salomonsson, 2016; Raganato et al., 2017a) usually use labeled data to train one or more classifiers. Although both lexical knowledge and labeled data are of great help for WSD, previous supervised methods rarely take the integration of knowledge into consideration

  • We conduct a series of experiments, which show that our models outperform the state-of-the-art systems on several standard English all-words WSD test datasets

Summary

Introduction

Word Sense Disambiguation (WSD) is a crucial task and long-standing problem in Natural Language Processing (NLP). Consider the sentence "As they often play football together, they know each other quite well": among the candidate glosses of play, g1: participate in games or sports, g2: perform music on an instrument, and g3: behave in a certain way, only g1 fits this context. Although both lexical knowledge (especially glosses) and labeled data are of great help for WSD, previous supervised methods rarely take the integration of knowledge into consideration. To the best of our knowledge, Luo et al. (2018) are the first to directly incorporate the gloss knowledge from WordNet into a unified neural network for WSD. Their model builds the context representation and the gloss representation separately as distributed vectors and later calculates their similarity in a memory network. We instead introduce a co-attention mechanism to model the mutual influence between the representations of the context and the gloss.
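
Once co-dependent context and gloss vectors are available, a gloss-augmented model of this kind can score candidate senses by context-gloss similarity. The sketch below is a hedged illustration, not the memory-network update of Luo et al. (2018): the function name, the plain dot-product scorer, and the toy dimensions are assumptions.

    import torch
    import torch.nn.functional as F

    def score_senses(context_vec, gloss_vecs):
        # context_vec: (d,) co-attended representation of the target word's context
        # gloss_vecs:  (k, d) one co-attended vector per candidate sense's gloss
        scores = gloss_vecs @ context_vec  # dot-product similarity per sense, shape (k,)
        return F.softmax(scores, dim=0)    # probability distribution over the k senses

    # Toy usage: a target word with 3 candidate senses, hidden size 8
    probs = score_senses(torch.randn(8), torch.randn(3, 8))
    print(probs, probs.argmax())  # the highest-probability sense is the prediction

The predicted sense is the argmax of this distribution; a model of this shape would typically be trained with a cross-entropy loss against the labeled sense.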
