Abstract

Word sense disambiguation (WSD), the task of mapping an ambiguous word to its correct meaning in a specific context, is one of the core problems in natural language processing (NLP). Recent studies have shown a lively interest in incorporating sense definitions (glosses) into neural networks, which contributes greatly to improving WSD performance. However, disambiguating polysemous words with rare senses remains difficult. In this paper, while still taking glosses into account, we further improve the WSD system from the perspective of semantic representation. We encode the context and the sense glosses of the target polysemous word independently, using encoders with the same structure. To obtain a better representation in each encoder, we leverage a capsule network to capture the different kinds of important information contained in multi-head attention. We finally choose the gloss representation closest to the context representation of the target word as its correct sense. We conduct experiments on the English all-words WSD task. Experimental results show that our method achieves good performance and is particularly effective at disambiguating words with rare senses.
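
To make the final selection step concrete, the sketch below illustrates the nearest-gloss idea under simplified assumptions: a context vector and one vector per candidate gloss are assumed to have already been produced by the two same-structure encoders (multi-head attention and capsule routing are abstracted away here). Cosine similarity and the names `disambiguate`, `context_vec`, and `gloss_vecs` are illustrative choices, not the paper's exact formulation.

```python
# A minimal sketch of the nearest-gloss selection step, assuming the context
# and the candidate sense glosses have already been encoded into fixed-size
# vectors by the two same-structure encoders. Cosine similarity is an
# illustrative choice of closeness measure.
import torch
import torch.nn.functional as F


def disambiguate(context_vec: torch.Tensor, gloss_vecs: torch.Tensor) -> int:
    """Return the index of the sense whose gloss is closest to the context.

    context_vec: (hidden,) representation of the target word in context.
    gloss_vecs:  (num_senses, hidden) representations of the candidate glosses.
    """
    sims = F.cosine_similarity(context_vec.unsqueeze(0), gloss_vecs, dim=-1)
    return int(sims.argmax().item())


if __name__ == "__main__":
    torch.manual_seed(0)
    ctx = torch.randn(8)          # stand-in for the encoded context
    glosses = torch.randn(4, 8)   # stand-ins for four encoded sense glosses
    print("predicted sense index:", disambiguate(ctx, glosses))
```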

Highlights

  • Word sense disambiguation (WSD), the task of selecting the correct meaning of a polysemous word from its linguistic surroundings, has been considered one of the most difficult tasks in artificial intelligence [1]

  • Several studies have demonstrated its positive impact on the performance of downstream natural language processing (NLP) tasks, e.g., information retrieval [2], machine translation [3,4], and sentiment analysis [5]

  • Sequence routing (SR) and head routing (HR) can each, on their own, improve performance on less frequent senses (LFS), with only a subtle decrease in F1-score on the most frequent sense (MFS)


Summary

Introduction

Word sense disambiguation (WSD), with the ability to select the correct meaning of polysemous words depending on their linguistic surroundings, has been considered one of the most difficult tasks in artificial intelligence [1]. Pre-trained models, e.g., Context2Vec [12], ELMo [13], and BERT [14], have shown their effectiveness in improving downstream NLP tasks. In this way, an NLP task is to some extent divided into two parts: pre-training a model to generate contextualized word representations, and then either fine-tuning that model on the specific downstream NLP task or directly using the pretrained word embeddings. A great number of other neural-based methods that use a neural network encoder to extract features have also been proposed [17,18,19,20,21].
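
As a minimal sketch of the "pre-train, then reuse contextualized embeddings" pipeline mentioned above, the snippet below extracts a contextualized vector for a target word with the Hugging Face Transformers library. The model name, the example sentence, and the assumption that the target word maps to a single WordPiece token are illustrative and not the paper's exact setup.

```python
# Extract a contextualized representation of a target word from a pre-trained
# model (BERT here), which a downstream WSD model could then consume.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentence = "He sat on the bank of the river."
target = "bank"  # ambiguous target word (riverbank vs. financial institution)

inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs).last_hidden_state  # (1, seq_len, hidden)

# Locate the target word (assumed to be a single WordPiece token) and take its
# hidden state as the contextualized word representation.
target_id = tokenizer.convert_tokens_to_ids(target)
position = (inputs["input_ids"][0] == target_id).nonzero(as_tuple=True)[0][0]
target_vec = hidden_states[0, position]
print(target_vec.shape)  # torch.Size([768]) for bert-base-uncased
```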

Related Work
Capsule Network
Multi-Head Attention
All-Words Task Definition
Experimental Setup
WSD on Rare Words and Rare Senses
Ablation Study
Discussion
Conclusions