Abstract
Recent work has shown that the encoder-decoder attention mechanisms in neural machine translation (NMT) differ from word alignment in statistical machine translation. In this paper, we analyze encoder-decoder attention mechanisms in the context of word sense disambiguation (WSD) in NMT models. We hypothesize that attention mechanisms pay more attention to context tokens when translating ambiguous words, and we explore the attention distribution patterns when translating ambiguous nouns. Counterintuitively, we find that attention mechanisms tend to distribute more attention to the ambiguous noun itself rather than to context tokens, in comparison to other nouns. We conclude that attention is not the main mechanism used by NMT models to incorporate contextual information for WSD. The experimental results suggest that NMT models learn to encode the contextual information necessary for WSD in the encoder hidden states. For the attention mechanism in Transformer models, we show that the first few layers gradually learn to “align” source and target tokens, while the last few layers learn to extract features from related but unaligned context tokens.
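To make the kind of analysis described above concrete, the following minimal sketch (not the authors' code; the array shapes and the names attention, tgt_step and src_noun_pos are assumptions for illustration) measures how much encoder-decoder attention mass falls on the source noun itself versus on its context tokens at the decoding step that produces its translation.

```python
# Sketch: split one decoding step's attention row into "noun itself" vs "context".
# attention is assumed to be a [tgt_len, src_len] matrix whose rows sum to 1.
import numpy as np

def attention_split(attention: np.ndarray, tgt_step: int, src_noun_pos: int):
    row = attention[tgt_step]
    noun_mass = row[src_noun_pos]         # attention paid to the noun itself
    context_mass = row.sum() - noun_mass  # attention paid to all other source tokens
    return noun_mass, context_mass

# Averaging noun_mass over ambiguous nouns and over other nouns would expose the
# counterintuitive pattern reported above: ambiguous nouns tend to receive more
# attention themselves, not less.
```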
Highlights
Human languages exhibit many different types of ambiguity
We conclude that encoder-decoder attention is not the main mechanism used by neural machine translation (NMT) models to incorporate contextual information for word sense disambiguation (WSD)
We assume that the contextual information has already been encoded into the hidden states by the encoder, and attention mechanisms do not learn which source words are useful for WSD
Summary
Human languages exhibit many different types of ambiguity. Lexical ambiguity refers to the fact that words can have more than one semantic meaning. We focus on how encoder-decoder attention mechanisms deal with ambiguous nouns. In this setting, we expect to get a more accurate picture of the WSD performance of NMT models. We hypothesize that attention mechanisms distribute more attention to context tokens when translating ambiguous words, and we explore the relation between translation accuracy and attention distributions when translating ambiguous nouns. We conclude that encoder-decoder attention is not the main mechanism used by NMT models to incorporate contextual information for WSD. Instead, it learns to capture features from the related but unaligned source context tokens.
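The layer-wise claim above, that early layers behave like alignment while later layers spread over unaligned context, could be probed with a simple entropy measurement. The sketch below is illustrative only and assumes a hypothetical array layer_attn of shape [num_layers, num_heads, tgt_len, src_len] holding the Transformer's encoder-decoder attention weights.

```python
# Sketch: mean attention entropy per layer. Lower entropy suggests alignment-like
# concentration on a single source token; higher entropy suggests attention spread
# over related but unaligned context tokens.
import numpy as np

def layer_attention_entropy(layer_attn: np.ndarray) -> np.ndarray:
    eps = 1e-12
    # Entropy over source positions, per layer/head/target step: [layers, heads, tgt_len]
    entropy = -(layer_attn * np.log(layer_attn + eps)).sum(axis=-1)
    # Average over heads and target steps to get one value per layer: [layers]
    return entropy.mean(axis=(1, 2))
```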