Integrating Weakly Supervised Word Sense Disambiguation into Neural Machine Translation

Xiao Pu, Nikolaos Pappas, James Henderson, Andrei Popescu-Belis

doi:10.1162/tacl_a_00242

Abstract

This paper demonstrates that word sense disambiguation (WSD) can improve neural machine translation (NMT) by widening the source context considered when modeling the senses of potentially ambiguous words. We first introduce three adaptive clustering algorithms for WSD, based on k-means, Chinese restaurant processes, and random walks, which are then applied to large word contexts represented in a low-rank space and evaluated on SemEval shared-task data. We then learn word vectors jointly with sense vectors defined by our best WSD method, within a state-of-the-art NMT system. We show that the concatenation of these vectors, and the use of a sense selection mechanism based on the weighted average of sense vectors, outperforms several baselines including sense-aware ones. This is demonstrated by translation on five language pairs. The improvements are more than 1 BLEU point over strong NMT baselines, +4% accuracy over all ambiguous nouns and verbs, or +20% when scored manually over several challenging words.

Highlights

The correct translation of polysemous words remains a challenge for machine translation (MT).
The best hyper-parameters are those found above for each of the strategies combining word sense disambiguation (WSD) with neural MT (NMT): in particular, the k-means method for WSD+SMT, and the ATT Model with Initialization of Embeddings (ATTini) method for WSD+NMT, that is, the attention-based model of senses initialized with the output of k-means clustering.
To demonstrate that our findings generalize to larger data sets, we report results on three data sets provided by the Workshop on Statistical Machine Translation (WMT), namely for EN/DE, EN/ES, and EN/FR.

Summary

Introduction

The correct translation of polysemous words remains a challenge for machine translation (MT). We demonstrate that the explicit modeling of word senses can be helpful to NMT by using combined vector representations of word types and senses, which are inferred from contexts larger than those used by state-of-the-art NMT systems. Our contributions are twofold. First, we present weakly supervised word sense disambiguation (WSD) approaches integrated into NMT, based on three adaptive clustering methods and operating on large word contexts. Second, we propose three sense selection mechanisms for integrating WSD into NMT, respectively based on the top, the average, and the weighted average (i.e., attention) of word senses.
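The three sense selection mechanisms named above (top, average, and weighted average) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: scoring each sense by its dot product with a context vector, using a softmax for the attention weights, and all function names are assumptions made for illustration. The last helper shows the concatenation of a word vector with the selected sense vector, as described in the abstract.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_top(sense_vectors, context):
    """'Top': keep only the sense vector that best matches the context.
    Matching by dot product is an assumption of this sketch."""
    scores = [dot(s, context) for s in sense_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return list(sense_vectors[best])

def select_average(sense_vectors):
    """'Average': unweighted mean of all sense vectors of the word."""
    n = len(sense_vectors)
    return [sum(col) / n for col in zip(*sense_vectors)]

def select_weighted(sense_vectors, context):
    """'Weighted average' (attention): softmax-weighted mean of the
    sense vectors, with weights derived from the context scores."""
    weights = softmax([dot(s, context) for s in sense_vectors])
    return [sum(w * x for w, x in zip(weights, col))
            for col in zip(*sense_vectors)]

def sense_aware_embedding(word_vector, sense_vector):
    """Concatenate the word vector with the selected sense vector."""
    return list(word_vector) + list(sense_vector)

# Hypothetical toy data: one ambiguous word with two senses.
senses = [[1.0, 0.0], [0.0, 1.0]]
context = [2.0, 0.0]  # context leaning toward the first sense
embedding = sense_aware_embedding([0.1, 0.2], select_weighted(senses, context))
```

In this toy setting the weighted average stays close to the first sense vector without discarding the second one entirely, which is the practical difference between the attention-style selection and the hard "top" choice.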

Adaptive Sense Clustering for MT

Definitions and Notations

Clustering Word Occurrences by Sense

Baseline Neural MT Model

Sense-aware Neural MT Models

Best WSD Method Based on BLEU

Results

Related Work

Conclusion


Journal: Transactions of the Association for Computational Linguistics
Publication Date: Dec 1, 2018
Citations: 49
License type: CC BY


