Abstract

This paper demonstrates that word sense disambiguation (WSD) can improve neural machine translation (NMT) by widening the source context considered when modeling the senses of potentially ambiguous words. We first introduce three adaptive clustering algorithms for WSD, based on k-means, Chinese restaurant processes, and random walks, which are then applied to large word contexts represented in a low-rank space and evaluated on SemEval shared-task data. We then learn word vectors jointly with sense vectors defined by our best WSD method, within a state-of-the-art NMT system. We show that the concatenation of these vectors, and the use of a sense selection mechanism based on the weighted average of sense vectors, outperforms several baselines including sense-aware ones. This is demonstrated by translation on five language pairs. The improvements are more than 1 BLEU point over strong NMT baselines, +4% accuracy over all ambiguous nouns and verbs, or +20% when scored manually over several challenging words.

Highlights

  • The correct translation of polysemous words remains a challenge for machine translation (MT)

  • The best hyper-parameters are those found above, for each of the word sense disambiguation (WSD)+neural MT (NMT) combination strategies, in particular the k-means method for WSD+SMT, and the ATT Model with Initialization of Embeddings (ATTini) method for WSD+NMT, that is, the attention-based model of senses initialized with the output of k-means clustering

  • To demonstrate that our findings generalize to larger data sets, we report results on three data sets provided by the Workshop on Statistical Machine Translation (WMT), namely for EN/DE, EN/ES, and EN/FR

Summary

Introduction

The correct translation of polysemous words remains a challenge for machine translation (MT). We demonstrate that the explicit modeling of word senses can be helpful to NMT by using combined vector representations of word types and senses, which are inferred from contexts that are larger than those of state-of-the-art NMT systems. The contributions include weakly supervised word sense disambiguation (WSD) approaches integrated into NMT, based on three adaptive clustering methods and operating on large word contexts, and three sense selection mechanisms for integrating WSD into NMT, based respectively on the top sense, the average, and the weighted average (i.e., attention) of word senses.
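
The three sense selection mechanisms named above (top, average, and weighted average) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the dot-product relevance score, and the array layout are assumptions made for illustration, and the summary does not specify how the attention weights are parameterized in the actual system. The sketch also shows the concatenation of a word vector with the selected sense vector, as described in the abstract.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def select_sense(sense_vecs, context_vec, mode="attention"):
    """Return a single sense vector for one word occurrence.

    sense_vecs:  (k, d) array, one row per candidate sense (hypothetical layout)
    context_vec: (d,) array summarizing the surrounding words (hypothetical)
    mode:        "top", "average", or "attention"
    """
    # Assumption: relevance of each sense is a plain dot product with the context.
    scores = sense_vecs @ context_vec
    if mode == "top":
        # Keep only the highest-scoring sense.
        return sense_vecs[np.argmax(scores)]
    if mode == "average":
        # Uniform average of all candidate senses, ignoring the context.
        return sense_vecs.mean(axis=0)
    if mode == "attention":
        # Weighted average, with weights given by a softmax over the scores.
        return softmax(scores) @ sense_vecs
    raise ValueError(f"unknown mode: {mode}")

def sense_aware_embedding(word_vec, sense_vecs, context_vec, mode="attention"):
    """Concatenate the word vector with the selected sense vector."""
    return np.concatenate([word_vec, select_sense(sense_vecs, context_vec, mode)])

# Toy example: two senses in a 2-D sense space, context aligned with sense 0.
senses = np.array([[1.0, 0.0], [0.0, 1.0]])
context = np.array([2.0, 0.0])
word = np.array([0.5, 0.5, 0.5])
embedding = sense_aware_embedding(word, senses, context, mode="attention")
```

In this toy setting the "top" mode returns the first sense unchanged, while the "attention" mode returns a soft mixture that still leans toward it; the soft variant keeps some information from the other senses when the context is not decisive.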

Adaptive Sense Clustering for MT
Definitions and Notations
Clustering Word Occurrences by Sense
Baseline Neural MT Model
Sense-aware Neural MT Models
Best WSD Method Based on BLEU
Results
Related Work
Conclusion