Abstract

Prior research suggests that neural machine translation (NMT) captures word alignment through its attention mechanism; however, this paper finds that attention may almost entirely fail to capture word alignment for some NMT models. It therefore proposes two methods to induce word alignment that are general and agnostic to the specific NMT model. Experiments show that both methods induce much better word alignment than attention does. The paper further visualizes translations through the word alignment induced by NMT. In particular, it analyzes the effect of alignment errors on translation errors at the word level, and its quantitative analysis over many test examples consistently demonstrates that alignment errors are likely to lead to translation errors as measured by different metrics.
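
To make concrete the attention baseline the abstract argues against, here is a minimal sketch of the standard recipe for reading word alignment off an attention matrix: align each target word to the source word it attends to most. The function name `induce_alignment_from_attention` and the toy matrix are illustrative assumptions, not artifacts of the paper.

```python
import numpy as np

def induce_alignment_from_attention(attn: np.ndarray) -> list[tuple[int, int]]:
    """Align each target word to its most-attended source word.

    attn: array of shape (tgt_len, src_len); attn[j, i] is the attention
    weight that target word j places on source word i.
    Returns (source_index, target_index) pairs.
    """
    return [(int(attn[j].argmax()), j) for j in range(attn.shape[0])]

# Toy example: 3 target words attending over 4 source words.
attn = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.1, 0.2, 0.6],
])
print(induce_alignment_from_attention(attn))  # [(0, 0), (1, 1), (3, 2)]
```

With multi-layer, multi-head models there is no single attention matrix to read, which is one practical reason this recipe can break down; the paper's finding that attention may fail as an aligner is what motivates its model-agnostic alternatives.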

Highlights

  • Machine translation aims at modeling the semantic equivalence between a pair of source and target sentences (Koehn, 2009), and word alignment tries to model the semantic equivalence between a pair of source and target words (Och and Ney, 2003)

  • This paper makes two-fold contributions: it systematically studies word alignment from neural machine translation (NMT) and proposes two approaches to induce word alignment that are agnostic to specific NMT models

  • In order to induce word alignment from general NMT models, we propose two different methods, which are agnostic to specific NMT models (see the sketch after this list)
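
The bullets do not spell out how such a model-agnostic method might be wired up. As one generic illustration (assuming PyTorch; `AlignmentLayer` and the frozen-state usage are invented for this sketch and are not the paper's exact design), a small alignment scorer can be trained on top of a trained NMT model's encoder and decoder hidden states:

```python
import torch
import torch.nn as nn

class AlignmentLayer(nn.Module):
    """Bilinear scorer over (decoder state, encoder state) pairs, producing
    a distribution over source positions for each target position."""
    def __init__(self, d_model: int):
        super().__init__()
        self.bilinear = nn.Linear(d_model, d_model, bias=False)

    def forward(self, dec_states: torch.Tensor, enc_states: torch.Tensor) -> torch.Tensor:
        # dec_states: (tgt_len, d_model); enc_states: (src_len, d_model)
        scores = self.bilinear(dec_states) @ enc_states.T  # (tgt_len, src_len)
        return torch.softmax(scores, dim=-1)  # per-target-word alignment distribution

# Usage: run a trained NMT model with its parameters frozen, collect encoder
# and decoder hidden states, and train only this layer on top of them.
layer = AlignmentLayer(d_model=512)
dec_states = torch.randn(7, 512)   # e.g., 7 target words
enc_states = torch.randn(9, 512)   # e.g., 9 source words
align = layer(dec_states, enc_states)
alignment = align.argmax(dim=-1)   # align each target word to one source word
```

Because the NMT parameters stay frozen, such a layer probes what the model already encodes rather than changing its translations.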

Summary

Introduction

Machine translation aims at modeling the semantic equivalence between a pair of source and target sentences (Koehn, 2009), and word alignment tries to model the semantic equivalence between a pair of source and target words (Och and Ney, 2003). Our experiments demonstrate that NMT captures good word alignment for those target words mostly contributed from the source (CFS), while its word alignment is much worse for those words mostly contributed from the target (CFT). This finding offers a reason why advanced NMT models that deliver excellent translations capture worse word alignment than the statistical aligners used in SMT, an observation made in prior research but without a deep explanation (Tu et al., 2016; Liu et al., 2016).
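
To ground the CFS/CFT distinction, a prediction-difference style measurement can be used: mask one input word at a time and record how much the model's probability for the target word drops. The sketch below is an illustration under assumed interfaces; the `LogProbFn` hook, the `<mask>` token, and the simple sum-comparison rule are assumptions for exposition, not the paper's exact formulation.

```python
from typing import Callable, Sequence

# Hypothetical hook into an NMT model: returns log P(tgt_word | src, tgt_prefix).
LogProbFn = Callable[[Sequence[str], Sequence[str], str], float]

def source_contributions(
    src: Sequence[str],
    tgt_prefix: Sequence[str],
    tgt_word: str,
    logprob: LogProbFn,
    mask: str = "<mask>",
) -> list[float]:
    """Contribution of each source word to predicting tgt_word, measured
    as the drop in log-probability when that source word is masked out."""
    base = logprob(src, tgt_prefix, tgt_word)
    out = []
    for i in range(len(src)):
        masked = list(src)
        masked[i] = mask
        out.append(base - logprob(masked, tgt_prefix, tgt_word))
    return out

def target_contributions(
    src: Sequence[str],
    tgt_prefix: Sequence[str],
    tgt_word: str,
    logprob: LogProbFn,
    mask: str = "<mask>",
) -> list[float]:
    """The same measurement applied to the already-generated target prefix."""
    base = logprob(src, tgt_prefix, tgt_word)
    out = []
    for j in range(len(tgt_prefix)):
        masked = list(tgt_prefix)
        masked[j] = mask
        out.append(base - logprob(src, masked, tgt_word))
    return out

def classify_cfs_cft(src_contribs: list[float], tgt_contribs: list[float]) -> str:
    """Label a target word CFS when source-side contributions dominate,
    else CFT. A simplified decision rule for illustration only."""
    return "CFS" if sum(src_contribs) >= sum(tgt_contribs) else "CFT"
```

The same scores can double as an alignment signal: aligning each target word to the source word with the largest contribution is one model-agnostic way to induce alignment, in the spirit of prediction-difference methods.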
