Abstract

In neural machine translation, an attention model is used to identify the source words aligned to a target word (the target foresight word) in order to select the translation context, but it makes no use of any information about this target foresight word. Previous work proposed an approach that improves the attention model by explicitly accessing this target foresight word and demonstrated substantial gains on the alignment task. However, that approach is inapplicable to the translation task, in which the target foresight word is unavailable. In this paper, we propose a new attention model enhanced by implicit information about the target foresight word, suited to both the alignment and translation tasks. Experiments on Chinese-to-English and Japanese-to-English datasets show that the proposed attention model delivers significant improvements in terms of both alignment error rate and BLEU.
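To make the idea concrete, the sketch below (Python/NumPy) illustrates one plausible way of injecting implicit foresight information into additive attention: an embedding of the yet-unknown target foresight word is predicted from the current decoder state and added to the attention query before source positions are scored. The names (predict_foresight, foresight_attention) and this particular formulation are illustrative assumptions, not a description of the authors' exact model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict_foresight(decoder_state, W_p):
    """Hypothetical step: predict an embedding of the (unknown) target
    foresight word from the current decoder state; this serves as the
    'implicit information' used to refine attention."""
    return np.tanh(W_p @ decoder_state)

def foresight_attention(enc_states, decoder_state, W_a, U_a, V_a, v_a, W_p):
    """Additive (Bahdanau-style) attention whose query is augmented with a
    predicted foresight embedding; returns alignment weights over source words."""
    foresight = predict_foresight(decoder_state, W_p)
    scores = np.array([
        v_a @ np.tanh(W_a @ h + U_a @ decoder_state + V_a @ foresight)
        for h in enc_states  # one score per source position
    ])
    return softmax(scores)   # alignment distribution over the source sentence

# Toy usage with random parameters (dimensions are illustrative only).
rng = np.random.default_rng(0)
d = 4
enc_states = [rng.normal(size=d) for _ in range(5)]   # 5 source positions
decoder_state = rng.normal(size=d)
W_a, U_a, V_a, W_p = (rng.normal(size=(d, d)) for _ in range(4))
v_a = rng.normal(size=d)
print(foresight_attention(enc_states, decoder_state, W_a, U_a, V_a, v_a, W_p))
```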

Highlights

  • Since neural machine translation (NMT) was proposed (Bahdanau et al., 2014), it has attracted increasing interest in the machine translation community (Luong et al., 2015b; Tu et al., 2016; Feng et al., 2016; Cohn et al., 2016)

  • To alleviate the inadequate modeling of attention in NMT, we propose target foresight attention, which foresees related information about the unknown target foresight word to improve its alignments with respect to the source words

  • We investigate which category of generated words benefits most from the proposed approach in terms of alignments measured by alignment error rate (AER) (Och, 2003); a small worked example of the AER computation follows this list
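As a concrete reference for the AER figures discussed above, the following snippet computes the alignment error rate from a set of sure gold links S, possible gold links P (with S contained in P), and predicted links A. The formula is the standard one from the alignment literature; the example links themselves are made up for illustration.

```python
def aer(sure, possible, predicted):
    """Alignment error rate: lower is better.
    `sure` and `possible` are gold link sets (sure is a subset of possible);
    `predicted` is the set of links produced by the attention model.
    Links are (source_index, target_index) pairs."""
    a, s, p = set(predicted), set(sure), set(possible)
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))

# Toy example with made-up alignment links.
sure = {(0, 0), (1, 2)}
possible = sure | {(2, 2)}
predicted = {(0, 0), (1, 2), (3, 1)}
print(f"AER = {aer(sure, possible, predicted):.3f}")  # 1 - (2 + 2) / (3 + 2) = 0.200
```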


Summary

Introduction

Since neural machine translation (NMT) was proposed (Bahdanau et al., 2014), it has attracted increasing interest in the machine translation community (Luong et al., 2015b; Tu et al., 2016; Feng et al., 2016; Cohn et al., 2016). Compared with traditional statistical machine translation (Koehn et al., 2003; Chiang, 2005), one advantage of NMT is that its architecture combines the language model, the translation model, and the alignment between source and target words in a unified manner rather than as separately trained components. In NMT, the attention mechanism plays an important role: it calculates the alignments of a target word with respect to the source words for translation context selection. Given a source sentence x and a target sentence y = {y1, ..., yn} with length n, neural machine translation aims to model the conditional probability P(y | x). An encoder reads the source sentence x into a sequence of representation vectors with a bidirectional recurrent neural network, and a decoder sequentially generates each target word yi according to P(yi | y<i, x).
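To make this factorization concrete, the following sketch shows one decoding step of a standard attention-based NMT model in the spirit of Bahdanau et al. (2014): attention weights over the encoder states select a context vector, which is combined with the decoder state to produce P(yi | y<i, x). The shapes, the single-layer projections, and the parameter names are simplifying assumptions for illustration, not the exact architecture used in the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_step(enc_states, dec_state, params):
    """One decoding step of attention-based NMT: score each encoder state
    against the current decoder state, form a context vector, and return
    the distribution P(y_i | y_<i, x) over the target vocabulary."""
    W_a, U_a, v_a, W_o = params
    scores = np.array([v_a @ np.tanh(W_a @ h + U_a @ dec_state) for h in enc_states])
    alpha = softmax(scores)                                   # alignment weights
    context = sum(a * h for a, h in zip(alpha, enc_states))   # translation context
    logits = W_o @ np.concatenate([dec_state, context])       # vocabulary scores
    return softmax(logits), alpha

# Toy usage with random parameters (hidden size 4, vocabulary size 6).
rng = np.random.default_rng(1)
d, vocab = 4, 6
enc_states = [rng.normal(size=d) for _ in range(5)]
dec_state = rng.normal(size=d)
params = (rng.normal(size=(d, d)), rng.normal(size=(d, d)),
          rng.normal(size=d), rng.normal(size=(vocab, 2 * d)))
probs, alpha = decode_step(enc_states, dec_state, params)
print(probs.sum(), alpha)  # probabilities sum to 1; alpha aligns the target word to source words
```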

