Abstract

Neural machine translation (NMT) uses a decoder to generate target words auto-regressively, predicting each next target word conditioned on a given source sentence and the previously predicted target words, i.e., the translation history. This paradigm suffers from two limitations: 1) the prediction of the next word depends heavily on the quality of its history, and the discrepancy between training and inference exacerbates this limitation; 2) left-to-right decoding cannot make full use of target-side future information, which leads to the issue of unbalanced outputs. On the one hand, we alleviate the first limitation with a history-refining module, which learns to assess the quality of each history word by assigning it a confidence score; the confidence score is then used as a gate that controls how much of that word's embedding flows into the decoder. On the other hand, we attack the second limitation with a future-foreseeing module, which learns the distribution of the future translation at each decoding time step. More importantly, we further propose refining history for future-aware NMT, closely integrating the two modules since they focus on complementary kinds of context. Experimental results on translation tasks with datasets of different scales, including WMT English↔{German, French, Romanian}, show that our proposed approach achieves significant improvements over strong Transformer-based NMT baselines.
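To make the gating idea concrete, below is a minimal sketch of a history-refining gate; it is not the authors' implementation, and the single-linear-layer scorer, the sigmoid, and all module and parameter names are assumptions made for illustration. The sketch only shows the core mechanism described in the abstract: each history word receives a confidence score in [0, 1], which scales its embedding before it reaches the decoder.

```python
import torch
import torch.nn as nn


class HistoryRefiningGate(nn.Module):
    """Hypothetical sketch: score each history word and use the score
    as a gate on its embedding before feeding it to the decoder."""

    def __init__(self, d_model: int):
        super().__init__()
        # Assumed scorer: one linear layer followed by a sigmoid.
        self.scorer = nn.Linear(d_model, 1)

    def forward(self, history_embeddings: torch.Tensor) -> torch.Tensor:
        # history_embeddings: (batch, history_len, d_model)
        confidence = torch.sigmoid(self.scorer(history_embeddings))  # (batch, history_len, 1)
        # Scale each history word's embedding by its confidence score.
        return confidence * history_embeddings


# Usage: gate the embeddings of previously predicted target words.
gate = HistoryRefiningGate(d_model=512)
history = torch.randn(2, 7, 512)   # 2 sentences, 7 history words each
refined_history = gate(history)    # same shape, confidence-weighted
```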
