Abstract

Non-autoregressive models generate target words in parallel, which achieves faster decoding but sacrifices translation accuracy. To remedy the flawed translations produced by non-autoregressive models, a promising approach is to train a conditional masked translation model (CMTM) and refine the generated results over several iterations. Unfortunately, such an approach hardly considers the \textit{sequential dependency} among target words, which inevitably degrades translation quality. Hence, instead of solely training a Transformer-based CMTM, we propose a Self-Review Mechanism to infuse sequential information into it. Concretely, we insert a left-to-right mask into the same decoder of the CMTM, and then induce it to autoregressively review whether each word generated by the CMTM should be replaced or kept. The experimental results on WMT14 En$\leftrightarrow$De and WMT16 En$\leftrightarrow$Ro demonstrate that our model requires dramatically less training computation than the typical CMTM, and outperforms several state-of-the-art non-autoregressive models by over 1 BLEU. Through knowledge distillation, our model even surpasses a typical left-to-right Transformer, while significantly speeding up decoding.
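The sketch below (PyTorch) illustrates how such a left-to-right review pass could look in practice; it is a minimal illustration, not the authors' implementation, and the names `causal_mask`, `ReviewHead`, `hidden_dim`, and the toy shapes are assumptions. The same decoder re-reads the CMTM draft under a causal attention mask and scores, per position, whether the generated word should be kept or replaced.

```python
# Minimal sketch of a left-to-right "self-review" pass over a CMTM draft.
# Everything here (module names, dimensions, the 2-way keep/replace head) is illustrative.
import torch
import torch.nn as nn


def causal_mask(length: int) -> torch.Tensor:
    """Left-to-right mask: position t may only attend to positions <= t."""
    return torch.triu(torch.ones(length, length, dtype=torch.bool), diagonal=1)


class ReviewHead(nn.Module):
    """Binary keep/replace classifier on top of decoder states (illustrative)."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, 2)  # 0 = keep, 1 = replace

    def forward(self, decoder_states: torch.Tensor) -> torch.Tensor:
        return self.proj(decoder_states)  # (batch, tgt_len, 2)


# Toy usage: review a draft translation of 6 tokens with a single decoder layer.
hidden_dim, tgt_len = 64, 6
decoder_layer = nn.TransformerDecoderLayer(d_model=hidden_dim, nhead=4, batch_first=True)
review_head = ReviewHead(hidden_dim)

draft_embeddings = torch.randn(1, tgt_len, hidden_dim)  # embeddings of the CMTM draft
encoder_memory = torch.randn(1, 8, hidden_dim)          # source-side encoder outputs

states = decoder_layer(
    draft_embeddings,
    encoder_memory,
    tgt_mask=causal_mask(tgt_len),  # the inserted left-to-right mask
)
keep_or_replace = review_head(states).argmax(-1)  # (1, tgt_len); 1 marks words to re-predict
```

Words flagged for replacement would then be re-predicted in the next refinement iteration, while kept words stay fixed.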

Highlights

  • Neural Machine Translation (NMT) models have achieved great success in recent years (Sutskever et al., 2014; Bahdanau et al., 2015; Cho et al., 2014; Kalchbrenner et al., 2016; Gehring et al., 2017; Vaswani et al., 2017)

  • Given a source sentence $x = \{x_1, x_2, \ldots, x_{|x|}\}$, an NMT model aims to generate a target-language sentence $y = \{y_1, y_2, \ldots, y_{|y|}\}$ expressing the same semantics, where $|x|$ and $|y|$ denote the lengths of the source and target sentences, respectively

  • We identify a drawback of conditional masked translation modeling (CMTM): it is insufficient to capture the sequential dependency among target words


Summary

Introduction

Neural Machine Translation (NMT) models have achieved great success in recent years (Sutskever et al., 2014; Bahdanau et al., 2015; Cho et al., 2014; Kalchbrenner et al., 2016; Gehring et al., 2017; Vaswani et al., 2017). Most NMT models use autoregressive decoders, where target words are generated one by one. Non-autoregressive models instead generate all target words in parallel; despite this gain in computational efficiency, they usually suffer a loss of translation accuracy. Even worse, they decode a target in only one shot, and miss the chance to remedy a flawed translation.

The training objective of an autoregressive NMT model is expressed as a chain of conditional probabilities in a left-to-right manner:

$$P(y \mid x) = \prod_{t=1}^{|y|+1} p\big(y_t \mid y_{<t}, x\big),$$

where $y_0$ and $y_{|y|+1}$ are $\langle s \rangle$ and $\langle /s \rangle$, standing for the start and end of a sentence, respectively. These probabilities are parameterized with a standard encoder-decoder architecture (Sutskever et al., 2014), where the decoder follows an autoregressive strategy to capture the left-to-right dependency among target words. Different from this training objective, we adopt conditional masked translation modeling (CMTM) (Ghazvininejad et al., 2019) to optimize our proposed non-autoregressive NMT model. Based on the assumption that the words in $y_{\text{mask}}$ are conditionally independent of each other, the training objective of CMTM is formulated as:

$$\mathcal{L}_{\text{CMTM}} = -\sum_{y_t \in y_{\text{mask}}} \log p\big(y_t \mid y_{\text{obs}}, x\big),$$

where $y_{\text{mask}}$ is the set of masked target words and $y_{\text{obs}}$ denotes the remaining observed target words.

[Figure: model overview with a length-prediction head and a linear + softmax output layer; example target "the cat is cool".]
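To make the CMTM objective concrete, here is a minimal sketch (not the authors' code) of the masked cross-entropy loss: a random subset of target tokens is replaced with a mask token, the model predicts them conditioned on the observed tokens and the source, and the loss is taken only over the masked positions. The `model(src_tokens, masked_inputs)` call and the special-token ids are assumed placeholders.

```python
# Minimal sketch of the CMTM training loss: cross-entropy over masked target positions only.
import torch
import torch.nn.functional as F

MASK_ID, PAD_ID = 4, 0  # illustrative special-token ids


def cmtm_loss(model, src_tokens, tgt_tokens):
    """L_CMTM = - sum_{y_t in y_mask} log p(y_t | y_obs, x), averaged over masked tokens."""
    # Sample how many tokens to mask per sentence, uniformly between 1 and |y|,
    # as in Mask-Predict (Ghazvininejad et al., 2019).
    lengths = (tgt_tokens != PAD_ID).sum(-1)
    num_to_mask = (torch.rand_like(lengths, dtype=torch.float) * lengths).long() + 1

    # Pick random positions to mask, never touching padding.
    scores = torch.rand(tgt_tokens.shape)
    scores[tgt_tokens == PAD_ID] = float("inf")
    ranks = scores.argsort(-1).argsort(-1)
    mask = ranks < num_to_mask.unsqueeze(-1)               # True where y_t is in y_mask

    masked_inputs = tgt_tokens.masked_fill(mask, MASK_ID)  # y_obs with [MASK] holes
    logits = model(src_tokens, masked_inputs)              # (batch, tgt_len, vocab) -- assumed signature

    return F.cross_entropy(
        logits[mask],       # predictions at masked positions only
        tgt_tokens[mask],   # the original words they should recover
    )


# Toy check with a dummy "model" that returns random logits over a 10-word vocabulary.
dummy_model = lambda src, tgt: torch.randn(tgt.shape[0], tgt.shape[1], 10)
print(cmtm_loss(dummy_model, torch.randint(5, 10, (2, 7)), torch.randint(5, 10, (2, 6))))
```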

