Abstract

Neural Machine Translation (NMT) often suffers from under-translation due to its limited modeling of output sequence lengths. In this work, we propose a novel approach to training a Transformer model with length constraints based on length-aware positional encoding (PE). Since length constraints with exact target sentence lengths degrade translation performance, we add random noise within a certain window size to the length constraints in the PE during training. At inference time, we predict the output length from the input sequence with a BERT-based length prediction model. Experimental results on ASPEC English-to-Japanese translation showed that the proposed method produced translations whose lengths were close to those of the references and outperformed a vanilla Transformer by 3.22 BLEU points, especially on short sentences. On average, translations obtained with our length prediction model were also better than those of a baseline that uses the input length as the length constraint. The proposed noise injection improved robustness against length prediction errors, especially errors within the window size.
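
To make the training and inference recipe above concrete, here is a minimal sketch of its two pieces: uniform random noise added to the length constraint within a window during training, and a BERT-based length predictor at inference. The window size, the `bert-base-cased` checkpoint, the Hugging Face regression head, and all function names are illustrative assumptions, not the authors' implementation.

```python
import random
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def noisy_length_constraint(target_length: int, window: int = 5) -> int:
    """Training-time constraint: perturb the true target length with
    uniform noise inside a fixed window (window size is an assumption)."""
    return max(1, target_length + random.randint(-window, window))

# Inference-time constraint: predict the output length from the source
# sentence with a BERT encoder and a single-output regression head
# (checkpoint and head are illustrative, not the paper's exact model).
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
length_predictor = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=1)  # num_labels=1 -> scalar regression output

def predict_output_length(src_sentence: str) -> int:
    inputs = tokenizer(src_sentence, return_tensors="pt")
    with torch.no_grad():
        logits = length_predictor(**inputs).logits  # shape (1, 1)
    return max(1, int(round(logits.item())))
```

In this sketch, the decoder's length-aware PE would be fed `noisy_length_constraint(len(reference))` during training and `predict_output_length(source)` at test time.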

Highlights

  • In autoregressive Neural Machine Translation (NMT), a decoder generates one token at a time, and each output token depends on the output tokens generated so far

  • Lakew et al. (2019) applied length-difference positional encoding (LDPE) and length-ratio positional encoding (LRPE) to NMT. They trained an NMT model using output length constraints based on LDPE and LRPE, along with special tokens representing length-ratio classes between input and output sentences, while using the input sentence length at inference time

  • We propose an NMT method based on LRPE and LDPE with BERT-based output length prediction


Summary

Introduction

In autoregressive Neural Machine Translation (NMT), a decoder generates one token at a time, and each output token depends on the output tokens generated so far. The decoder's prediction of the end of the sentence determines the length of the output sentence. This prediction is sometimes made too early, before all of the input information has been translated, causing so-called under-translation. Takase and Okazaki (2019) proposed two variants of length-aware positional encoding, called length-ratio positional encoding (LRPE) and length-difference positional encoding (LDPE), to control the output length according to a given length constraint in automatic summarization. Lakew et al. (2019) applied LDPE and LRPE to NMT. They trained an NMT model using output length constraints based on LDPE and LRPE, along with special tokens representing length-ratio classes between input and output sentences, and used the input sentence length as the constraint at inference time. However, the length of an input sentence is not a reliable estimator of the output length, because the actual output length varies with the content of the input.
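
For reference, the following is a minimal NumPy sketch of the two encodings, assuming the sinusoidal formulations of Takase and Okazaki (2019): LDPE feeds the remaining length (length - pos) into the usual sinusoidal form, while LRPE replaces the constant 10000 in the denominator with the given length. This is a paraphrase to be checked against the original paper; function names and array layout are illustrative, and an even model dimension is assumed.

```python
import numpy as np

def ldpe(length: int, max_pos: int, d_model: int) -> np.ndarray:
    """Length-difference PE: encodes how many tokens remain until the
    constrained length, so the position equal to `length` looks like position 0."""
    pe = np.zeros((max_pos, d_model))
    for pos in range(max_pos):
        remaining = length - pos
        for i in range(0, d_model, 2):  # assumes even d_model
            angle = remaining / (10000 ** (i / d_model))
            pe[pos, i] = np.sin(angle)
            pe[pos, i + 1] = np.cos(angle)
    return pe

def lrpe(length: int, max_pos: int, d_model: int) -> np.ndarray:
    """Length-ratio PE: the constrained length replaces the constant 10000,
    so the phase depends on the ratio pos / length."""
    pe = np.zeros((max_pos, d_model))
    for pos in range(max_pos):
        for i in range(0, d_model, 2):  # assumes even d_model
            angle = pos / (length ** (i / d_model))
            pe[pos, i] = np.sin(angle)
            pe[pos, i + 1] = np.cos(angle)
    return pe
```

In the cited work, these encodings are used on the decoder side in place of the standard positional encoding, so the model can tell at every step how close it is to the given length constraint.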
