Abstract

Neural Machine Translation (NMT) often suffers from under-translation due to its limited modeling of output sequence lengths. In this work, we propose a novel approach to training a Transformer model with length constraints based on length-aware positional encoding (PE). Since length constraints with exact target sentence lengths degrade translation performance, we add random noise within a certain window size to the length constraints in the PE during training. At inference time, we predict the output length from the input sequence with a BERT-based length prediction model. Experimental results on ASPEC English-to-Japanese translation showed that the proposed method produced translations whose lengths were close to those of the references and outperformed a vanilla Transformer by 3.22 BLEU points, especially on short sentences. On average, translations obtained with our length prediction model were also better than those of a baseline that uses the input length as the length constraint. The proposed noise injection improved robustness against length prediction errors, especially errors within the window size.
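
To make the training and inference recipe above concrete, here is a minimal sketch of its two pieces: uniform random noise added to the length constraint within a window during training, and a BERT-based length predictor at inference. The window size, the `bert-base-cased` checkpoint, the Hugging Face regression head, and all function names are illustrative assumptions, not the authors' implementation.

```python
import random
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def noisy_length_constraint(target_length: int, window: int = 5) -> int:
    """Training-time constraint: perturb the true target length with
    uniform noise inside a fixed window (window size is an assumption)."""
    return max(1, target_length + random.randint(-window, window))

# Inference-time constraint: predict the output length from the source
# sentence with a BERT encoder and a single-output regression head
# (checkpoint and head are illustrative, not the paper's exact model).
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
length_predictor = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=1)  # num_labels=1 -> scalar regression output

def predict_output_length(src_sentence: str) -> int:
    inputs = tokenizer(src_sentence, return_tensors="pt")
    with torch.no_grad():
        logits = length_predictor(**inputs).logits  # shape (1, 1)
    return max(1, int(round(logits.item())))
```

In this sketch, the decoder's length-aware PE would be fed `noisy_length_constraint(len(reference))` during training and `predict_output_length(source)` at test time.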

Highlights

  • In autoregressive Neural Machine Translation (NMT), a decoder generates one token at a time, and each output token depends on the output tokens generated so far

  • Lakew et al. (2019) applied length-difference positional encoding (LDPE) and length-ratio positional encoding (LRPE) to NMT. They trained an NMT model using output length constraints based on LDPE and LRPE, along with special tokens representing length-ratio classes between input and output sentences, while using the input sentence length at inference time

  • We propose an NMT method based on LRPE and LDPE with BERT-based output length prediction


Summary

Introduction

In autoregressive Neural Machine Translation (NMT), a decoder generates one token at a time, and each output token depends on the output tokens generated so far. The decoder's prediction of the end of the sentence determines the length of the output sentence. This prediction is sometimes made too early, before all of the input information has been translated, causing so-called under-translation. Takase and Okazaki (2019) proposed two variants of length-aware positional encoding, called length-ratio positional encoding (LRPE) and length-difference positional encoding (LDPE), to control the output length according to a given length constraint in automatic summarization. Lakew et al. (2019) applied LDPE and LRPE to NMT. They trained an NMT model using output length constraints based on LDPE and LRPE, along with special tokens representing length-ratio classes between input and output sentences, and used the input sentence length as the constraint at inference time. However, the length of an input sentence is not a reliable estimator of the output length, because the actual output length varies with the content of the input.
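
For reference, the following is a minimal NumPy sketch of the two encodings, assuming the sinusoidal formulations of Takase and Okazaki (2019): LDPE feeds the remaining length (length - pos) into the usual sinusoidal form, while LRPE replaces the constant 10000 in the denominator with the given length. This is a paraphrase to be checked against the original paper; function names and array layout are illustrative, and an even model dimension is assumed.

```python
import numpy as np

def ldpe(length: int, max_pos: int, d_model: int) -> np.ndarray:
    """Length-difference PE: encodes how many tokens remain until the
    constrained length, so the position equal to `length` looks like position 0."""
    pe = np.zeros((max_pos, d_model))
    for pos in range(max_pos):
        remaining = length - pos
        for i in range(0, d_model, 2):  # assumes even d_model
            angle = remaining / (10000 ** (i / d_model))
            pe[pos, i] = np.sin(angle)
            pe[pos, i + 1] = np.cos(angle)
    return pe

def lrpe(length: int, max_pos: int, d_model: int) -> np.ndarray:
    """Length-ratio PE: the constrained length replaces the constant 10000,
    so the phase depends on the ratio pos / length."""
    pe = np.zeros((max_pos, d_model))
    for pos in range(max_pos):
        for i in range(0, d_model, 2):  # assumes even d_model
            angle = pos / (length ** (i / d_model))
            pe[pos, i] = np.sin(angle)
            pe[pos, i + 1] = np.cos(angle)
    return pe
```

In the cited work, these encodings are used on the decoder side in place of the standard positional encoding, so the model can tell at every step how close it is to the given length constraint.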
