Abstract

In recent years, non-autoregressive machine translation has achieved great success thanks to its promising inference speedup. Non-autoregressive machine translation reduces decoding latency by generating the target words in a single pass. However, there is a considerable accuracy gap between non-autoregressive and autoregressive machine translation. Because it removes the dependencies between target words, non-autoregressive machine translation tends to generate repetitive or wrong words, and these errors lead to low performance. In this paper, we introduce a soft-masking method to alleviate this issue. Specifically, we introduce an autoregressive discriminator that outputs probabilities indicating which embeddings are correct. According to these probabilities, we then apply a mask to the copied representations, which enables the model to take into account which words are easy to predict. We evaluate our method on three benchmarks: WMT14 EN\(\rightarrow\)DE, WMT16 EN\(\rightarrow\)RO, and IWSLT14 DE\(\rightarrow\)EN. The experimental results demonstrate that our method outperforms the baseline by a large margin with only a slight sacrifice in speed.
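To make the soft-masking step concrete, below is a minimal PyTorch-style sketch of the idea described above: a discriminator's per-token correctness probabilities interpolate between the copied representations and a learned mask embedding. All names here (`SoftMasking`, `mask_embedding`, `p_correct`) are hypothetical illustrations, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class SoftMasking(nn.Module):
    """Illustrative soft-masking layer (hypothetical, for exposition only).

    Blends copied representations with a learned mask embedding according
    to per-token correctness probabilities produced by a discriminator.
    """

    def __init__(self, d_model: int):
        super().__init__()
        # Learned embedding standing in for an uncertain / masked position.
        self.mask_embedding = nn.Parameter(torch.zeros(d_model))

    def forward(self, copied: torch.Tensor, p_correct: torch.Tensor) -> torch.Tensor:
        # copied:    (batch, length, d_model) representations copied to the decoder
        # p_correct: (batch, length) discriminator probability that each token is correct
        p = p_correct.unsqueeze(-1)  # (batch, length, 1) for broadcasting
        # Tokens the discriminator trusts keep their representation;
        # uncertain tokens are softly replaced by the mask embedding.
        return p * copied + (1.0 - p) * self.mask_embedding
```

Under this reading, tokens the discriminator is confident about pass through nearly unchanged, while uncertain ones are pulled toward the mask embedding, which is one plausible way to realize the masking described in the abstract.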
