Abstract

Non-autoregressive translation (NAT) has attracted considerable attention for the speedup it offers at decoding time, and the conditional masked language model (CMLM) is among the strongest NAT models. We revisit and extend the CMLM with two strategies: (1) an N-gram mask strategy, which helps the model learn coarse semantic information about the target language; and (2) a top-k decoding strategy, in which the model emits the top-k most probable words at each step so that the final sentence is generated in a constant number of steps. Extensive experiments demonstrate that our method improves over CMLM and several other NAT models. In particular, on WMT14 EN-DE our approach achieves a BLEU score of 27.24, only 0.1 BLEU below the autoregressive base Transformer, while decoding roughly 3 times faster.
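
The abstract only sketches the two strategies, so the following Python snippet is a minimal illustrative reading of them, not the authors' implementation: `ngram_mask` masks contiguous spans rather than independent tokens, and `topk_decode` commits the k most confident positions per refinement step so the output is produced in a constant number of passes. The names `MASK`, `ngram_mask`, `topk_decode`, and the `predict_fn` callable are assumptions introduced for illustration.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol (illustrative)

def ngram_mask(tokens, mask_ratio=0.15, max_n=3):
    """Mask contiguous n-gram spans of the target sentence (sketch).

    Whole spans of 1..max_n tokens are replaced with [MASK] until roughly
    mask_ratio of the sentence is covered, which is one way a model could be
    pushed to learn coarse, phrase-level semantics of the target language.
    """
    tokens = list(tokens)
    budget = max(1, int(len(tokens) * mask_ratio))
    masked, attempts = 0, 0
    while masked < budget and attempts < 10 * budget:
        attempts += 1
        n = random.randint(1, max_n)                      # span length of the n-gram
        start = random.randrange(0, max(1, len(tokens) - n + 1))
        for i in range(start, min(start + n, len(tokens))):
            if tokens[i] != MASK:
                tokens[i] = MASK
                masked += 1
    return tokens

def topk_decode(predict_fn, length, k, mask_id=0):
    """Constant-step decoding sketch: commit the k most confident tokens per step.

    `predict_fn` is a hypothetical callable standing in for the CMLM: given the
    current (partially masked) target, it returns a (token_id, probability)
    pair for every position.
    """
    target = [mask_id] * length                           # start fully masked
    steps = (length + k - 1) // k                         # ceil(length / k) passes
    for _ in range(steps):
        preds = predict_fn(target)
        masked_pos = [i for i, t in enumerate(target) if t == mask_id]
        masked_pos.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked_pos[:k]:                          # keep the k most confident
            target[i] = preds[i][0]
    return target

# Toy usage: a dummy predictor that always proposes token 7 with confidence 0.9.
out = topk_decode(lambda tgt: [(7, 0.9)] * len(tgt), length=6, k=2)
print(out)  # [7, 7, 7, 7, 7, 7] after ceil(6/2) = 3 refinement steps
```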
