Abstract

In non-autoregressive machine translation, the target tokens are generated by the decoder in one shot. Although this decoding process significantly reduces decoding latency, non-autoregressive machine translation still suffers from a loss of translation accuracy. We argue that this degradation stems from the lack of target-side dependencies, i.e., history and future information, between target tokens. In this work, we propose a novel method to address this problem. We posit that the hidden representation of a target token from the decoder should consist of three parts: history, present, and future information. We dynamically aggregate these parts into a whole with a capsule network, enabling the decoder to improve the performance of non-autoregressive machine translation. In addition, to ensure the capsules learn the information as we expect, we introduce an autoregressive decoder. Experiments on benchmark tasks demonstrate that explicitly modeling history and future information significantly improves the performance of the non-autoregressive translation (NAT) model. Extensive analyses show that our model learns history and future information as intended.
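The abstract does not include an implementation; the sketch below only illustrates the kind of parts-to-whole capsule aggregation it describes, assuming PyTorch and standard dynamic routing-by-agreement (Sabour et al., 2017). All function names, shapes, and dimensions are illustrative assumptions, not the authors' actual architecture.

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # Squash non-linearity: shrinks short vectors toward zero and
    # keeps long vectors just under unit length.
    norm2 = (s * s).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / torch.sqrt(norm2 + eps)

def route(in_caps, w, iters=3):
    """Dynamic routing-by-agreement over input capsules.

    in_caps: (batch, n_in, d_in)        -- e.g. history/present/future parts
    w:       (n_in, n_out, d_in, d_out) -- per-pair transformation (assumed)
    returns: (batch, n_out, d_out)      -- aggregated "whole" capsules
    """
    # Prediction vectors: u_hat[b, i, j] = in_caps[b, i] @ w[i, j]
    u_hat = torch.einsum('bid,ijde->bije', in_caps, w)
    b_logits = torch.zeros(u_hat.shape[:3], device=in_caps.device)
    for _ in range(iters):
        c = F.softmax(b_logits, dim=2)             # coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)   # weighted sum over parts
        v = squash(s)                              # output "whole" capsules
        # Increase coupling to outputs that agree with the prediction.
        b_logits = b_logits + torch.einsum('bije,bje->bij', u_hat, v)
    return v

# Toy usage: aggregate 3 part capsules (history, present, future) of
# width 64 into 4 output capsules of width 16 per target position.
x = torch.randn(2, 3, 64)             # (batch, parts, d_in)
w = torch.randn(3, 4, 64, 16) * 0.05  # (n_in, n_out, d_in, d_out)
print(route(x, w).shape)              # torch.Size([2, 4, 16])
```

In this reading, the routing iterations decide how much each part (history, present, future) contributes to the final hidden state of a target position, which is the parts-to-whole dynamic the abstract refers to.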
