Abstract

In non-autoregressive machine translation (NAT), the decoder generates all target tokens in one shot. Although this decoding process significantly reduces latency, it still comes at the cost of translation accuracy. We argue that this accuracy drop stems from the lack of target-side dependencies, i.e., the history and future information between target tokens. In this work, we propose a novel method to address this problem. We posit that the hidden representation of a target token from the decoder should consist of three parts: history, present, and future information, and we dynamically aggregate this parts-to-whole information for the decoder with a capsule network to improve the performance of non-autoregressive machine translation. In addition, to ensure the capsules learn the information we expect, we introduce an autoregressive decoder. Experiments on several benchmark tasks demonstrate that explicitly modeling history and future information significantly improves the performance of the NAT model, and extensive analyses show that our model learns history and future information as expected.
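
The abstract does not spell out how the three part-capsules are fused, but the "dynamically aggregate parts-to-whole" phrasing matches capsule routing-by-agreement. The sketch below is a minimal, assumed illustration of that idea with a single output capsule; the function names, tensor shapes, and routing variant are ours for exposition, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    """Capsule squashing non-linearity: rescales the vector norm into (0, 1)."""
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

def aggregate_capsules(history, present, future, num_iters=3):
    """Dynamically aggregate three part-capsules (history, present, future)
    into one whole token representation via routing-by-agreement.

    Each input has shape (batch, d_model); returns (batch, d_model).
    """
    # Stack the three parts as input capsules: (batch, 3, d_model).
    parts = torch.stack([history, present, future], dim=1)
    # Routing logits, initialized to zero: (batch, 3).
    b = parts.new_zeros(parts.shape[:2])
    for _ in range(num_iters):
        c = F.softmax(b, dim=1)                        # coupling coefficients over the parts
        v = squash((c.unsqueeze(-1) * parts).sum(1))   # candidate whole: (batch, d_model)
        b = b + (parts * v.unsqueeze(1)).sum(-1)       # increase weight of agreeing parts
    return v
```

In this reading, the iterative update lets the decoder weight history, present, and future contributions per token rather than summing them statically, which is what "dynamic" aggregation would buy over a fixed gate.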
