Modeling Discourse Structure for Document-level Neural Machine Translation

Junxuan Chen, Xiang Li, Jiarui Zhang, Bin Wang, Jianwei Cui, Jinsong Su, Chulun Zhou

doi:10.18653/v1/2020.autosimtrans-1.5

Abstract

Recently, document-level neural machine translation (NMT) has become a hot topic in the machine translation community. Despite its success, most existing studies ignore the discourse structure of the input document to be translated, which has been shown to be effective in other tasks. In this paper, we propose to improve document-level NMT with the aid of discourse structure information. Our encoder is based on a hierarchical attention network (HAN) (Miculicich et al., 2018). Specifically, we first parse the input document to obtain its discourse structure. Then, we introduce a Transformer-based path encoder to embed the discourse structure information of each word. Finally, we combine the discourse structure information with the word embedding before it is fed into the encoder. Experimental results on the English-to-German dataset show that our model can significantly outperform both Transformer and Transformer+HAN.

Highlights

Neural machine translation (NMT) has made great progress in the past decade
We propose a novel document-level neural machine translation (NMT) model based on a hierarchical attention network (HAN) (Miculicich et al., 2018)
Integrating both contextual information and discourse structure information, our model achieves further gains: its BLEU score is 2.06 higher than Transformer and 0.39 higher than Transformer+HAN, and its TER score is 2.9 lower than Transformer and 0.5 lower than Transformer+HAN

Summary

Introduction

Neural machine translation (NMT) has made great progress in the past decade. In practical applications, the need for NMT systems has expanded from individual sentences to complete documents. Contextual information is important for obtaining high-quality document translation. To capture better contextual information, researchers have proposed many methods (e.g., memory networks and hierarchical attention networks) for document-level translation (Sim Smith, 2017; Tiedemann and Scherrer, 2017; Wang et al., 2017a; Tu et al., 2017; Voita et al., 2018; Zhang et al., 2018; Miculicich et al., 2018; Maruf and Haffari, 2018; Maruf et al., 2019; Yang et al., 2019). Beyond raw contextual sentences, discourse structure is a major component of a document. To the best of our knowledge, discourse structure has not been explicitly used in document-level NMT.

Methods
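
The abstract outlines three steps: parse the document into a discourse structure, embed the discourse information of each word with a path encoder, and combine that embedding with the word embedding before the encoder. The following is a minimal sketch of that data flow only, not the authors' implementation. Everything in it is an assumption made for illustration: the `Node` tree type, the label names, the toy two-dimensional vectors, the use of a plain average in place of the Transformer-based path encoder, and element-wise addition as the way of combining the two embeddings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical discourse-tree node; leaves carry the words of one segment."""
    label: str                                    # assumed label set, for illustration only
    children: List["Node"] = field(default_factory=list)
    words: Optional[List[str]] = None             # set only on leaf nodes


def word_paths(root: Node) -> List[Tuple[str, List[str]]]:
    """Return (word, root-to-leaf label path) for every word, in document order."""
    out: List[Tuple[str, List[str]]] = []

    def visit(node: Node, prefix: List[str]) -> None:
        path = prefix + [node.label]              # new list, so sibling paths stay separate
        if node.words is not None:
            out.extend((w, path) for w in node.words)
        for child in node.children:
            visit(child, path)

    visit(root, [])
    return out


def embed_path(path: List[str], label_emb: Dict[str, List[float]], dim: int) -> List[float]:
    """Stand-in for the path encoder: average the label vectors along the path.

    The paper uses a Transformer-based path encoder; averaging is a simplification.
    """
    total = [0.0] * dim
    for label in path:
        for i, value in enumerate(label_emb[label]):
            total[i] += value
    return [t / len(path) for t in total]


def combine(word_vec: List[float], path_vec: List[float]) -> List[float]:
    """Combine by element-wise addition (an assumed choice, not taken from the paper)."""
    return [w + p for w, p in zip(word_vec, path_vec)]


# Toy document with two segments under one root (hypothetical parser output).
tree = Node("ROOT", [
    Node("Nucleus", words=["We", "parse"]),
    Node("Satellite:Elaboration", words=["the", "document"]),
])

# Toy 2-dimensional label embeddings (hypothetical values).
label_emb = {
    "ROOT": [1.0, 0.0],
    "Nucleus": [0.0, 1.0],
    "Satellite:Elaboration": [1.0, 1.0],
}

for word, path in word_paths(tree):
    structure_vec = embed_path(path, label_emb, dim=2)
    encoder_input = combine([1.0, 2.0], structure_vec)   # [1.0, 2.0] is a dummy word embedding
    print(word, path, encoder_input)
```

In this sketch every word inherits the label path of the segment that contains it, so words in the same segment share one structure vector while still differing in their word embeddings.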

Results

Conclusion