Abstract

Existing studies for multi-source neural machine translation (NMT) either separately model different source sentences or resort to the conventional single-source NMT by simply concatenating all source sentences. However, there exist two drawbacks in these approaches. First, they ignore the explicit word-level semantic interactions between source sentences, which have been shown effective in the embeddings of multilingual texts. Second, multiple source sentences are simultaneously encoded by an NMT model, which is unable to fully exploit the semantic information of each source sentence. In this paper, we explore multi-stage information interactions for multi-source NMT. Specifically, we first propose a multi-source NMT model that performs information interactions at the encoding stage. Its encoder contains multiple semantic interaction layers, each of which sequentially consists of (1) monolingual semantic interaction sub-layer, which is based on the self-attention mechanism and used to learn word-level monolingual contextual representations of source sentences, and (2) cross-lingual semantic interaction sub-layer, which leverages word alignments to perform fine-grained semantic transitions among hidden states of different source sentences. Furthermore, at the training stage, we introduce a mutual distillation based training framework, where single-source models and ours perform information interactions. Such framework can fully exploit the semantic information of each source sentence to enhance our model. Extensive experimental results on the WMT14 English-German-French dataset show our method exhibits significant improvements upon competitive baselines.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call