Abstract

Research on document-level Neural Machine Translation (NMT) models has attracted increasing attention in recent years. Although previous work has shown that inter-sentence information helps improve the performance of NMT models, what information should be regarded as context remains ambiguous. To address this problem, we propose a novel cache-based document-level NMT model that conducts dynamic caching guided by theme-rheme information. Experiments on NIST evaluation sets demonstrate that our proposed model achieves substantial improvements over state-of-the-art baseline NMT models. To the best of our knowledge, we are the first to introduce theme-rheme theory into the field of machine translation.

Highlights

  • Most state-of-the-art Neural Machine Translation (NMT) models (Bahdanau et al., 2014; Sutskever et al., 2014; Vaswani et al., 2017) regard independent sentence pairs as their training and decoding units, without considering the document-level context.

  • Because such NMT systems ignore the discourse connections between sentences and other valuable contextual information such as coreference, their translations tend to be problematic in coherence and cohesion, e.g., inconsistent translations of the same word, and under-translation or mistranslation of topic words (Hardmeier, 2012; Meyer and Webber, 2013; Smith, 2017).

  • We propose a cache-based NMT model with a dynamic cache that captures source-side inter-sentence information.


Introduction

Most state-of-the-art Neural Machine Translation (NMT) models (Bahdanau et al., 2014; Sutskever et al., 2014; Vaswani et al., 2017) regard independent sentence pairs as their training and decoding units, without considering the document-level context. Recent studies (Weston et al., 2014; Maruf and Haffari, 2017; Su et al., 2018) introduce an external architecture to produce a contextual representation during the translation of a sentence, but such approaches require a high-quality, large-scale parallel corpus with document boundaries, which is seldom available. Tu et al. (2018) and Kuang and Xiong (2018) propose cache-based NMT models to capture document-level information. In these models, one can define flexible caching rules so that the stored information may be more interpretable. However, these methods usually focus on the target-side context and may suffer from error propagation, since the target-side context used as the cache often contains translation errors.
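To make the caching idea concrete, the following is a minimal sketch of a source-side dynamic cache with a flexible caching rule. The class name, capacity, eviction policy, and the boolean theme flags are illustrative assumptions, not the authors' exact design; the point is that a rule (here: "cache only theme-marked source words") decides what enters the cache, and old entries are evicted when capacity is exceeded.

```python
from collections import OrderedDict

class DynamicCache:
    """Hypothetical sketch of a source-side dynamic cache for
    document-level NMT. It stores representations of selected
    source words from previous sentences; the caching rule and
    capacity are assumptions for illustration."""

    def __init__(self, capacity=25):
        self.capacity = capacity
        self.entries = OrderedDict()  # word -> representation

    def update(self, words, vectors, is_theme):
        """Insert only words flagged as theme (topical) material,
        evicting the oldest entries once capacity is exceeded."""
        for word, vec, theme in zip(words, vectors, is_theme):
            if not theme:
                continue  # the caching rule filters out rheme words
            self.entries[word] = vec
            self.entries.move_to_end(word)      # mark as most recent
            while len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict oldest

    def lookup(self, word):
        """Return the cached representation, or None if absent."""
        return self.entries.get(word)
```

During decoding, the model could query this cache for the current source words and mix any retrieved representations into its context vector; the rule itself (theme-rheme flags, capacity, eviction order) is exactly the kind of interpretable choice the cache-based design exposes.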
