Enhancing Lexical Translation Consistency for Document-Level Neural Machine Translation
Document-level neural machine translation (DocNMT) has yielded attractive improvements. In this article, we systematically analyze discourse phenomena in Chinese-to-English translation and focus on the most prominent one: lexical translation consistency. To alleviate lexical inconsistency, we propose an effective approach that is aware of the words that need to be translated consistently and constrains the model to produce more consistent translations. Specifically, we first introduce a global context extractor to extract the document context and the consistency context, respectively. Then, the two types of global context are integrated into an encoder enhancer and a decoder enhancer to improve lexical translation consistency. We create a test set to evaluate lexical consistency automatically. Experiments demonstrate that our approach significantly alleviates lexical translation inconsistency. In addition, it substantially improves translation quality compared to the sentence-level Transformer.
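As an illustration of how lexical translation consistency could be scored automatically, the sketch below takes word-aligned (source, target) pairs gathered across a document and measures the fraction of repeated source words that keep a single translation. Both the input format and the metric are assumptions for illustration, not the paper's test set or criterion.

```python
from collections import defaultdict

def lexical_consistency(aligned_pairs):
    """Score one document: 1.0 means every repeated source word
    was translated the same way each time it occurred."""
    translations = defaultdict(list)
    for src, tgt in aligned_pairs:
        translations[src.lower()].append(tgt.lower())
    # Only source words occurring more than once can be (in)consistent.
    repeated = [tgts for tgts in translations.values() if len(tgts) > 1]
    if not repeated:
        return 1.0
    return sum(len(set(tgts)) == 1 for tgts in repeated) / len(repeated)

# "bank" is translated two different ways -> inconsistent document.
pairs = [("bank", "Ufer"), ("bank", "Ufer"), ("bank", "Bank"), ("river", "Fluss")]
print(lexical_consistency(pairs))  # 0.0
```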
- Video Transcripts
- 10.48448/z1hh-w495
- May 7, 2022
- Underline Science Inc.
Document-level neural machine translation (DocNMT) achieves coherent translations by incorporating cross-sentence context. However, for most language pairs there is a shortage of parallel documents, although parallel sentences are readily available. In this paper, we study whether and how contextual modeling in DocNMT is transferable via multilingual modeling. We focus on the scenario of zero-shot transfer from teacher languages with document-level data to student languages with only sentence-level data, and for the first time treat document-level translation as a transfer learning problem. Using simple concatenation-based DocNMT, we explore the effect of three factors on the transfer: the number of teacher languages with document-level data, the balance between document- and sentence-level data at training, and the data condition of parallel documents (genuine vs. back-translated). Our experiments on Europarl-7 and IWSLT-10 show the feasibility of multilingual transfer for DocNMT, particularly on document-specific metrics. We observe that more teacher languages and an adequate data balance both contribute to better transfer quality. Surprisingly, the transfer is less sensitive to the data condition, where multilingual DocNMT delivers decent performance with either back-translated or genuine document pairs.
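For concreteness, concatenation-based DocNMT can be as simple as joining a window of aligned sentences with a separator token on both source and target sides, then training an ordinary NMT model on the result. A minimal sketch follows; the window size, separator symbol, and non-overlapping packing are assumptions rather than the paper's exact setup.

```python
def make_doc_examples(src_sents, tgt_sents, max_sents=4, sep="</s>"):
    """Pack consecutive aligned sentences into concatenated
    source/target training examples for a standard NMT model."""
    examples = []
    for i in range(0, len(src_sents), max_sents):
        src = f" {sep} ".join(src_sents[i:i + max_sents])
        tgt = f" {sep} ".join(tgt_sents[i:i + max_sents])
        examples.append((src, tgt))
    return examples

src = ["Er kam zu spät .", "Der Zug hatte Verspätung ."]
tgt = ["He arrived late .", "The train was delayed ."]
print(make_doc_examples(src, tgt, max_sents=2))
```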
- Research Article
- 10.31449/inf.v49i23.10379
- Dec 18, 2025
- Informatica
Document-level neural machine translation (NMT) aims to improve translation coherence by modeling cross-sentence dependencies. However, existing models such as the sentence-level Transformer and G-Transformer struggle to capture global context and produce noisy attention distributions. This paper introduces a novel document-level NMT framework that integrates multi-scale wavelet feature fusion, Balanced Contextual Slicing, a G-Meshed Transformer decoder, and an attention alignment mechanism. The method enhances encoder input using wavelet-derived semantic features, while parity resolution splits documents into overlapping segments to provide richer context without increasing parameters. A mesh structure in the decoder improves feature sharing and weighting across sentences. An attention alignment module further guides the model to focus on semantically relevant context using a lightweight context detector. Experiments on three English-German datasets (TED, News, Europarl) show that our model consistently outperforms strong baselines. In the two-stage training setup, it improves BLEU scores by +0.68 on TED, +0.81 on News, and +1.34 on Europarl over the sentence-level Transformer (average +0.95). With mBART-25 pretraining, it still gains +0.60 BLEU on average over the G-Transformer baseline. These results confirm that our approach significantly improves translation consistency, attention concentration, and the handling of discourse phenomena such as deixis and ellipsis, enhancing document-level contextual representation and discourse-level coherence.
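The overlapping-segment component can be illustrated in isolation: slide a fixed-size window across the document's sentences so that each sentence appears with context on both sides, at no extra parameter cost. The window size and stride below are assumptions, not the Balanced Contextual Slicing configuration.

```python
def overlapping_segments(sentences, size=4, stride=2):
    """Split a document into overlapping windows of `size` sentences,
    advancing by `stride`, and always covering the tail."""
    if len(sentences) <= size:
        return [sentences]
    starts = list(range(0, len(sentences) - size, stride))
    starts.append(len(sentences) - size)  # ensure the last window reaches the end
    return [sentences[s:s + size] for s in starts]

doc = [f"sent{i}" for i in range(7)]
for seg in overlapping_segments(doc):
    print(seg)
```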
- Conference Article
- 10.18653/v1/2021.eacl-srw.19
- Jan 1, 2021
State-of-the-art (SOTA) neural machine translation (NMT) systems translate texts at sentence level, ignoring context: intra-textual information, like the previous sentence, and extra-textual information, like the gender of the speaker. As a result, some sentences are translated incorrectly. Personalised NMT (PersNMT) and document-level NMT (DocNMT) incorporate this information into the translation process. Both fields are relatively new and previous work within them is limited. Moreover, there are no readily available robust evaluation metrics for them, which makes it difficult to develop better systems, as well as track global progress and compare different methods. This thesis proposal focuses on PersNMT and DocNMT for the domain of dialogue extracted from TV subtitles in five languages: English, Brazilian Portuguese, German, French and Polish. Three main challenges are addressed: (1) incorporating extra-textual information directly into NMT systems; (2) improving the machine translation of cohesion devices; (3) reliable evaluation for PersNMT and DocNMT.
- Conference Article
- 10.24963/ijcai.2022/566
- Jul 1, 2022
Document-level neural machine translation (DocNMT) typically encodes several neighboring sentences or even the entire document indiscriminately, without considering the relevance of document-level contextual information: some context (e.g., content words, logical order, and co-occurrence relations) is more informative than other, auxiliary context (e.g., function words). To address this issue, we first use word frequency information to recognize content words in the input document, and then use heuristic relations to organize content words and sentences into a graph structure without relying on external syntactic knowledge. Furthermore, we apply graph attention networks to this graph structure to learn its feature representation, which allows DocNMT to capture the document-level context more effectively. Experimental results on several widely used document-level benchmarks demonstrate the effectiveness of the proposed approach.
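To make the frequency heuristic concrete, here is a rough sketch, assuming a precomputed set of globally frequent (functional) words: the remaining words are treated as content words and linked by within-sentence co-occurrence. The paper's full graph additionally connects words to sentence nodes and is processed with graph attention networks, which the sketch omits.

```python
from itertools import combinations

def build_content_graph(doc_sentences, frequent_words):
    """Collect content words (everything not in `frequent_words`)
    and link those that co-occur within a sentence."""
    content_words, edges = set(), set()
    for sent in doc_sentences:
        cw = sorted({w for w in sent if w not in frequent_words})
        content_words.update(cw)
        edges.update(combinations(cw, 2))  # co-occurrence edges
    return content_words, edges

doc = [["the", "bank", "raised", "rates"], ["the", "rates", "fell"]]
print(build_content_graph(doc, frequent_words={"the"}))
```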
- Research Article
- 10.3390/info13050249
- May 12, 2022
- Information
Current state-of-the-art neural machine translation (NMT) architectures usually do not take document-level context into account. However, the document-level context of a source sentence to be translated can encode valuable information that guides the MT model toward a better translation, and MT researchers have recently turned their focus to this line of research. As an example, hierarchical attention network (HAN) models use document-level context for translation prediction. In this work, we studied translations produced by HAN-based MT systems and examined how contextual information improves translation in document-level NMT. More specifically, we investigated why context-aware models such as HAN perform better than vanilla baseline NMT systems that do not take context into account. We considered Hindi-to-English, Spanish-to-English and Chinese-to-English for our investigation, experimenting with how the conditional context (i.e., neighbouring sentences) of a source sentence is formed in HAN when predicting its target translation. Interestingly, we observed that the quality of the target translation of a specific source sentence is highly related to the context in which the source sentence appears. Based on their sensitivity to context, we classify our test-set sentences into three categories: context-sensitive, context-insensitive and normal. We believe that this categorization may change the way in which context is utilized in document-level translation.
- Conference Article
- 10.18653/v1/2020.coling-main.388
- Jan 1, 2020
Research on document-level Neural Machine Translation (NMT) models has attracted increasing attention in recent years. Although previous work has shown that inter-sentence information helps improve the performance of NMT models, what information should be regarded as context remains ambiguous. To solve this problem, we propose a novel cache-based document-level NMT model that performs dynamic caching guided by theme-rheme information. Experiments on the NIST evaluation sets demonstrate that our proposed model achieves substantial improvements over state-of-the-art baseline NMT models. As far as we know, we are the first to introduce theme-rheme theory into the field of machine translation.
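Setting the theme-rheme guidance aside, the dynamic-caching component can be sketched as a small least-recently-used store of context entries; which items theme-rheme analysis would admit or evict is not specified in the abstract, so the LRU policy and string keys below are purely assumptions.

```python
from collections import OrderedDict

class DynamicCache:
    """Minimal LRU-style cache of (key, context vector) pairs."""
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.store = OrderedDict()

    def update(self, key, value):
        self.store.pop(key, None)           # refresh recency if present
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the stalest entry

    def lookup(self, key):
        return self.store.get(key)

cache = DynamicCache(capacity=2)
cache.update("theme:train", [0.1, 0.2])
cache.update("theme:delay", [0.3, 0.4])
cache.update("theme:station", [0.5, 0.6])  # evicts "theme:train"
print(cache.lookup("theme:train"))         # None
```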
- Conference Article
- 10.18653/v1/2020.autosimtrans-1.5
- Jan 1, 2020
Recently, document-level neural machine translation (NMT) has become a hot topic in the machine translation community. Despite its success, most existing studies ignore the discourse structure of the input document to be translated, which has proven effective in other tasks. In this paper, we propose to improve document-level NMT with the aid of discourse structure information. Our encoder is based on a hierarchical attention network (HAN) (Miculicich et al., 2018). Specifically, we first parse the input document to obtain its discourse structure. Then, we introduce a Transformer-based path encoder to embed the discourse structure information of each word. Finally, we combine the discourse structure information with the word embedding before it is fed into the encoder. Experimental results on the English-to-German dataset show that our model significantly outperforms both the Transformer and Transformer+HAN.
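Reduced to its final step, the method combines each word's embedding with an embedding of its discourse-structure path before the encoder. In this sketch a plain lookup table stands in for the Transformer-based path encoder, and fusion by addition is our assumption.

```python
import torch
import torch.nn as nn

class DiscourseAwareEmbedding(nn.Module):
    """Word embedding enriched with a per-word discourse-path embedding."""
    def __init__(self, vocab_size, num_paths, dim):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.path_emb = nn.Embedding(num_paths, dim)  # stand-in for a path encoder

    def forward(self, word_ids, path_ids):
        return self.word_emb(word_ids) + self.path_emb(path_ids)

emb = DiscourseAwareEmbedding(vocab_size=100, num_paths=10, dim=16)
out = emb(torch.tensor([[1, 2, 3]]), torch.tensor([[0, 0, 1]]))
print(out.shape)  # torch.Size([1, 3, 16])
```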
- Conference Article
- 10.18653/v1/2020.acl-main.322
- Jan 1, 2020
In encoder-decoder neural models, multiple encoders are generally used to represent contextual information in addition to the individual sentence. In this paper, we investigate multi-encoder approaches in document-level neural machine translation (NMT). Surprisingly, we find that the context encoder not only encodes the surrounding sentences but also behaves as a noise generator. This makes us rethink the real benefits of the multi-encoder approach in context-aware translation: some of the improvements come from robust training. We compare several methods that introduce noise and/or a well-tuned dropout setup into the training of these encoders. Experimental results show that noisy training plays an important role in multi-encoder-based NMT, especially when the training data is small. Also, we establish a new state of the art on the IWSLT Fr-En task through careful use of noise generation and dropout methods.
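A minimal way to emulate the "noise generator" behavior during training is to perturb the context-encoder outputs directly, e.g., with Gaussian noise; the well-tuned dropout alternative the paper compares would correspond to a standard nn.Dropout layer instead. The noise form and scale below are assumptions.

```python
import torch

def noisy_context(context_states, sigma=0.1, training=True):
    """Add Gaussian noise to context-encoder outputs during training
    as a simple robust-training perturbation."""
    if not training:
        return context_states
    return context_states + sigma * torch.randn_like(context_states)

h = torch.zeros(2, 5, 8)       # (batch, context length, hidden size)
print(noisy_context(h).std())  # roughly sigma
```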
- Conference Article
- 10.1109/ijcnn52387.2021.9534177
- Jul 18, 2021
By using document-level contextual information, document-level neural machine translation can achieve better results than ordinary machine translation. However, traditional document-level models struggle to capture the articulation between contextual sentences and deep positional relations within the discourse while exploiting document-level vocabulary; they can attend only to relatively shallow inter-sentential relations or positional information. In this paper, we observe that most adjacent sentences in a document are connected, and that such links help improve translation quality. Building on previous work, we propose a document translation model that focuses more strongly on inter-sentential relations, introduce two methods to strengthen the model's positional-information input, and combine them to enhance the traditional Transformer positional encoding. We also propose a method for inserting paragraph information so that inter-sentential relations can be learned by the model, and apply the improved Transformer to Chinese-Mongolian document translation. Experiments show that, after fusing positional information and inter-sentential relation information, the improved system achieves higher BLEU scores and better translations on the Chinese-Mongolian machine translation task.
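One simple way to realize the paragraph-information idea is a third learned embedding, added to token and position embeddings, that tags each token with its paragraph index. This is a sketch of the general mechanism; the paper's two positional-information methods and its marker-insertion scheme are not detailed in the abstract.

```python
import torch
import torch.nn as nn

class ParagraphPositionalEmbedding(nn.Module):
    """Token + position + paragraph embeddings, summed per token."""
    def __init__(self, vocab, max_pos, max_pars, dim):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.pos = nn.Embedding(max_pos, dim)
        self.par = nn.Embedding(max_pars, dim)

    def forward(self, tokens, paragraph_ids):
        positions = torch.arange(tokens.size(1), device=tokens.device)
        return self.tok(tokens) + self.pos(positions) + self.par(paragraph_ids)

emb = ParagraphPositionalEmbedding(vocab=1000, max_pos=512, max_pars=16, dim=32)
tokens = torch.tensor([[5, 6, 7, 8]])
pars   = torch.tensor([[0, 0, 1, 1]])  # first two tokens in paragraph 0, rest in 1
print(emb(tokens, pars).shape)         # torch.Size([1, 4, 32])
```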
- Conference Article
- 10.18653/v1/2020.emnlp-main.81
- Jan 1, 2020
Many document-level neural machine translation (NMT) systems have explored the utility of context-aware architectures, usually requiring an increasing number of parameters and greater computational complexity. However, little attention has been paid to the baseline model. In this paper, we extensively study the pros and cons of the standard Transformer in document-level translation and find that its auto-regressive property brings both the advantage of consistency and the disadvantage of error accumulation. We therefore propose a surprisingly simple long-short term masking self-attention on top of the standard Transformer to both effectively capture long-range dependencies and reduce the propagation of errors. We examine our approach on two publicly available document-level datasets, achieving strong BLEU results and capturing discourse phenomena.
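The core of long-short term masking can be sketched as two boolean attention masks over one concatenated sequence: a short-term mask confined to the current sentence and a long-term mask spanning the whole input. How the two masks are distributed across attention heads or layers is our assumption; the abstract does not specify it.

```python
import torch

def long_short_masks(sent_ids):
    """Build a short-term mask (attend within the same sentence) and a
    long-term mask (attend anywhere) from per-token sentence indices."""
    same_sent = sent_ids.unsqueeze(0) == sent_ids.unsqueeze(1)
    short_mask = same_sent                  # restrict to the current sentence
    long_mask = torch.ones_like(same_sent)  # allow the full sequence
    return short_mask, long_mask

ids = torch.tensor([0, 0, 0, 1, 1])  # two sentences of 3 and 2 tokens
short, _ = long_short_masks(ids)
print(short.int())
```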
- Video Transcripts
- 10.48448/3690-5p45
- May 25, 2021
- Underline Science Inc.
Document-level neural machine translation (NMT) has proven to be of profound value for its effectiveness in capturing contextual information. Nevertheless, existing approaches (1) simply introduce the representations of context sentences without explicitly characterizing the inter-sentence reasoning process, and (2) feed ground-truth target contexts as extra inputs at training time, thus facing the problem of exposure bias. To this end, we propose a novel Multi-Hop Transformer (MHT) that equips NMT with the ability to explicitly model the human-like draft-editing and reasoning process. Specifically, our model treats the sentence-level translation as a draft and refines its representations by attending to multiple antecedent sentences iteratively. Experiments on four widely used document translation tasks demonstrate that our method significantly improves document-level translation performance.
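The draft-editing idea can be sketched as iterative cross-attention: sentence-level ("draft") states repeatedly attend to encoded antecedent sentences, one hop per iteration, with a residual update. Hop count, dimensions, and the use of standard multi-head attention are illustrative assumptions rather than the MHT architecture itself.

```python
import torch
import torch.nn as nn

class MultiHopRefiner(nn.Module):
    """Refine draft states over several hops of attention to antecedents."""
    def __init__(self, dim, hops=3, heads=4):
        super().__init__()
        self.hops = hops
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, draft, antecedents):
        h = draft
        for _ in range(self.hops):                    # one reasoning hop per step
            ctx, _ = self.attn(h, antecedents, antecedents)
            h = self.norm(h + ctx)                    # residual refinement
        return h

refiner = MultiHopRefiner(dim=16)
out = refiner(torch.randn(1, 5, 16), torch.randn(1, 12, 16))
print(out.shape)  # torch.Size([1, 5, 16])
```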
- Research Article
- 10.1145/3526215
- Nov 12, 2022
- ACM Transactions on Asian and Low-Resource Language Information Processing
How to effectively model global context has been a critical challenge for document-level neural machine translation (NMT). Both preceding and global context have been carefully explored in the sequence-to-sequence (seq2seq) framework. However, previous studies generally map the global context into a single vector, which cannot adequately represent the entire document since it largely ignores the hierarchy between the sentences and the words within them. In this article, we propose to model the global context of the source language at both the sentence level and the word level. Specifically, at the sentence level we extract global context that is useful for the current sentence, while at the word level we compute global context against the words within the current sentence. On this basis, both kinds of global context can be appropriately fused before being incorporated into the state-of-the-art seq2seq model, i.e., the Transformer. Detailed experimentation on various document-level translation tasks shows that global context at both the sentence level and the word level significantly improves translation performance. More encouragingly, the two kinds of global context are complementary, leading to further improvement when both are used.
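As one plausible reading of "appropriately fused", a learned sigmoid gate can balance the sentence-level and word-level context vectors per position before they enter the Transformer. The gate is purely our assumption; the abstract does not state the fusion function.

```python
import torch
import torch.nn as nn

class GlobalContextFusion(nn.Module):
    """Gated fusion of sentence-level and word-level global context."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, sent_ctx, word_ctx):
        g = torch.sigmoid(self.gate(torch.cat([sent_ctx, word_ctx], dim=-1)))
        return g * sent_ctx + (1 - g) * word_ctx  # per-position interpolation

fuse = GlobalContextFusion(dim=8)
print(fuse(torch.randn(1, 4, 8), torch.randn(1, 4, 8)).shape)  # torch.Size([1, 4, 8])
```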
- Conference Article
- 10.18653/v1/2020.emnlp-main.175
- Jan 1, 2020
Document-level neural machine translation has yielded attractive improvements. However, the majority of existing methods roughly use all context sentences within a fixed scope, neglecting the fact that different source sentences need different amounts of context. To address this problem, we propose an effective approach to select dynamic context so that the document-level translation model can utilize the more useful selected context sentences to produce better translations. Specifically, we introduce a selection module, independent of the translation module, that scores each candidate context sentence. Then, we propose two strategies to explicitly select a variable number of context sentences and feed them into the translation module. We train the two modules end-to-end via reinforcement learning, with a novel reward that encourages the selection and utilization of dynamic context sentences. Experiments demonstrate that our approach selects adaptive context sentences for different source sentences and significantly improves the performance of document-level translation methods.
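Separating the scoring step from the selection step, a minimal sketch of the selection module might look as follows. The dot-product scorer and fixed top-k are simplifications: the paper selects a variable number of sentences and trains the selector with reinforcement learning, neither of which is reproduced here.

```python
import torch
import torch.nn as nn

class ContextSelector(nn.Module):
    """Score candidate context sentences against the current source
    sentence and keep the best-scoring ones."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, src_vec, cand_vecs, k=2):
        scores = cand_vecs @ self.proj(src_vec)  # one score per candidate
        top = torch.topk(scores, min(k, cand_vecs.size(0))).indices
        return top, scores

sel = ContextSelector(dim=8)
idx, _ = sel(torch.randn(8), torch.randn(5, 8))
print(idx)  # indices of the selected context sentences
```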
- Conference Article
- 10.18653/v1/w19-5321
- Jan 1, 2019
This paper describes the Microsoft Translator submissions to the WMT19 news translation shared task for English-German. Our main focus is document-level neural machine translation with deep transformer models. We start with strong sentence-level baselines, trained on large-scale data created via data-filtering and noisy back-translation and find that back-translation seems to mainly help with translationese input. We explore fine-tuning techniques, deeper models and different ensembling strategies to counter these effects. Using document boundaries present in the authentic and synthetic parallel data, we create sequences of up to 1000 subword segments and train transformer translation models. We experiment with data augmentation techniques for the smaller authentic data with document-boundaries and for larger authentic data without boundaries. We further explore multi-task training for the incorporation of document-level source language monolingual data via the BERT-objective on the encoder and two-pass decoding for combinations of sentence-level and document-level systems. Based on preliminary human evaluation results, evaluators strongly prefer the document-level systems over our comparable sentence-level system. The document-level systems also seem to score higher than the human references in source-based direct assessment.
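The 1000-subword document sequences can be built by greedily packing consecutive sentences of a document until the length budget is reached. Below is a sketch under the assumption that sentences are already split into subword tokens; the submission's actual segmentation policy is not described beyond the length limit.

```python
def pack_documents(doc_sentences, max_len=1000):
    """Greedily pack consecutive sentences of one document into
    training sequences of at most `max_len` subword tokens."""
    sequences, current = [], []
    for sent in doc_sentences:
        if current and len(current) + len(sent) > max_len:
            sequences.append(current)
            current = []
        current = current + sent
    if current:
        sequences.append(current)
    return sequences

doc = [["a"] * 400, ["b"] * 400, ["c"] * 400]
print([len(s) for s in pack_documents(doc)])  # [800, 400]
```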
- Video Transcripts
- 10.48448/m09q-k780
- May 11, 2022
- Underline Science Inc.
Previous studies on document-level neural machine translation (DocNMT) should be revisited with the possibility of overfitting in mind. We propose to solve DocNMT in a document-to-document fashion with the original Transformer, and find that translating the whole document (even more than 2000 words) is not only feasible but also yields better results.