Abstract

Previous research has shown that simple methods of augmenting machine translation training data and input sentences with translations of similar sentences (or fuzzy matches), retrieved from a translation memory or bilingual corpus, lead to considerable improvements in translation quality, as assessed by a limited set of automatic evaluation metrics. In this study, we extend this evaluation by calculating a wider range of automated quality metrics that tap into different aspects of translation quality and by performing manual MT error analysis. Moreover, we investigate in more detail how fuzzy matches influence translations and where potential quality improvements could still be made, by carrying out a series of quantitative analyses that focus on different characteristics of the retrieved fuzzy matches. The automated evaluation shows that the quality of neural fuzzy repair (NFR) translations is higher than that of the neural machine translation (NMT) baseline on all metrics. However, the manual error analysis did not reveal a difference between the two systems in the total number of translation errors, although different error profiles emerged when the types of errors were considered. Finally, in our analysis of how fuzzy matches influence NFR translations, we identified a number of features that could be used to improve the selection of fuzzy matches for NFR data augmentation.

Highlights

  • Machine translation (MT) systems are routinely evaluated using a restricted set of automated quality metrics, especially at early stages of development [1,2]

  • We focus on a simple approach to translation memory (TM)–neural machine translation (NMT) integration, neural fuzzy repair (NFR), that relies on source sentence augmentation through the concatenation of translations of similar source sentences retrieved from a TM [3]; a minimal sketch of this augmentation step is given after this list

  • We report the overall score per metric for the baseline and NFR systems, as well as for the TM; further, we indicate the absolute and relative differences between the baseline and the NFR systems
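To make the augmentation step referenced in the second highlight concrete, below is a minimal sketch in Python. Everything in it is illustrative rather than taken from the paper: the toy translation memory, the `@@@` separator token, the 0.5 match threshold, and the use of difflib's SequenceMatcher as a stand-in for the edit-distance-based fuzzy match scoring used in NFR.

```python
from difflib import SequenceMatcher

# Toy translation memory: (source, target) pairs. Purely illustrative.
TM = [
    ("the cat sat on the mat", "le chat était assis sur le tapis"),
    ("the dog slept on the sofa", "le chien dormait sur le canapé"),
]

def fuzzy_score(a: str, b: str) -> float:
    """Token-level similarity in [0, 1]; a stand-in for the
    edit-distance-based fuzzy match score used in NFR."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def augment_source(source: str, tm, threshold: float = 0.5) -> str:
    """Append the target side of the best fuzzy match to the source
    sentence, separated by a special token, as in NFR-style augmentation."""
    best_score, best_target = 0.0, None
    for tm_source, tm_target in tm:
        score = fuzzy_score(source, tm_source)
        if score > best_score:
            best_score, best_target = score, tm_target
    if best_target is not None and best_score >= threshold:
        # '@@@' is an arbitrary separator chosen for this sketch.
        return f"{source} @@@ {best_target}"
    return source  # no sufficiently similar match: leave the input unchanged

print(augment_source("the cat sat on the sofa", TM))
# -> the cat sat on the sofa @@@ le chat était assis sur le tapis
```

As the abstract notes, this augmentation is applied both to the training data and to input sentences at translation time, so the model learns to copy or adapt the concatenated TM translation.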



Introduction

Machine translation (MT) systems are routinely evaluated using a restricted set of automated quality metrics, especially at early stages of development [1,2]. Using mainly BLEU [6], a metric quantifying the degree of exact overlap between MT output and a reference translation, previous work demonstrated substantial quality improvements of NFR systems over strong neural machine translation (NMT) baselines. This difference in BLEU score was consistent across language pairs and data sets, and large enough to be interpreted as a strong indication that NFR can lead to translations of better quality. Our aim is two-fold: not only do we want to obtain a better picture of the quality of translations produced with NFR, we also hope to gain more insight into how NFR leads to better translation quality and to identify patterns that can be exploited to further improve the system.
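As a point of reference for how such scores are typically obtained, the snippet below computes BLEU with the sacrebleu package, a standard implementation of the metric; the hypothesis and reference here are toy examples with no connection to the paper's data or reported scores.

```python
import sacrebleu  # pip install sacrebleu

# Toy data: one hypothesis sentence and one reference translation.
hypotheses = ["the cat was sitting on the mat"]
references = [["the cat sat on the mat"]]  # one inner list per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")  # exact n-gram overlap with the reference
```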
