Abstract

Machine translation has an undesirable propensity to produce “translationese” artifacts, which can lead to higher BLEU scores while being liked less by human raters. Motivated by this, we model translationese and original (i.e. natural) text as separate languages in a multilingual model, and pose the question: can we perform zero-shot translation between original source text and original target text? There is no data with original source and original target, so we train a sentence-level classifier to distinguish translationese from original target text, and use this classifier to tag the training data for an NMT model. Using this technique we bias the model to produce more natural outputs at test time, yielding gains in human evaluation scores on both accuracy and fluency. Additionally, we demonstrate that it is possible to bias the model to produce translationese and game the BLEU score, increasing it while decreasing human-rated quality. We analyze these outputs using metrics measuring the degree of translationese, and present an analysis of the volatility of heuristic-based train-data tagging.

Highlights

  • “Translationese” is a term that refers to artifacts present in text that was translated into a given language that distinguish it from text originally written in that language (Gellerstam, 1986)

  • Machine translation has an undesirable propensity to produce “translationese” artifacts, which can lead to higher BLEU scores while being liked less by human raters

  • While target-original test data does have the downside of a translationese source side, recent work has shown that human raters prefer machine translation (MT) output that is closer in distribution to original target text than translationese (Freitag et al., 2019)

Published in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7737–7746, July 5–10, 2020. © 2020 Association for Computational Linguistics.


Summary

Introduction

“Translationese” is a term that refers to artifacts present in text that was translated into a given language that distinguish it from text originally written in that language (Gellerstam, 1986). Because most MT training sets do not annotate each sentence pair’s original language, we train a binary classifier to predict whether the target side of a pair is original text in that language or translated from the source language. This follows several prior works attempting to identify translations (Kurokawa et al., 2009; Koppel and Ordan, 2011; Lembersky et al., 2012). We hope that the noise introduced by round-trip translation will be similar enough to human translationese to be useful for our downstream task. In both settings, we use the trained binary classifier to detect and tag the training bitext pairs whose target side it predicts to be original.
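The tagging step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `is_original` is a hypothetical stand-in for the trained sentence-level translationese classifier (here a trivial length heuristic so the sketch runs), and `<original>` is an assumed tag token; the paper's actual tag and classifier differ.

```python
# Sketch: tag training bitext pairs whose target side a classifier
# predicts to be original (natural) text, so the NMT model can be
# biased toward natural output at test time by prepending the tag.

ORIGINAL_TAG = "<original>"  # assumed tag token, for illustration only

def is_original(target_sentence: str) -> bool:
    """Hypothetical stand-in for the trained binary classifier.

    A real system would score the sentence with a model trained to
    separate original text from translationese; here we use a trivial
    length heuristic purely so the sketch is runnable.
    """
    return len(target_sentence.split()) > 5

def tag_bitext(pairs):
    """Prepend the tag to the source side of pairs whose target side
    is predicted to be original text."""
    tagged = []
    for src, tgt in pairs:
        if is_original(tgt):
            src = f"{ORIGINAL_TAG} {src}"
        tagged.append((src, tgt))
    return tagged
```

At inference time, the same tag would be prepended to every input sentence to bias the model toward original-style (natural) output; omitting it (or using a complementary tag) biases it toward translationese, which is how the paper games BLEU.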

Experimental Set-up
Architecture and Training
Evaluation
Classifier Accuracy
NMT with Translationese-Classified Bitext
Human Evaluation Experiments
Length Variety
Measuring Translationese
Lexical Density
Tagging using Translationese Heuristics
Length Ratio Tagging
Results
Back-Translation Experiments
Example Output
Translationese
Training Data Tagging for NMT
Conclusion