Abstract
We present a new English–French dataset for the evaluation of Machine Translation (MT) for informal, written bilingual dialogue. The test set contains 144 spontaneous dialogues (5700+ sentences) between native English and French speakers, mediated by one of two neural MT systems in a range of role-play settings. The dialogues are accompanied by fine-grained sentence-level judgments of MT quality, produced by the dialogue participants themselves, as well as by manually normalised versions and reference translations produced a posteriori. The motivation for the corpus is twofold: to provide (i) a unique resource for evaluating MT models, and (ii) a corpus for the analysis of MT-mediated communication. We provide an initial analysis of the corpus to confirm that the participants’ judgments reveal perceptible differences in MT quality between the two MT systems used.
Highlights
The use of Machine Translation (MT) to translate everyday, written exchanges is becoming increasingly commonplace; translation tools regularly appear in chat applications and on social networking sites to enable cross-lingual communication.
We present DiaBLa (Dialogue BiLingue ‘Bilingual Dialogue’), a new dataset of English–French spontaneous written dialogues mediated by MT, obtained by crowdsourcing, covering a range of dialogue topics and annotated with fine-grained human judgments of MT quality.
The choice to have the participants themselves provide the MT evaluation is an important part of our protocol: judgments can be collected on the fly, simplifying the evaluation process, and, importantly, the evaluation is performed from the point of view of participants actively engaged in the dialogue.
Summary
The use of Machine Translation (MT) to translate everyday, written exchanges is becoming increasingly commonplace; translation tools regularly appear in chat applications and on social networking sites to enable cross-lingual communication. Translating dialogue requires rendering sentences coherently with respect to the conversational flow, so that all aspects of the exchange, including speaker intent, attitude and style, are correctly communicated (Bawden 2018). We present DiaBLa (Dialogue BiLingue ‘Bilingual Dialogue’), a new dataset of English–French spontaneous written dialogues mediated by MT, obtained by crowdsourcing, covering a range of dialogue topics and annotated with fine-grained human judgments of MT quality. To our knowledge, this is the first corpus of its kind. The result is a rich bilingual test corpus of 144 dialogues, annotated with sentence-level MT quality evaluations and human reference translations.