Towards Personalised and Document-level Machine Translation of Dialogue

Sebastian Vincent

doi:10.18653/v1/2021.eacl-srw.19

Abstract

State-of-the-art (SOTA) neural machine translation (NMT) systems translate texts at sentence level, ignoring context: intra-textual information, like the previous sentence, and extra-textual information, like the gender of the speaker. As a result, some sentences are translated incorrectly. Personalised NMT (PersNMT) and document-level NMT (DocNMT) incorporate this information into the translation process. Both fields are relatively new and previous work within them is limited. Moreover, there are no readily available robust evaluation metrics for them, which makes it difficult to develop better systems, as well as track global progress and compare different methods. This thesis proposal focuses on PersNMT and DocNMT for the domain of dialogue extracted from TV subtitles in five languages: English, Brazilian Portuguese, German, French and Polish. Three main challenges are addressed: (1) incorporating extra-textual information directly into NMT systems; (2) improving the machine translation of cohesion devices; (3) reliable evaluation for PersNMT and DocNMT.

Highlights

Neural machine translation (NMT) achieves state-of-the-art (SOTA) results in many domains (Sutskever et al., 2014; Vaswani et al., 2017; Lample et al., 2020), with some authors claiming human parity (Hassan et al., 2018).
We present the research on Personalised NMT (PersNMT).
Many machine translation evaluation (MTE) metrics have been proposed over the years, largely owing to the yearly WMT Metrics task (Mathur et al., 2020).

Summary

Introduction

Neural machine translation (NMT) achieves state-of-the-art (SOTA) results in many domains (Sutskever et al., 2014; Vaswani et al., 2017; Lample et al., 2020), with some authors claiming human parity (Hassan et al., 2018). Traditional methods process texts in short units like the utterance or sentence, isolating them from the entire dialogue or document, as well as ignoring extra-textual information (e.g. who is speaking, who they are talking to). As a result, the meaning or function of a translation hypothesis can differ significantly from that of the reference, or the translated text can become incohesive or illogical. For example, when translating “I didn’t go.” into Polish, the machine translation (MT) model must guess the gender of the speaker (“I”), as this information is not expressed in the English sentence. Previous research on cohesion within DocNMT has revealed that verb phrase ellipsis, coreference and reiteration (a type of lexical cohesion) may be translated erroneously by MT systems (e.g. Tiedemann and Scherrer, 2017; Bawden et al., 2018; Voita et al., 2020).
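
The gender ambiguity in the “I didn’t go.” example can be illustrated with a minimal sketch. It assumes one possible way of exposing extra-textual information to a model, namely prepending a speaker tag to the source sentence; this is an illustration only and is not claimed to be the method proposed in the thesis. The tag strings (`<sp:m>`, `<sp:f>`), the function `tag_source` and the lookup table `REFERENCES` are all hypothetical names introduced here, and the table merely stands in for a trained model.

```python
from typing import Optional

# Hypothetical tag vocabulary (an assumption, not taken from the paper).
GENDER_TAGS = {"female": "<sp:f>", "male": "<sp:m>"}


def tag_source(sentence: str, speaker_gender: Optional[str] = None) -> str:
    """Prepend a speaker-gender tag to a source sentence, if the gender is known."""
    tag = GENDER_TAGS.get(speaker_gender or "")
    return f"{tag} {sentence}" if tag else sentence


# Polish marks the speaker's gender on past-tense verbs; English does not.
# This table stands in for a model and holds the two correct renderings.
REFERENCES = {
    "<sp:m> I didn't go.": "Nie poszedłem.",
    "<sp:f> I didn't go.": "Nie poszłam.",
}

for gender in ("male", "female"):
    tagged = tag_source("I didn't go.", gender)
    print(tagged, "->", REFERENCES[tagged])
```

Without a tag, the source sentence is identical for both speakers, so a sentence-level system has no basis for choosing between the two Polish forms.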

Publication Date: Jan 1, 2021
Citations: 1	License type: cc-by
