Abstract

Usage of discourse connectives (DCs) differs across languages, so the addition and omission of connectives are common in translation. We investigate how implicit (omitted) DCs in the source text impact various machine translation (MT) systems, and whether a discourse parser is needed as a preprocessor to explicitate implicit DCs. Based on the manual annotation and alignment of 7266 pairs of discourse relations in a Chinese-English translation corpus, we evaluate whether a preprocessing step that inserts explicit DCs at positions of implicit relations can improve MT. Results show that, without modifying the translation model, explicitating implicit relations in the input source text has limited effect on MT evaluation scores. In addition, translation spotting analysis shows that it is crucial to identify DCs that should be explicitly translated in order to improve implicit-to-explicit DC translation. On the other hand, further analysis reveals that both the disambiguation and the explicitation of implicit relations are subject to a certain level of optionality, which limits how well this linguistic phenomenon can be learned and evaluated using standard parallel corpora.

Highlights

  • Discourse relations are semantic and pragmatic relations between clauses or sentences

  • Explicit and implicit discourse connectives (DCs) account for 45% and 40%, respectively, of the DCs annotated in the Penn Discourse Treebank (PDTB) (Prasad et al., 2008), while in the Chinese Discourse Treebank (CDTB) they account for 22% and 76%, respectively (Zhou and Xue, 2015)

  • We investigate how implicit DCs are translated in a translation corpus, and whether explicitating implicit DCs in the source can improve machine translation (MT)


Summary

Introduction

Discourse relations are semantic and pragmatic relations between clauses or sentences. The relations can be explicitly expressed by surface words known as explicit ‘discourse connectives’ (DCs) or implicitly inferred. Chinese discourse units are typically clauses separated by commas, so DCs are often implicit. Compared with other language pairs, such as Arabic and English, discourse factors have been found to affect machine translation quality more in Chinese-to-English translation, especially when translating discourse relations that are expressed implicitly in one language but explicitly in the other (Li et al., 2014). When translating from Chinese to English, implicit DCs are explicitated when necessary. A causal relation can be inferred between the 2 clauses of the Chinese sentence below. An open question in discourse for statistical machine translation (SMT) is how best to handle cases where DCs are implicit in the source (e.g. Chinese) but explicit in the target (e.g. English). We investigate how implicit DCs are translated in a translation corpus, and whether explicitating implicit DCs in the source can improve MT.
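
The preprocessing idea described above, inserting an explicit DC at the position of an implicit relation before the text is passed to an MT system, can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the function name `explicitate`, the clause-splitting on Chinese commas, and the example sentence are all assumptions made here for illustration, and the sentence is an invented one rather than an example from the paper or its corpus.

```python
from typing import List, Tuple


def explicitate(clauses: List[str], insertions: List[Tuple[int, str]]) -> str:
    """Prepend a connective to selected clauses and rebuild the sentence.

    clauses:    clauses of one source sentence, already split on commas
                (hypothetical input format).
    insertions: (clause_index, connective) pairs, e.g. taken from manual
                annotation or from a discourse parser's output.
    """
    out = list(clauses)  # copy so the caller's list is not modified
    for idx, connective in insertions:
        out[idx] = connective + out[idx]
    # Rejoin with the full-width Chinese comma and close with a full stop.
    return "，".join(out) + "。"


# Invented example: "The weather was bad, the match was cancelled."
# A causal relation is implicit between the two clauses.
clauses = ["天气不好", "比赛取消了"]

# Make the relation explicit by inserting 所以 ("so") before the second clause.
source_for_mt = explicitate(clauses, [(1, "所以")])
print(source_for_mt)
```

The modified string would then be given to the MT system in place of the original source sentence; which connective to insert, and where, is exactly the decision that the annotation or a discourse parser would have to supply.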

Related Work
Crosslingual manual alignment of DCs
Annotation scheme
How many DCs are identified?
Explicitating implicit DCs for MT based on manual annotation
Method
MT Settings
Result
Analysis
Is the translation of implicit-to-explicit DCs improved?
Which senses are more common in implicit-to-explicit alignments?
Conclusion