Abstract

Automatic post-editing (APE) research studies methods for correcting the output of machine translation systems. Training APE models generally requires triplets consisting of a source sentence (src), a machine translation sentence (mt), and a post-edited sentence (pe). Because creating pe requires considerable expert-level human labor, APE researchers have had difficulty constructing suitable datasets for most language pairs. As a result, APE data is absent for most language pairs, such as Korean-English, which limits sustained APE research. Motivated by this problem, we propose a method that generates APE triplets from a parallel corpus alone, without human labor. Our proposal comprises three noise generation techniques: random noise, part-of-speech (POS) tagging based noise, and semantic-level noise. The effectiveness of these methods is verified through quantitative and qualitative experiments on Korean-English APE tasks. Our experiments show that POS-based noise yields the best APE performance. The proposed method is significant in that it obviates the expert human labor generally required for APE data construction and enables sustainable APE research for the many language pairs where human-edited APE triplets are unavailable.
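The idea of building a triplet from a parallel sentence pair can be illustrated with a minimal sketch of the random noise variant. Everything here is an assumption made for illustration, not the authors' implementation: the names `ApeTriplet`, `add_random_noise`, and `make_triplet`, the delete/duplicate/swap operations, the noise rate, and the choice to noise the target side so that the clean target serves as pe and its noised copy as mt.

```python
import random
from typing import List, NamedTuple


class ApeTriplet(NamedTuple):
    src: str  # source sentence from the parallel corpus
    mt: str   # pseudo machine translation (noised target; an assumption)
    pe: str   # pseudo post-edit (clean target; an assumption)


def add_random_noise(tokens: List[str], rate: float, rng: random.Random) -> List[str]:
    """Hit each token with probability `rate` and delete, duplicate, or swap it."""
    noised: List[str] = []
    for token in tokens:
        if rng.random() >= rate:
            noised.append(token)  # token left untouched
            continue
        op = rng.choice(["delete", "duplicate", "swap"])
        if op == "delete":
            continue
        if op == "duplicate":
            noised.extend([token, token])
        elif noised:
            # swap: place the token before the previously emitted one
            noised.insert(len(noised) - 1, token)
        else:
            noised.append(token)  # nothing to swap with yet
    # Avoid returning an empty sentence if every token was deleted.
    return noised if noised else list(tokens)


def make_triplet(src: str, tgt: str, rate: float = 0.15, seed: int = 0) -> ApeTriplet:
    """Build one (src, mt, pe) triplet from a parallel sentence pair."""
    rng = random.Random(seed)
    mt_tokens = add_random_noise(tgt.split(), rate, rng)
    return ApeTriplet(src=src, mt=" ".join(mt_tokens), pe=tgt)


if __name__ == "__main__":
    print(make_triplet("나는 학교에 간다", "I go to school", rate=0.3, seed=1))
```

The POS-based and semantic-level variants would replace `add_random_noise` with a function that picks which tokens to corrupt and how; the abstract does not specify those details, so they are left out of this sketch.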

Highlights

  • Automatic post-editing (APE) is a subfield of machine translation focusing on the automated correction of errors produced by machine translation systems

  • Because correcting errors in mt requires substantial human revision, generating APE data demands considerable expert-level human labor

  • To verify the effectiveness of our proposed approach, we adopted a Korean-English parallel corpus released by AIhub [23]


Summary

Introduction

Automatic post-editing (APE) is a subfield of machine translation that focuses on automatically correcting errors produced by machine translation systems. APE has attracted considerable attention because it reduces the human effort needed to bring machine-generated translations up to human quality [1] and can contribute to domain-specialized translation [2, 3]. It is actively studied as a shared task at the Conference on Machine Translation (WMT) [4]. APE models require triplet data consisting of a source sentence (src), a corresponding machine translation sentence (mt), and an associated post-edited sentence (pe) produced directly by human experts. Because correcting errors in mt requires substantial human revision, generating APE data demands considerable expert-level human labor.
