Abstract

Neural machine translation (NMT) is one of the text generation tasks that have improved significantly with the rise of deep neural networks. However, language-specific problems, such as translating honorifics, have received little attention. In this paper, we propose a context-aware NMT model to improve the translation of Korean honorifics. By exploiting information from the surrounding sentences, such as the relationship between speakers, our proposed model effectively manages the use of honorific expressions. Specifically, we utilize a novel encoder architecture that can represent the contextual information of the given input sentences. Furthermore, we adopt a context-aware post-editing (CAPE) technique to refine sentence-level honorific translations that are inconsistent with one another. Demonstrating the efficacy of the proposed method requires honorific-labeled test data, so we also design a heuristic that labels Korean sentences as honorific or non-honorific in style. Experimental results show that our proposed method outperforms sentence-level NMT baselines in both overall translation quality and honorific translation.
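The abstract does not spell out the labeling heuristic, so the following is only a minimal sketch of what a rule-based honorific labeler could look like. It assumes that speech level can be read off the sentence-final ending: polite endings in 요/죠 (해요체), the formal endings -ㅂ니다/-ㅂ니까 (하십시오체), and the formal imperative 십시오 count as honorific, and everything else counts as non-honorific. The function names `label_honorific` and `honorific_accuracy`, the ending rules, and the label-matching accuracy metric are all hypothetical illustrations, not the authors' method. The ㅂ check relies on the Unicode Hangul syllable layout, in which a syllable's final-consonant index is `(code - 0xAC00) % 28` and ㅂ has index 17.

```python
import unicodedata
from typing import Sequence

# Unicode precomposed Hangul syllables: U+AC00..U+D7A3, 28 possible finals each.
HANGUL_BASE = 0xAC00
HANGUL_LAST = 0xD7A3
JONGSEONG_COUNT = 28
JONGSEONG_BIEUP = 17  # index of the final consonant ㅂ

# Characters stripped from the end of a sentence before checking its ending.
TRAILING_PUNCT = " \t.?!~…\"'"


def has_final_bieup(syllable: str) -> bool:
    """Return True if a precomposed Hangul syllable ends in the final consonant ㅂ."""
    code = ord(syllable)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        return False
    return (code - HANGUL_BASE) % JONGSEONG_COUNT == JONGSEONG_BIEUP


def label_honorific(sentence: str) -> str:
    """Hypothetical heuristic: label a Korean sentence by its sentence-final ending."""
    text = unicodedata.normalize("NFC", sentence).rstrip(TRAILING_PUNCT)
    if not text:
        return "non-honorific"
    # Polite informal style (해요체), e.g. 고마워요, 가세요, 그렇죠.
    if text.endswith(("요", "죠")):
        return "honorific"
    # Formal style (하십시오체): -ㅂ니다 / -ㅂ니까, where ㅂ is the final consonant
    # of the preceding syllable (합니다, 먹었습니다, 갑니까). This also keeps the
    # casual connective ending in 오니까 ("because it rains") non-honorific.
    if len(text) >= 3 and text.endswith(("니다", "니까")) and has_final_bieup(text[-3]):
        return "honorific"
    # Formal imperative, e.g. 하십시오.
    if text.endswith("십시오"):
        return "honorific"
    return "non-honorific"


def honorific_accuracy(hypotheses: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of hypotheses whose heuristic label matches the reference's label."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must be aligned")
    if not references:
        return 0.0
    matches = sum(label_honorific(h) == label_honorific(r) for h, r in zip(hypotheses, references))
    return matches / len(references)


if __name__ == "__main__":
    print(label_honorific("감사합니다."))   # honorific
    print(label_honorific("고마워."))       # non-honorific
    print(label_honorific("어디 가세요?"))  # honorific
    print(label_honorific("비가 오니까."))  # non-honorific
    print(honorific_accuracy(["감사합니다.", "고마워."], ["고맙습니다.", "고마워요."]))  # 0.5
```

A rule set like this is deliberately crude. It ignores the subject-honorific infix -시- inside otherwise casual endings, elliptical sentences that end in a bare noun, and other speech levels such as 하오체. Any real labeler would need to be checked against a manually annotated sample before it is used to score systems.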

Highlights

  • We introduce a context-aware NMT to incorporate the context for improving Korean honorific translation

  • We show that the NMT model with contextual encoder outperforms the sentence-level model even when the model is explicitly controlled to translate to a specific honorific style

Introduction

Neural machine translation (NMT) has shown impressive results in translation quality, owing to the availability of vast parallel corpora [1] and the introduction of novel deep neural network (DNN) architectures such as encoder-decoder models [2,3] and self-attention-based networks [4]. Despite significant improvements over previous machine translation (MT) systems, NMT still suffers from language-specific problems such as Russian pronoun resolution [6] and honorifics. Addressing such problems is crucial in both personal and business communication [7], because meaning must be preserved and many of these problems are closely tied to culture. Honorifics, which convey respect toward the audience, are a good example. In languages that use honorifics frequently, such as Korean, Japanese, and Hindi, using the correct honorifics is considered imperative.
