Abstract

Neural machine translation systems have revolutionized translation processes in terms of quantity and speed in recent years, and they have even been claimed to achieve human parity. However, the quality of their output has also raised serious doubts and concerns, such as loss of lexical variation, evidence of “machine translationese”, and the effect of the latter on post-editing, which results in “post-editese”. In this study, we analyze the outputs of three English to Slovenian machine translation systems in terms of lexical diversity in three different genres. Using both quantitative and qualitative methods, we analyze one statistical and two neural systems, and we compare them to a human reference translation. Our quantitative analyses based on lexical diversity metrics show diverging results; however, translation systems, particularly neural ones, mostly exhibit larger lexical diversity than their human counterparts. Nevertheless, a qualitative method shows that these quantitative results are not always a reliable tool to assess true lexical diversity and that a lot of lexical “creativity”, especially by neural translation systems, is often unreliable, inconsistent, and misguided.
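To make the notion of “lexical diversity metrics” concrete, below is a minimal sketch of two common measures of this kind: the type-token ratio (TTR) and its moving-average variant (MATTR), which is less sensitive to text length. This is only an illustrative example; the abstract does not specify which metrics the study actually uses, and the function names here are our own.

```python
def type_token_ratio(tokens):
    """Ratio of unique word types to total tokens; higher values
    indicate greater lexical diversity."""
    return len(set(tokens)) / len(tokens)

def mattr(tokens, window=50):
    """Moving-average TTR: the mean TTR over all sliding windows of a
    fixed size, which reduces the metric's dependence on text length."""
    if len(tokens) <= window:
        return type_token_ratio(tokens)
    ratios = [type_token_ratio(tokens[i:i + window])
              for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)

# Toy comparison of a repetitive vs. a varied word sequence
repetitive = "the cat saw the cat and the cat ran".split()
varied = "the cat saw a dog and then quickly ran".split()
print(type_token_ratio(repetitive), type_token_ratio(varied))
```

Comparing such scores for human and machine translations of the same source text is one way to quantify the lexical-impoverishment concern discussed in the abstract.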

Highlights

  • In the past couple of years, an abundance of automatic translation systems has emerged, many of them available to the general public

  • Questions arise as to what neural systems in particular bring to the table compared to their older statistical counterparts. Are they really more similar to human translations, and do they exhibit more “creativity” in terms of lexical variation? Do they really adjust their solutions to the context better than phrase-based statistical models? In this study, we focus on translations from English to Slovenian and choose to look at lexical diversity in human vs. various machine translations

  • Our corpus is comprised of an information technology (IT) subcorpus consisting of a printer instruction manual [18,19] and a printer user guide [20,21], a culinary subcorpus (CUL) consisting of a book of recipes [22,23], and a literary subcorpus (LIT) consisting of a popular fiction novel [24,25]

Summary

Introduction

In the past couple of years, an abundance of automatic translation systems has emerged, many of them available to the general public. The older phrase-based systems have given way to newer, “cleverer” neural machine translation systems that have been considered state-of-the-art for some years. These general translation systems offer translation on-the-go and can supposedly handle a wide range of texts and genres, purportedly excelling at newer contexts and unseen data (out-of-vocabulary words). They are considered faster and better; for some well-resourced languages, they have already been claimed to achieve human parity [1]. However, researchers have raised serious concerns about machine translation (MT), such as loss of lexical variation in the target text [6,7,8], warning of a potential lexical impoverishment of the target language [9] and of the danger that language learners develop a “warped exposure” to that language through neural machine translation (NMT) [8].


