Translation Quality Regarding Low-Resource, Custom Machine Translations: A Fine-Grained Comparative Study on Turkish-to-English Statistical and Neural Machine Translation Systems

Gökhan Doğru

doi:10.26650/iujts.2022.1182687

Abstract

Corpus-based machine translation (MT) has been the main approach to developing and implementing MT systems in both academia and industry over the last three decades. In this field, the type and size of the corpus used for training MT engines have presented problems for both statistical MT (SMT) and neural MT (NMT) systems, the two dominant corpus-based approaches. Moreover, language pairs such as Turkish-English have been understudied within this framework. This article aims to evaluate translation quality in Turkish-to-English custom MT systems trained on corpora of different sizes and types. Two NMT engines and two SMT engines were trained on the KantanMT platform using two training corpus types: a domain-specific cardiology corpus alone, or this corpus plus a mixed-domain corpus. The study conducted both automatic evaluations with metrics including BLEU, F-Measure, and TER, and a comprehensive human evaluation with metrics including fluency, A/B test, and adequacy. Lastly, the study carried out a separate, subjective terminology evaluation to investigate how differently the MT systems handle terminology, a crucial aspect of domain-specific text types such as cardiology. While the automatic evaluation results suggest that the SMT engines perform better than the NMT engines, all human evaluators rated the mixed-domain NMT engine as the highest-performing one. However, the terminology evaluation task demonstrated that SMT can still perform better and commit fewer terminology errors, despite the shift in industry and academia toward NMT engines.
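To make the automatic metrics named in the abstract more concrete, the following is a minimal Python sketch of two of them at the sentence level. It is not the scoring used in the study, which relied on the KantanMT platform. The function names `f_measure` and `word_edit_rate` are hypothetical, tokenization is plain whitespace splitting, and the edit rate omits the block-shift operation of full TER, so it should be read as a simplified approximation. The example sentences are invented for illustration and are not taken from the study's data.

```python
from collections import Counter


def f_measure(hypothesis: str, reference: str) -> float:
    """Unigram F1 between an MT output and one reference translation.

    Assumption: whitespace tokenization, a single reference.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    # Clipped word overlap: each reference word can be matched at most once.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def word_edit_rate(hypothesis: str, reference: str) -> float:
    """Word-level edit distance divided by reference length.

    Assumption: this simplifies TER by counting only insertions,
    deletions, and substitutions; full TER also allows block shifts.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    # Standard dynamic-programming edit distance, one row at a time.
    prev = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        curr = [i]
        for j, ref_word in enumerate(ref, start=1):
            cost = 0 if hyp_word == ref_word else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1] / len(ref)


# Invented example pair, not from the study.
reference = "the patient has chest pain"
hypothesis = "the patient has pain"
print(f_measure(hypothesis, reference))
print(word_edit_rate(hypothesis, reference))
```

Higher F-measure is better, while a lower edit rate is better, since it counts the edits needed to turn the MT output into the reference. Both operate only on surface word overlap, which is one reason such scores can disagree with human judgments of fluency and adequacy.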

Journal: İstanbul Üniversitesi Çeviribilim Dergisi / Istanbul University Journal of Translation Studies
Publication Date: Dec 29, 2022
License type: cc-by-nc
