Abstract
Quality estimation (QE) has recently gained increasing interest because it can predict the quality of machine translation (MT) results without a reference translation. QE is an annual shared task at the Conference on Machine Translation (WMT), and most recent studies have applied multilingual pretrained language models (mPLMs) to address it, focusing on performance improvements through data augmentation combined with finetuning of a large-scale mPLM. In this study, we eliminate the effects of data augmentation and conduct a pure performance comparison between various mPLMs. In contrast to recent performance-driven QE research aimed at the shared-task competition, we carry out this comparison on the WMT20 sub-tasks and identify the optimal mPLM. Moreover, we demonstrate QE using the multilingual BART model, which has not yet been applied to this task, and conduct comparative experiments and analyses against cross-lingual language models (XLMs), multilingual BERT, and XLM-RoBERTa.
Highlights
Unlike previous studies, which mostly utilize the SOTA model, we remove the effects of data augmentation used to achieve performance improvements and perform a comparative study of representative multilingual pretrained language models (mPLMs) on sub-tasks 1 and 2 from WMT20; to the best of our knowledge, we are the first to conduct such research
Through a comparative analysis of how to construct an appropriate input structure for quality estimation (QE), we reveal that performance can be improved by changing the input order of the source sentence and the machine translation (MT) output
In the process of finetuning the mPLMs, we use only data officially distributed in WMT20 and use the official test set to ensure objectivity in all experiments
English–German was used as the language pair in these experiments, and performance comparisons were conducted for each mPLM at the sentence level for sub-tasks 1 and 2
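To make the sentence-level setup above concrete, the following is a minimal sketch of how an mPLM could be finetuned as a regressor for sentence-level QE scores. The model name (XLM-RoBERTa base), pooling strategy, and label are illustrative assumptions, not the exact configuration used in the study.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "xlm-roberta-base"  # assumption: any of the compared mPLMs could be swapped in here


class SentenceQERegressor(nn.Module):
    """Sentence-level QE: encode a (source, MT output) pair and regress a quality score."""

    def __init__(self, model_name=MODEL_NAME):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.regressor = nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        # Pool with the first ([BOS]/<s>) token representation.
        return self.regressor(hidden[:, 0]).squeeze(-1)


tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = SentenceQERegressor()

# Toy (source, MT output) pair with a made-up z-normalised DA label.
batch = tokenizer(["The cat sat on the mat."],
                  ["Die Katze saß auf der Matte."],
                  return_tensors="pt", padding=True, truncation=True)
label = torch.tensor([0.85])

score = model(batch["input_ids"], batch["attention_mask"])
loss = nn.functional.mse_loss(score, label)
loss.backward()  # one finetuning step would follow with an optimizer
```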
Summary
Previous studies have used the input structure [BOS] Source sentence [EOS] [EOS] MT output [EOS] or [BOS] MT output [EOS] [EOS] Source sentence [EOS] without a clear standard. We investigate this choice through a quantitative analysis that applies both input structures to all mPLMs. The contributions of this study are as follows: we conduct comparative experiments on finetuning mPLMs for a QE task, which differs from research concerned with improving performance in the WMT shared-task competition, and to the best of our knowledge, we are the first to conduct such research; through a comparative analysis of how to construct an appropriate input structure for QE, we reveal that performance can be improved by changing the input order of the source sentence and the MT output; in the process of finetuning the mPLMs, we use only data officially distributed in WMT20 (without external knowledge or data augmentation) and use the official test set to ensure objectivity in all experiments.
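As an illustration of the two orderings above, the snippet below shows how a sentence-pair input could be built with an off-the-shelf mPLM tokenizer. XLM-RoBERTa is used here only as an example; the exact special tokens differ per model, and the sentences are placeholders rather than data from the study.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

src = "The cat sat on the mat."        # source sentence
mt = "Die Katze saß auf der Matte."    # MT output

# Order A: [BOS] Source sentence [EOS] [EOS] MT output [EOS]
ids_src_first = tokenizer(src, mt)["input_ids"]
# Order B: [BOS] MT output [EOS] [EOS] Source sentence [EOS]
ids_mt_first = tokenizer(mt, src)["input_ids"]

# For XLM-R the decoded pair looks roughly like:
#   <s> The cat sat on the mat.</s></s> Die Katze saß auf der Matte.</s>
print(tokenizer.decode(ids_src_first))
print(tokenizer.decode(ids_mt_first))
```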