Abstract

The quality of data-driven Machine Translation (MT) strongly depends on both the quantity and the quality of the training data. However, collecting a large set of parallel training texts is difficult in practice. Although various approaches have been proposed to overcome this issue, the lack of large parallel corpora still poses a major practical problem for many language pairs. Since monolingual data plays an important role in boosting fluency for Neural MT (NMT) models, this paper investigates and compares the performance of two learning-based translation approaches for Spanish-Turkish translation as a low-resource setting in which only large sets of monolingual data in each language are available: 1) the Unsupervised Learning approach, and 2) the Round-Tripping approach. Both approaches remove the need for bilingual data, enabling us to train an NMT system on monolingual data alone. We utilize an Attention-based NMT (Attentional NMT) model, which leverages a careful initialization of the parameters, the denoising effect of language models, and the automatic generation of bilingual data. Our experimental results demonstrate that the Unsupervised Learning approach outperforms the Round-Tripping approach in both the Spanish-to-Turkish and Turkish-to-Spanish directions, confirming that the Unsupervised Learning approach remains a reliable learning-based translation technique for Spanish-Turkish low-resource NMT.
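The denoising effect mentioned above is typically obtained by training the model to reconstruct a sentence from a corrupted version of it. As a rough illustration only (the abstract does not specify the paper's exact noise model; the function name and parameters below are assumptions), a common corruption scheme combines word dropout with a local shuffle:

```python
import random

def add_noise(tokens, drop_prob=0.1, max_shuffle_dist=3, rng=None):
    """Corrupt a token sequence for a denoising objective:
    randomly drop words, then locally shuffle the survivors.
    NOTE: illustrative sketch, not the paper's implementation."""
    rng = rng or random.Random(0)
    # Word dropout: each token is kept with probability 1 - drop_prob
    kept = [t for t in tokens if rng.random() > drop_prob] or tokens[:1]
    # Local shuffle: each token moves at most max_shuffle_dist positions
    keys = [i + rng.uniform(0, max_shuffle_dist) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept))]
```

The denoising objective then trains the decoder to recover the original sentence from `add_noise(sentence)`, which is what lets a language model learned from monolingual data shape the translation model's output fluency.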

Highlights

  • Learning-based translation with monolingual data is an ill-posed task, since the mapping between source-language and target-language sentences admits multiple possible outcomes [1]

  • Neural Machine Translation (NMT) applications employ Unsupervised Learning and Round-Tripping to handle some of the issues in this alignment and translation task; the salient contribution of this research is not the experimental investigation of the language pair itself, but the comparison of the two learning approaches in low-resource NMT tasks

  • Our results demonstrate that the Unsupervised Learning approach outperforms the Round-Tripping approach in both the Spanish-to-Turkish and Turkish-to-Spanish directions, confirming that the unsupervised approach remains a reliable learning-based translation technique for Spanish-Turkish low-resource NMT

Summary

Introduction

Tianyi Xu et al.: Spanish-Turkish Low-Resource Machine Translation: Unsupervised Learning vs Round-Tripping.

Learning-based translation with monolingual data is an ill-posed task, since the mapping between source-language and target-language sentences admits multiple possible outcomes [1]. The common component of these algorithms is that they turn the unsupervised problem into a supervised one through generated pseudo-bilingual sentence pairs, which constrain the latent representations to be shared across the source and target languages. We investigate the effectiveness of the two learning-based approaches, Unsupervised Learning and Round-Tripping, on overall translation quality under low-resource conditions, using an attention-based Neural MT (Attentional NMT) model on the low-resource language pair Spanish-Turkish. NMT applications employ Unsupervised Learning and Round-Tripping to handle some of the issues in this alignment and translation task; the salient contribution of this research is not the experimental investigation of the language pair itself, but the comparison of the two learning approaches in low-resource NMT tasks.
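To make the pseudo-bilingual pair generation concrete, the following is a minimal sketch of one round-tripping (back-translation) iteration. The function names, the toy `train` callback, and treating a model as a plain sentence-to-sentence callable are all illustrative assumptions, not the paper's implementation:

```python
def back_translate(model_tgt2src, mono_tgt):
    """Pair each real monolingual target sentence with a synthetic
    source sentence produced by the current target->source model."""
    return [(model_tgt2src(t), t) for t in mono_tgt]

def round_trip_step(model_s2t, model_t2s, mono_src, mono_tgt, train):
    """One round-tripping iteration over both translation directions.
    `train(model, pairs)` is a placeholder for a real training step."""
    # Synthetic-source pairs supervise the source->target direction...
    model_s2t = train(model_s2t, back_translate(model_t2s, mono_tgt))
    # ...and symmetrically for the target->source direction.
    model_t2s = train(model_t2s, back_translate(model_s2t, mono_src))
    return model_s2t, model_t2s
```

Iterating this step lets each direction's model improve on the pseudo-pairs produced by the other, which is how the unsupervised problem is recast as a supervised one.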

Related Work
Mathematical Background
Methodology
Experimental Framework
Evaluation
Conclusions
