On the scalability of data augmentation techniques for low-resource machine translation between Chinese and Vietnamese

Huan Vu, Ngoc Dung Bui

doi:10.1080/24751839.2023.2186625

Abstract

Neural Machine Translation (NMT) has consistently been shown to be a standard choice for building translation systems in both academia and industry. For low-resource language pairs, data augmentation techniques have been widely used to tackle the data shortage problem in NMT. In this paper, we investigate how transformer-based NMT models scale as the amount of synthetic data increases. Through experiments on the Chinese-to-Vietnamese translation task, we aim to provide guidelines for applying several methods, such as back-translation, tagged back-translation, self-training and sentence concatenation, to a low-resource, less-related language pair. Our results suggest that choosing the appropriate amount of synthetic data is a crucial task when building NMT systems. In addition, when combining methods, it is recommended to tag the data sources before training.
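The recommendation to tag data sources can be illustrated with a minimal sketch. It assumes the common tagged back-translation convention of prepending a reserved token to the source side of each synthetic sentence pair. The tag strings (`<BT>`, `<ST>`, `<CAT>`), the function names and the example sentences are hypothetical; the abstract does not specify the authors' actual tags or pipeline.

```python
from typing import Dict, List, Tuple

SentencePair = Tuple[str, str]  # (Chinese source, Vietnamese target)

# Hypothetical tag tokens, one per data source. Authentic data stays untagged.
SOURCE_TAGS: Dict[str, str] = {
    "authentic": "",
    "back_translation": "<BT>",
    "self_training": "<ST>",
    "concatenation": "<CAT>",
}


def tag_pairs(pairs: List[SentencePair], source: str) -> List[SentencePair]:
    """Prepend the tag for `source` to the source side of every pair."""
    tag = SOURCE_TAGS[source]
    tagged = []
    for src, tgt in pairs:
        tagged_src = f"{tag} {src}" if tag else src
        tagged.append((tagged_src, tgt))
    return tagged


def build_training_set(corpora: Dict[str, List[SentencePair]]) -> List[SentencePair]:
    """Merge corpora from several augmentation methods, tagging each by origin."""
    combined: List[SentencePair] = []
    for source, pairs in corpora.items():
        combined.extend(tag_pairs(pairs, source))
    return combined


if __name__ == "__main__":
    corpora = {
        "authentic": [("你好", "xin chào")],
        "back_translation": [("谢谢", "cảm ơn")],
    }
    for src, tgt in build_training_set(corpora):
        print(src, "|||", tgt)
```

In a real system, the tag tokens would also need to be reserved in the subword vocabulary so that they are not split into pieces during tokenization.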

Journal: Journal of Information and Telecommunication
Publication Date: Mar 21, 2023
License type: open-access

Similar Papers

Iterative Training of Unsupervised Neural and Statistical Machine Translation Systems
Benjamin Marie ... Atsushi Fujita
ACM Transactions on Asian and Low-Resource Language Information Processing | VOL. 19
01 Jun 2020

Multilingual Neural Translation
14 Feb 2020

Development of Neural Machine Translator for English-Assamese Language Pair
Basab Nath ... Sunita Sarkar
03 Aug 2021

A Study of Machine Translation Models for Kannada-Tulu
Asha Hegde ... Bharathi Raja Chakravarthi
01 Jan 2023
