Abstract

Amid the rapid advancement of neural machine translation, data sparsity remains a major obstacle. To address this issue, this study proposes a general data augmentation technique applicable to a range of scenarios. It examines the difficulty of obtaining diverse, high-quality parallel corpora in both rich- and low-resource settings, and combines low-frequency word substitution with the reverse translation (back-translation) method so that the two approaches complement each other. The method further refines the pseudo-parallel corpus generated by back-translation through low-frequency word substitution, and adds a grammar error correction module to reduce grammatical errors in low-resource scenarios. The experimental data are partitioned into rich- and low-resource scenarios at a 10:1 ratio, and the experiments verify the necessity of grammatical error correction for the pseudo-corpus in the low-resource setting. Baseline models and methods are selected from the backbone network and related literature for comparative experiments. The results demonstrate that the proposed data augmentation approach is suitable for both rich- and low-resource scenarios and effectively enriches the training corpus, improving translation performance.
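The abstract describes the augmentation pipeline only at a high level, so the sketch below is an illustrative reading of it rather than the authors' implementation. The `backtranslate` and `correct_grammar` callables and the `candidates` table are placeholders for a backward translation model, a grammar error correction model, and a rare-word proposal source (e.g., a language model or synonym table); the substitution budget is an assumed hyperparameter.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def build_freq_table(corpus: Iterable[str]) -> Counter:
    """Count token frequencies over a whitespace-tokenised corpus."""
    freq: Counter = Counter()
    for sentence in corpus:
        freq.update(sentence.split())
    return freq


def substitute_low_freq(tokens: List[str], freq: Counter,
                        candidates: Dict[str, List[str]],
                        max_subs: int = 1) -> List[str]:
    """Swap up to `max_subs` tokens for lower-frequency alternatives so that
    rare words appear in more synthetic training contexts."""
    out = list(tokens)
    subs = 0
    for i, tok in enumerate(out):
        if subs >= max_subs:
            break
        rare_alts = [a for a in candidates.get(tok, []) if freq[a] < freq[tok]]
        if rare_alts:
            out[i] = min(rare_alts, key=lambda a: freq[a])  # take the rarest alternative
            subs += 1
    return out


def augment(monolingual_tgt: Iterable[str],
            backtranslate: Callable[[str], str],
            freq: Counter,
            candidates: Dict[str, List[str]],
            correct_grammar: Optional[Callable[[str], str]] = None,
            low_resource: bool = False) -> List[Tuple[str, str]]:
    """Build (pseudo_source, target) pairs from target-side monolingual text."""
    pairs = []
    for tgt in monolingual_tgt:
        # 1. Reverse (back-) translation: target sentence -> pseudo source sentence.
        pseudo_src = backtranslate(tgt).split()
        # 2. Low-frequency word substitution on the pseudo source side.
        pseudo_src = substitute_low_freq(pseudo_src, freq, candidates)
        src_sentence = " ".join(pseudo_src)
        # 3. Grammar error correction, applied only in the low-resource setting,
        #    where the backward model tends to produce noisier output.
        if low_resource and correct_grammar is not None:
            src_sentence = correct_grammar(src_sentence)
        pairs.append((src_sentence, tgt))
    return pairs
```

Under these assumptions, the grammar correction step is gated on the low-resource flag, mirroring the abstract's claim that correcting the pseudo-corpus is necessary only when the backward model is trained on scarce data.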
