Low-resource Machine Translation Research Articles

Low-resource languages often face the problem of insufficient data, which leads to poor quality in machine translation. One approach to address this issue is data augmentation. Data augmentation involves creating new data by transforming existing data through methods such as flipping, cropping, rotating, and adding noise. Traditionally, pseudo-parallel corpora are generated by randomly replacing words in low-resource language machine translation. However, this method can introduce ambiguity, as the same word may have different meanings in different contexts. This study proposes a new approach for low-resource language machine translation, which involves generating pseudo-parallel corpora by replacing phrases. The performance of this approach is compared with other data augmentation methods, and it is observed that combining it with other data augmentation methods further improves performance. To enhance the robustness of the model, R-Drop regularization is also used. R-Drop is an effective method for improving the quality of machine translation. The proposed method was tested on Chinese–Kazakh (Arabic script) translation tasks, resulting in performance improvements of 4.99 and 7.7 for Chinese-to-Kazakh and Kazakh-to-Chinese translations, respectively. By combining the generation of pseudo-parallel corpora through phrase replacement with the application of R-Drop regularization, there is a significant advancement in machine translation performance for low-resource languages.

Read full abstract

English is accepted as an academic language in the world. This necessitates the use of English in their academic studies for speakers of other languages. Even when these researchers are competent in the use of the English language, some mistakes may occur while writing an academic article. To solve this problem, academicians tend to use automatic translation programs or get assistance from people with an advanced level of English. This study offers an expert system to enable assistance to the researchers throughout their academic article writing process. In this study, Turkish which is considered among low-resource languages is used as the source language. The proposed model combines the transformer encoder-decoder architecture model with the pre-trained Sci-BERT language model via the shallow fusion method. The model uses a Fully Attentional Network Layer instead of a Feed-Forward Network Layer in the known shallow fusion method. In this way, a higher success rate could be achieved by increasing the attention at the word level. Different metrics were used to evaluate the created model. The model created as a result of the experiments reached 45.1 BLEU and 73.2 METEOR scores. In addition, the proposed model achieved 20.12 and 20.56 scores, respectively, with the zero-shot translation method in the World Machine Translation (2017–2018) test datasets. The proposed method could inspire other low-resource languages to include the language model in the translation system. In this study, a corpus composed entirely of academic sentences is also introduced to be used in the translation system. The corpus consists of 1.2 million parallel sentences. The proposed model and corpus are made available to researchers on our GitHub page.

Read full abstract

Low-resource Machine Translation Research Articles

Related Topics

Articles published on Low-resource Machine Translation

An empirical study of a novel multimodal dataset for low-resource machine translation

Breaking language barriers with ChatGPT: enhancing low-resource machine translation between algerian arabic and MSA

Rethinking the Exploitation of Monolingual Data for Low-Resource Neural Machine Translation

Migration Learning and Multi-View Training for Low-Resource Machine Translation

A Chinese–Kazakh Translation Method That Combines Data Augmentation and R-Drop Regularization

Exploration of low-resource language-oriented machine translation system of genetic algorithm-optimized hyper-task network under cloud platform technology

Exploiting multiple correlated modalities can enhance low-resource machine translation quality

Theorizing sustainable, low-resource MT in development settings

On the scalability of data augmentation techniques for low-resource machine translation between Chinese and Vietnamese

Textual Augmentation Techniques Applied to Low Resource Machine Translation: Case of Swahili

Attention Link: An Efficient Attention-Based Low Resource Machine Translation Architecture

Fully Attentional Network for Low-Resource Academic Machine Translation and Post Editing

The appeal of green advertisements on consumers' consumption intention based on low-resource machine translation

Survey of Low-Resource Machine Translation

Low resource machine translation of english–manipuri: A semi-supervised approach

Boosting the Transformer with the BERT Supervision in Low-Resource Machine Translation

The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation

TLSPG: Transfer learning-based semi-supervised pseudo-corpus generation approach for zero-shot translation

Transfer learning based on lexical constraint mechanism in low-resource machine translation

Linguistically Driven Multi-Task Pre-Training for Low-Resource Neural Machine Translation

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Low-resource Machine Translation Research Articles

Related Topics

Articles published on Low-resource Machine Translation

An empirical study of a novel multimodal dataset for low-resource machine translation

Breaking language barriers with ChatGPT: enhancing low-resource machine translation between algerian arabic and MSA

Rethinking the Exploitation of Monolingual Data for Low-Resource Neural Machine Translation

Migration Learning and Multi-View Training for Low-Resource Machine Translation

A Chinese–Kazakh Translation Method That Combines Data Augmentation and R-Drop Regularization

Exploration of low-resource language-oriented machine translation system of genetic algorithm-optimized hyper-task network under cloud platform technology

Exploiting multiple correlated modalities can enhance low-resource machine translation quality

Theorizing sustainable, low-resource MT in development settings

On the scalability of data augmentation techniques for low-resource machine translation between Chinese and Vietnamese

Textual Augmentation Techniques Applied to Low Resource Machine Translation: Case of Swahili

Attention Link: An Efficient Attention-Based Low Resource Machine Translation Architecture

Fully Attentional Network for Low-Resource Academic Machine Translation and Post Editing

The appeal of green advertisements on consumers' consumption intention based on low-resource machine translation

Survey of Low-Resource Machine Translation

Low resource machine translation of english–manipuri: A semi-supervised approach

Boosting the Transformer with the BERT Supervision in Low-Resource Machine Translation

The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation

TLSPG: Transfer learning-based semi-supervised pseudo-corpus generation approach for zero-shot translation

Transfer learning based on lexical constraint mechanism in low-resource machine translation

Linguistically Driven Multi-Task Pre-Training for Low-Resource Neural Machine Translation