Transformer-based Neural Machine Translation Research Articles

In a multilingual country like India, automatic natural language translation plays a key role in connecting communities that speak different languages. Many researchers have explored and improved the translation process for high-resource languages such as English and German, and have achieved state-of-the-art results. However, the lack of adequate data is the main obstacle to automatic translation of low-resource north-eastern Indian languages such as Mizo, Khasi, and Assamese. Although many automatic translation systems for low-resource languages have appeared in recent years, their low evaluation scores indicate room for improvement. Neural machine translation has recently improved translation quality significantly, largely thanks to the availability of large amounts of data. Consequently, neural machine translation for low-resource languages remains underexplored because adequate data is unavailable. In this work, we consider the low-resource English–Assamese pair and use transformer-based neural machine translation that leverages prior alignment and a pre-trained language model. To extract alignment information from the source–target sentences, we use an alignment technique based on pre-trained multilingual contextual embeddings. In addition, a transformer-based language model is built using monolingual target sentences. With both prior alignment and a pre-trained language model, the transformer-based neural machine translation model improves, and we achieve state-of-the-art results for both English-to-Assamese and Assamese-to-English translation.
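The abstract does not say which embedding-based alignment method is used. As a rough illustration only, one common way to turn contextual embeddings into word alignments is to keep the source–target pairs that are each other's best match under cosine similarity (mutual argmax). The sketch below assumes the embeddings have already been computed; the function names and the toy vectors are hypothetical and do not come from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mutual_argmax_alignments(src_vecs, tgt_vecs):
    """Return (source_index, target_index) pairs that are mutual best matches.

    src_vecs / tgt_vecs: one embedding per source / target token. In practice
    these would come from a multilingual encoder (an assumption here).
    """
    sim = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]
    links = []
    for i, row in enumerate(sim):
        # Best target token for source token i.
        j = max(range(len(row)), key=row.__getitem__)
        # Best source token for that target token.
        best_i = max(range(len(sim)), key=lambda k: sim[k][j])
        if best_i == i:
            links.append((i, j))
    return links

# Toy 2-dimensional "embeddings" for three source and two target tokens.
src = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
tgt = [[0.1, 0.9], [0.9, 0.1]]
links = mutual_argmax_alignments(src, tgt)
print(links)
```

The third source token is left unaligned in this toy case because neither target token prefers it; how such links are then fed to the translation model as a prior is not covered by this sketch.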

Integrating linguistic features has been widely used in statistical machine translation (SMT) systems, resulting in improved translation quality. However, for low-resource languages such as Thai and Myanmar, the integration of linguistic features in neural machine translation (NMT) systems has yet to be implemented. In this study, we propose transformer-based NMT models (transformer, multi-source transformer, and shared-multi-source transformer models) using linguistic features for two-way translation of Thai-to-Myanmar, Myanmar-to-English, and Thai-to-English. Linguistic features such as part-of-speech (POS) tags or universal part-of-speech (UPOS) tags are added to each word on the source side, the target side, or both, and experiments with the proposed models are conducted. The multi-source transformer and shared-multi-source transformer models take two inputs (i.e., string data and string data with POS tags) and produce string data or string data with POS tags. A transformer model that uses only word vectors serves as the first baseline for comparison with the proposed models. An Edit-Based Transformer with Repositioning (EDITOR) model serves as the second baseline. The experimental findings show that adding linguistic features to the transformer-based models enhances the performance of neural machine translation for low-resource language pairs. Moreover, the best translation results were obtained with shared-multi-source transformer models using linguistic features, which achieved higher Bilingual Evaluation Understudy (BLEU) scores and character n-gram F-score (chrF) scores than the baseline transformer and EDITOR models.
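As a minimal sketch of what "string data with POS tags" could look like, the snippet below attaches one tag to each word and strips the tags again. The `|` separator, the function names, and the example sentence are assumptions made for illustration; the abstract does not specify the actual annotation format.

```python
def attach_pos_tags(tokens, tags, sep="|"):
    """Join each token with its POS tag, e.g. 'eat|VERB' (format is assumed)."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return " ".join(f"{tok}{sep}{tag}" for tok, tag in zip(tokens, tags))

def strip_pos_tags(tagged, sep="|"):
    """Recover the plain string from a tagged string produced above."""
    return " ".join(item.rsplit(sep, 1)[0] for item in tagged.split())

tokens = ["I", "eat", "rice"]
upos = ["PRON", "VERB", "NOUN"]  # UPOS tags written by hand for this example

tagged = attach_pos_tags(tokens, upos)
plain = strip_pos_tags(tagged)
print(tagged)
print(plain)
```

Under this assumed format, a multi-source model would receive both `plain` and `tagged` as its two inputs, while a single-source model would receive only one of them.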

Articles published on Transformer-based Neural Machine Translation

On compositional generalization of transformer-based neural machine translation

RETRACTED: Improving neural machine translation by word translations

Addressing data scarcity issue for English–Mizo neural machine translation using data augmentation and language model

Boosting English-Amharic machine translation using corpus augmentation and Transformer

Transformer: A General Framework from Machine Translation to Others

English–Assamese neural machine translation using prior alignment and pre-trained language model

On the scalability of data augmentation techniques for low-resource machine translation between Chinese and Vietnamese

Transformer-Based Neural Network Machine Translation Model for the Kurdish Sorani Dialect

Pipeline Signed Japanese Translation Using PBSMT and Transformer in a Low-Resource Setting

A Domain Specific Parallel Corpus and Enhanced English-Assamese Neural Machine Translation

Coarse-to-Fine Output Predictions for Efficient Decoding in Neural Machine Translation

Improving neural machine translation with POS-tag features for low-resource language pairs

Grammatically Derived Factual Relation Augmented Neural Machine Translation

Human Evaluation of English–Irish Transformer-Based NMT

Enhancing low-resource neural machine translation with syntax-graph guided self-attention

A Transformer-Based Neural Machine Translation Model for Arabic Dialects That Utilizes Subword Units

Modeling Future Cost for Neural Machine Translation

Preordering Encoding on Transformer for Translation

A Review and evaluation of Machine Translation methods for Lumasaaba

A Hierarchical Clustering Approach to Fuzzy Semantic Representation of Rare Words in Neural Machine Translation
