Abstract

The transformer is a neural machine translation (NMT) model that has revolutionized machine translation. Compared with traditional statistical machine translation models and earlier neural machine translation models, the transformer fundamentally changes machine translation through its self-attention and cross-attention mechanisms, which effectively model token alignments between source and target sentences. It has been reported that the transformer model produces accurate posterior alignments. In this work, we empirically demonstrate the reverse effect, showing that prior alignments help transformer models produce better translations. Experimental results on a Vietnamese-English news translation task show not only the positive effect of manually annotated alignments on transformer models but also, surprisingly, that statistically constructed alignments, reinforced with the flexibility of token-type selection, outperform manual alignments in improving transformer models. Statistically constructed word-to-lemma alignments are used to train a word-to-word transformer model. This novel hybrid transformer model improves on the baseline transformer model and on the transformer model trained with manual alignments by 2.53 and 0.79 BLEU, respectively. In addition to BLEU scores, we conduct a limited human evaluation of the translation results. The strong correlation between human and automatic judgments confirms our findings.

Highlights

  • The transformer is a neural machine translation model that has revolutionized machine translation

  • Chen et al. [4] proposed the use of prior alignments to guide neural machine translation (NMT) models. Their experiments with recurrent NMT models translating from German to English and from English to French reveal large gains in translation quality for recurrent NMT models trained with prior alignments

  • Garg et al. [5] proposed an adjustment to the state-of-the-art transformer NMT model [6, 7], making the model capable of learning statistical prior alignments. Their experiments on three language pairs, German-English, Romanian-English, and English-French, show that the adjusted transformer model consistently produces better posterior alignments than the baseline transformer model


Summary

Related Works

We briefly review works related to our study on improving transformer-based neural machine translation with prior alignments. Motivated by the improvement in translation quality of recurrent NMT models trained with prior alignments [4], we experiment with training transformer models on manually constructed alignments (transformer-M) for our Vietnamese-English translation task. The transformer-S models employ different token types and are trained on statistically constructed prior alignments instead of manual ones. While the transformer-S2 model uses the same architecture, training procedure, and procedure for constructing statistical alignments (Algorithm 1) as the transformer-S1 model, we tokenize the sentences differently. For the transformer-S3 model, we segment sentences into words on the Vietnamese source side, as in the transformer-S2 model, while on the English target side we divide sentences into words, as in the transformer-S1 model. In preparing the prior alignments AS3, we revise the procedure for constructing statistical alignments (Algorithm 1), replacing English words with their lemmas, as sketched below.
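Algorithm 1 is not reproduced in this excerpt, so the following sketch only illustrates how statistical word-to-lemma prior alignments of this kind might be built: the English target side is lemmatized while token positions are preserved, and a statistical word aligner is run over the Vietnamese-word/English-lemma bitext. The use of spaCy for lemmatization, fast_align as the aligner, and all file names are illustrative assumptions, not the authors' exact pipeline.

```python
# Hypothetical sketch of constructing statistical word-to-lemma prior alignments.
# Assumptions (not from the paper): spaCy lemmatizes English, fast_align aligns.
import subprocess
import spacy

nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])

def lemmatize_en(sentence: str) -> str:
    """Replace each English token with its lemma, keeping token order and count."""
    return " ".join(tok.lemma_ for tok in nlp(sentence))

def write_bitext(vi_sentences, en_sentences, path="bitext.vi-enlemma"):
    """Write 'source ||| target' lines: Vietnamese words vs. English lemmas."""
    with open(path, "w", encoding="utf-8") as f:
        for vi, en in zip(vi_sentences, en_sentences):
            f.write(f"{vi} ||| {lemmatize_en(en)}\n")
    return path

def run_fast_align(bitext_path, out_path="prior.align"):
    """Run fast_align (assumed to be on PATH) to obtain Pharaoh-format alignments."""
    with open(out_path, "w", encoding="utf-8") as out:
        subprocess.run(["fast_align", "-i", bitext_path, "-d", "-o", "-v"],
                       stdout=out, check=True)
    return out_path

if __name__ == "__main__":
    vi = ["tôi yêu các thành_phố Việt_Nam"]   # Vietnamese source: word-segmented
    en = ["I love the Vietnamese cities"]     # English target: lemmatized before aligning
    align_file = run_fast_align(write_bitext(vi, en))
    print(open(align_file, encoding="utf-8").read())  # Pharaoh pairs, e.g. "0-0 1-1 ..."
```

Because lemmatization leaves token positions unchanged, each alignment link can be mapped back to the original English word, so the resulting word-to-lemma alignments can serve as priors for training a word-to-word transformer model.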

