Abstract

Neural machine translation (NMT) has attracted attention in the machine translation (MT) community because of its ability to model sequential data over variable-length input and output sentences. With an attention mechanism, an NMT system achieves state-of-the-art performance by focusing on specific input vectors of the source sentence instead of compressing each sentence into a single vector. Although NMT improves translation quality by handling long-term dependencies and facilitating contextual analysis, it requires an adequately large parallel training corpus, which is difficult to obtain for low-resource languages. This paper contributes an Assamese–English parallel corpus and builds two NMT systems: a sequence-to-sequence recurrent neural network (RNN) with an attention mechanism (NMT-1) and a transformer model with a self-attention mechanism (NMT-2). NMT-2 achieves the higher bilingual evaluation understudy (BLEU) scores, 10.03 for English-to-Assamese and 13.10 for Assamese-to-English translation.
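
To make the attention idea concrete, the sketch below shows how a decoder state attends over all encoder states to form a context vector, rather than relying on a single fixed sentence vector. This is a minimal NumPy illustration with dot-product scoring; the paper's NMT-1 may use a different scoring function (e.g., additive attention), and all names and dimensions here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def attend(decoder_state, encoder_states):
    """Illustrative dot-product attention (not the paper's exact scoring).

    decoder_state:  (d,)   current decoder hidden state
    encoder_states: (T, d) one encoder hidden state per source token
    Returns the context vector (d,) and the attention weights (T,).
    """
    scores = encoder_states @ decoder_state   # (T,) relevance of each source position
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    context = weights @ encoder_states        # weighted sum over source states
    return context, weights

# Hypothetical example: a 5-token source sentence with 8-dimensional states.
rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 8))
dec = rng.normal(size=(8,))
context, weights = attend(dec, enc)
print(weights)  # a distribution over the 5 source tokens, summing to 1
```

At each decoding step the weights change, so the model "focuses" on different source tokens; a transformer such as NMT-2 applies the same weighted-sum principle within the source sentence itself via self-attention.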
