Abstract
Neural machine translation has recently achieved state-of-the-art translation quality for many language pairs. However, it has been less thoroughly tested on the English-Bangla language pair, which joins two linguistically distant and widely spoken languages. In this paper, we apply neural machine translation to English-Bangla translation in both directions and compare it against a standard phrase-based statistical machine translation system. We obtain improvements of up to +0.30 and +4.95 BLEU over phrase-based statistical machine translation for English-to-Bangla and Bangla-to-English, respectively. Because Bangla is low-resource and morphologically rich, the English-Bangla translation task produces a large number of rare words. We apply subword segmentation with byte pair encoding to handle this rare-word problem, obtaining improvements of up to +0.69 and +0.30 BLEU over baseline neural machine translation for English-to-Bangla and Bangla-to-English, respectively. We further examine our system output for several challenging linguistic properties, such as subject-verb agreement, noun inflection, long-distance reordering, and rare-word translation. We observe that neural machine translation, both with and without subword segmentation, significantly outperforms the phrase-based statistical machine translation system, thus establishing itself as the state-of-the-art technology for low-resource English-Bangla machine translation.
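As an illustration of the subword segmentation mentioned above, the following is a minimal sketch of byte pair encoding on a toy English word-frequency table. It is not the authors' pipeline: the function names (`learn_bpe`, `segment`), the `</w>` end-of-word marker, the toy corpus, and the number of merges are all assumptions chosen for readability.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn an ordered list of merge operations from word frequencies."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges


def segment(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for pair in merges:
        i = 0
        while i < len(symbols) - 1:
            if (symbols[i], symbols[i + 1]) == pair:
                symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
            else:
                i += 1
    return symbols


# Hypothetical toy corpus: word -> frequency.
toy_corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
toy_merges = learn_bpe(toy_corpus, 10)
```

With this toy corpus, `segment("lowest", toy_merges)` splits the unseen word into `low` and `est</w>`, both of which were learned from other words. This is the sense in which subword units let a translation model represent rare or unseen word forms with a fixed-size vocabulary.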
Highlights
In this era of globalization, communication is becoming increasingly international and multilingual
We evaluate low-resource English-Bangla machine translation with three systems: phrase-based Statistical Machine Translation (SMT), baseline attention-based Neural Machine Translation (NMT), and attention-based NMT with Byte Pair Encoding (BPE)
We conjecture that continuous-space word representations, together with the attention mechanism's ability to capture long-distance context, allow attention-based NMT to better retain the morphological form and syntactic structure of the target text, which improves translation quality
Summary
In this era of globalization, communication is becoming increasingly international and multilingual. To meet this demand, automatic language translation, known as Machine Translation (MT), has become an attractive area of research. Bangla is the seventh most spoken language in the world, with an estimated 250 million speakers in Bangladesh and the Indian subcontinent. Because the internet and other communication channels are predominantly in English, machine translation between English and Bangla is a much-needed tool for enabling this large Bangla-speaking community to participate actively in the global world. Previous work on English-Bangla machine translation is mostly limited to conventional machine translation techniques. Neural machine translation for low-resource English-Bangla has not yet been explored intensively.