Abstract

Language barriers are among the practical challenges that human beings face in communication. To overcome them, researchers use machines to translate a source language into a target language from the textual representations of the two languages. As a result, machine translation (MT) has achieved near human-level translation quality for several resource-rich languages. However, MT performance remains far from production quality for low-resource languages. This work reports a semi-supervised neural machine translation system that boosts translation quality for an extremely resource-constrained language pair, English–Manipuri. The proposed approach combines self-training and back-translation. Quantitative evaluation shows that system performance improves by +0.9 BLEU after external noise is introduced into the input data. Additionally, a multi-reference test dataset developed in-house is used to evaluate the linguistic diversity of Manipuri, a highly agglutinative and morphologically rich language. Experimental results show that the proposed semi-supervised system outperforms the supervised, pretrained mBART, and existing semi-supervised baselines on both automatic scores and subjective evaluation parameters by a significant margin, with improvements of up to +4.5 and +1.2 BLEU over the supervised and mBART baselines, respectively.
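
As a rough illustration of how self-training and back-translation can be combined to produce noisy synthetic training pairs, consider the minimal Python sketch below. It is a sketch under stated assumptions, not the system reported in this work: the function names (`add_noise`, `build_synthetic_pairs`), the noise type (random word dropping plus a local shuffle), its parameters, and the choice to apply noise to the source side are all hypothetical, and the `forward` and `backward` translators are placeholders for trained source-to-target and target-to-source models.

```python
import random
from typing import Callable, List, Tuple

Translator = Callable[[str], str]  # placeholder for a trained MT model


def add_noise(tokens: List[str], rng: random.Random,
              drop_prob: float = 0.1, max_shift: float = 3.0) -> List[str]:
    """Hypothetical input noise: drop words at random, then shuffle locally."""
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if not kept and tokens:
        kept = [tokens[0]]  # never return an empty sentence for non-empty input
    # Local shuffle: perturb each position by a random offset and re-sort.
    keys = [i + rng.uniform(0, max_shift) for i in range(len(kept))]
    return [tok for _, tok in sorted(zip(keys, kept), key=lambda p: p[0])]


def build_synthetic_pairs(mono_src: List[str], mono_tgt: List[str],
                          forward: Translator, backward: Translator,
                          rng: random.Random) -> List[Tuple[str, str]]:
    """Return (noisy source, target) pairs from monolingual text on both sides."""
    pairs = []
    # Self-training: the forward model labels source-side monolingual text.
    for src in mono_src:
        pseudo_tgt = forward(src)
        pairs.append((" ".join(add_noise(src.split(), rng)), pseudo_tgt))
    # Back-translation: the backward model produces a synthetic source
    # for genuine target-side monolingual text; the target stays untouched.
    for tgt in mono_tgt:
        pseudo_src = backward(tgt)
        pairs.append((" ".join(add_noise(pseudo_src.split(), rng)), tgt))
    return pairs


if __name__ == "__main__":
    # Toy stand-ins for real translation models (assumption for the demo).
    demo = build_synthetic_pairs(
        ["a short source sentence"], ["a short target sentence"],
        forward=str.upper, backward=str.lower, rng=random.Random(0),
    )
    for noisy_src, tgt in demo:
        print(noisy_src, "->", tgt)
```

In such a setup, the synthetic pairs would typically be mixed with the genuine parallel corpus before retraining. The mixing ratio and the number of retraining rounds are design choices that this sketch does not cover.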
