JU-Saarland Submission to the WMT2019 English–Gujarati Translation Shared Task

Riktim Mondal, Josef Van Genabith, Aditya Chowdhury, Sudip Kumar Naskar, Shankha Raj Nayek, Santanu Pal

doi:10.18653/v1/w19-5332

Abstract

In this paper we describe our joint submission (JU-Saarland) from Jadavpur University and Saarland University to the WMT 2019 news translation shared task for the English–Gujarati language pair, within the translation task sub-track. Our baseline and primary submissions are built using a recurrent neural network (RNN) based neural machine translation (NMT) system with an attention mechanism. Because the two languages belong to different language families and there is not enough parallel data for this language pair, building a high-quality NMT system is a difficult task. We produced synthetic data through back-translation of the available monolingual data. We report the translation quality of our English–Gujarati and Gujarati–English NMT systems trained at the word, byte-pair and character encoding levels, where the word-level RNN is considered the baseline and used for comparison. Our English–Gujarati system ranked second in the shared task.

Highlights

Synthetic data was produced through back-translation to increase the size of the parallel training dataset
We describe the joint participation of Jadavpur University and Saarland University in the WMT 2019 news translation task for English–Gujarati and Gujarati–English
The released training data comes from a completely different domain than the development set, and its size is nowhere near the amount of training data typically required for neural machine translation (NMT) systems to succeed
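The back-translation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a reverse (target-to-source) translation function is already available; the names `build_synthetic_pairs`, `augment` and `reverse_translate` are hypothetical and not taken from the paper, and how the authors filtered or mixed their synthetic data is not stated here.

```python
from typing import Callable, Iterable, List, Tuple

SentencePair = Tuple[str, str]  # (source sentence, target sentence)


def build_synthetic_pairs(
    target_sentences: Iterable[str],
    reverse_translate: Callable[[str], str],
) -> List[SentencePair]:
    """Pair each monolingual target sentence with a machine-translated source.

    `reverse_translate` is a placeholder for any target-to-source system;
    it is an assumption of this sketch, not the authors' model.
    """
    pairs = []
    for target in target_sentences:
        target = target.strip()
        if not target:
            continue  # skip empty lines in the monolingual corpus
        synthetic_source = reverse_translate(target)
        pairs.append((synthetic_source, target))
    return pairs


def augment(
    parallel: Iterable[SentencePair],
    synthetic: Iterable[SentencePair],
) -> List[SentencePair]:
    """Concatenate genuine and synthetic pairs into one training set."""
    return list(parallel) + list(synthetic)
```

The target side of every synthetic pair is human-written text, so the forward system still learns to produce fluent output even when the synthetic source side is noisy.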

Summary

Related Works

Dungarwal et al. (2014) developed a statistical machine translation (SMT) approach that used a phrase-based method for Hindi–English and a factored method for English–Hindi. They showed improvements over existing SMT systems using pre-processing and post-processing components that generated morphological inflections correctly. Ramesh and Sankaranarayanan (2018) demonstrated how an existing model such as a bidirectional recurrent neural network can be used to generate parallel sentences for low-resource language pairs such as English–Tamil and English–Hindi, in order to improve SMT and NMT systems. Choudhary et al. (2018) showed how to build an NMT system for a language pair with a low-resource parallel corpus, such as English–Tamil, using techniques such as word embeddings and Byte-Pair Encoding (Sennrich et al., 2016b) to handle out-of-vocabulary words.
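To make the idea of Byte-Pair Encoding concrete, the following is a minimal sketch of the merge-learning procedure in the style of Sennrich et al. It is a toy illustration, not the configuration used in any of the cited systems; the function names and the `</w>` end-of-word marker are choices of this sketch.

```python
import re
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of the symbol pair with one merged symbol."""
    bigram = re.escape(" ".join(pair))
    # Lookarounds ensure we match whole symbols, not parts of longer ones.
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from a word-frequency dict."""
    # Start from characters, with an end-of-word marker on each word.
    vocab = {" ".join(w) + " </w>": f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


def segment(word, merges):
    """Apply learned merges, in order, to a possibly unseen word."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols
```

Because segmentation falls back to single characters when no merge applies, a word never seen in training is still represented as a sequence of known subword units, which is how the technique addresses out-of-vocabulary words.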

Data Preparation

Data Preprocessing

Primary System description

Data Postprocessing

Experiment Setup

Other Supporting Experiments

Primary System Results

Conclusion and Future Work

Publication Date: Jan 1, 2019
Citations: 15
License type: CC BY
