Abstract
Machine Translation (MT) has come a long way in recent years, but it still suffers from data scarcity because parallel corpora are lacking for low-resource (and sometimes zero-resource) languages. Transfer Learning (TL) is one widely used direction for overcoming this issue in low-resource MT systems. Creating a parallel corpus for such languages is another way of dealing with data scarcity, but it is a costly, time-consuming and laborious task. To avoid these limitations of parallel corpus creation, we present a TL-based Semi-supervised Pseudo-corpus Generation (TLSPG) approach for zero-shot MT systems. It generates the pseudo-corpus by exploiting, via TL, the relatedness between low-resource language pairs and zero-resource language pairs. Our experiments empirically confirm that such relatedness helps improve the performance of zero-shot MT systems. Experiments on zero-resource language pairs show that our approach outperforms the existing state-of-the-art models, yielding improvements of +15.56, +8.13, +3.98 and +2 BLEU points for Bhojpuri→Hindi, Magahi→Hindi, Hindi→Bhojpuri and Hindi→Magahi, respectively.
More From: Journal of King Saud University - Computer and Information Sciences