Improving Statistical Machine Translation for a Resource-Poor Language Using Related Resource-Rich Languages

P. Nakov, H. T. Ng

doi:10.1613/jair.3540

Abstract

We propose a novel language-independent approach for improving machine translation for resource-poor languages by exploiting their similarity to resource-rich ones. More precisely, we improve the translation from a resource-poor source language X_1 into a resource-rich language Y given a bi-text containing a limited number of parallel sentences for X_1-Y and a larger bi-text for X_2-Y for some resource-rich language X_2 that is closely related to X_1. This is achieved by taking advantage of the opportunities that vocabulary overlap and similarities between the languages X_1 and X_2 in spelling, word order, and syntax offer: (1) we improve the word alignments for the resource-poor language, (2) we further augment it with additional translation options, and (3) we take care of potential spelling differences through appropriate transliteration. The evaluation for Indonesian -> English using Malay, and for Spanish -> English using Portuguese (pretending that Spanish is resource-poor), shows an absolute gain of up to 1.35 and 3.37 BLEU points, respectively, which is an improvement over the best rivaling approaches, while using much less additional data. Overall, our method cuts the amount of necessary "real" training data by a factor of 2 to 5.
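Step (2) above, augmenting the resource-poor system with additional translation options, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `merge_phrase_tables`, the dictionary-based phrase table, the origin tags, and the toy entries and probabilities are all assumptions made for illustration. The idea shown is only that entries learned from the small X_1-Y bi-text are kept as they are, while entries learned from the larger X_2-Y bi-text are added where the small table has nothing.

```python
from typing import Dict, Tuple

# A phrase table maps (source phrase, target phrase) to a translation score.
# This flat representation is an assumption made for illustration only.
PhraseTable = Dict[Tuple[str, str], float]


def merge_phrase_tables(
    primary: PhraseTable, secondary: PhraseTable
) -> Dict[Tuple[str, str], Tuple[float, str]]:
    """Merge two phrase tables, giving priority to the primary one.

    Every primary (X_1-Y) entry is kept unchanged. A secondary (X_2-Y)
    entry is added only if the same phrase pair is absent from the
    primary table. Each merged entry is tagged with its origin so that
    a downstream model could weight the two sources differently.
    """
    merged: Dict[Tuple[str, str], Tuple[float, str]] = {}
    for pair, score in primary.items():
        merged[pair] = (score, "primary")
    for pair, score in secondary.items():
        if pair not in merged:
            merged[pair] = (score, "secondary")
    return merged


# Toy data with invented scores (hypothetical, not taken from the paper).
small_table = {("rumah", "house"): 0.9}
large_table = {("rumah", "house"): 0.5, ("kerana", "because"): 0.8}

merged = merge_phrase_tables(small_table, large_table)
for (src, tgt), (score, origin) in sorted(merged.items()):
    print(f"{src} -> {tgt}\t{score}\t{origin}")
```

Tagging each entry with its origin is one simple design choice; it keeps the trusted in-language evidence separate from the borrowed evidence instead of mixing their scores.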

Journal: Journal of Artificial Intelligence Research
Publication Date: May 30, 2012
Citations: 48
License type: cc-by

Similar Papers

Improved statistical machine translation for resource-poor languages using related resource-rich languages
Preslav Nakov ... Hwee Tou Ng
01 Jan 2009

Source Language Adaptation Approaches for Resource-Poor Machine Translation
Pidong Wang ... Hwee Tou Ng
Computational Linguistics | VOL. 42
01 Jun 2016

Contrastive Learning of Emoji-Based Representations for Resource-Poor Languages
Nurendra Choudhary ... Manish Shrivastava
01 Jan 2023

Emotions Are Universal: Learning Sentiment Based Representations of Resource-Poor Languages Using Siamese Networks
Nurendra Choudhary ... Manish Shrivastava
01 Jan 2023
