Abstract

Phrase table combination in pivot approaches can be an effective method for dealing with low-resource language pairs. The common practice for generating phrase tables in pivot approaches is to use the standard symmetrization, i.e., grow-diag-final-and. Some researchers have found that non-standard symmetrization can improve bilingual evaluation understudy (BLEU) scores, but it has not been commonly employed in pivot approaches. In this study, we propose a strategy that uses non-standard symmetrization of word alignment in phrase table combination. For each direct translation of Kazakh–English (Kk–En) and Japanese–Indonesian (Ja–Id), i.e., source–target, source–pivot, and pivot–target, we select the symmetrization that yields the highest BLEU score. Our experiments show that the proposed strategy outperforms direct translation in Kk–En, with absolute improvements of 0.35 BLEU points (an 11.3% relative improvement) for 3-gram and 0.22 BLEU points (a 6.4% relative improvement) for 5-gram. In Ja–Id, the proposed strategy shows an absolute gain of up to 0.11 BLEU points (a 0.9% relative improvement) over direct translation for 3-gram. Our proposed strategy using a small phrase table obtains better BLEU scores than a strategy using a large phrase table. The size of the target monolingual corpus and the feature function weight of the language model (LM) could reduce perplexity scores.
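
To make the difference between the standard and the non-standard heuristic concrete, here is a minimal Python sketch of intersection and grow-diag-final-and applied to two directional word alignments. It is an unofficial sketch that follows the commonly published pseudocode for these heuristics. It is not the authors' code or the Moses implementation, and iteration order may differ in edge cases. The function names, the (source index, target index) link representation, and the toy alignments are our own assumptions.

```python
from typing import Set, Tuple

# An alignment is a set of (source_index, target_index) links.
Alignment = Set[Tuple[int, int]]

# Horizontal, vertical and diagonal neighbors used by the grow-diag step.
NEIGHBORS = [(-1, 0), (0, -1), (1, 0), (0, 1),
             (-1, -1), (-1, 1), (1, -1), (1, 1)]


def intersection(s2t: Alignment, t2s: Alignment) -> Alignment:
    """Non-standard symmetrization: keep only links found in both directions."""
    return s2t & t2s


def grow_diag_final_and(s2t: Alignment, t2s: Alignment,
                        src_len: int, tgt_len: int) -> Alignment:
    """Standard symmetrization: start from the intersection, grow toward the union."""
    union = s2t | t2s
    alignment = set(s2t & t2s)
    aligned_src = {i for i, _ in alignment}
    aligned_tgt = {j for _, j in alignment}

    def add(i: int, j: int) -> None:
        alignment.add((i, j))
        aligned_src.add(i)
        aligned_tgt.add(j)

    # GROW-DIAG: repeatedly add union links that neighbor an existing link
    # and cover at least one still-unaligned word.
    added = True
    while added:
        added = False
        for i in range(src_len):
            for j in range(tgt_len):
                if (i, j) not in alignment:
                    continue
                for di, dj in NEIGHBORS:
                    ni, nj = i + di, j + dj
                    if (ni, nj) in union and (ni not in aligned_src or nj not in aligned_tgt):
                        add(ni, nj)
                        added = True

    # FINAL-AND: add leftover directional links only if BOTH words are unaligned.
    for directional in (s2t, t2s):
        for i in range(src_len):
            for j in range(tgt_len):
                if (i, j) in directional and i not in aligned_src and j not in aligned_tgt:
                    add(i, j)
    return alignment


# Toy directional alignments for a 3-word source and 3-word target sentence.
s2t = {(0, 0), (1, 1), (1, 2)}
t2s = {(0, 0), (1, 1), (2, 2)}
print(sorted(intersection(s2t, t2s)))               # [(0, 0), (1, 1)]
print(sorted(grow_diag_final_and(s2t, t2s, 3, 3)))  # [(0, 0), (1, 1), (1, 2), (2, 2)]
```

Intersection keeps only the links both directional aligners agree on, which gives sparser, higher-precision alignments; grow-diag-final-and starts from the same intersection and adds neighboring union links, trading precision for recall. Sparser alignments generally let the phrase extractor produce more (and longer) phrase pairs, which is one reason the choice of heuristic can change BLEU.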

Highlights

  • Although prior studies have shown that non-standard symmetrization, i.e., intersection, could obtain higher bilingual evaluation understudy (BLEU) scores than the standard one [8,9,10], non-standard symmetrization has not been commonly used in pivot approaches

  • For Kk–En, we found that the H-interpolation system approach (ISA) is competitive, providing absolute improvements of 0.35 and 0.22 BLEU points over the baseline and Std-ISA with LM03 (3-gram) and LM05 (5-gram), respectively

  • We investigated the effect of the symmetrization of word alignment on the translation quality of Kk–En and Ja–Id language pairs in pivot approaches
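
The abstract pairs each absolute BLEU gain with a relative one, and dividing the two recovers the direct-translation baseline they imply. The short Python sketch that follows does that arithmetic. The helper name `implied_baseline` is hypothetical, and the baselines it prints are back-calculated estimates, not figures reported in this summary.

```python
def implied_baseline(abs_gain: float, rel_gain_pct: float) -> float:
    """Baseline BLEU implied by an absolute gain and a relative gain (in percent).

    relative = absolute / baseline  =>  baseline = absolute / relative
    """
    return abs_gain / (rel_gain_pct / 100.0)


# Absolute and relative gains over direct translation, as stated in the abstract.
reported = [
    ("Kk-En, 3-gram", 0.35, 11.3),
    ("Kk-En, 5-gram", 0.22, 6.4),
    ("Ja-Id, 3-gram", 0.11, 0.9),
]

for label, abs_gain, rel_pct in reported:
    base = implied_baseline(abs_gain, rel_pct)
    print(f"{label}: baseline ~{base:.2f} BLEU -> proposed ~{base + abs_gain:.2f} BLEU")
```

Because the percentages are rounded, the implied baselines are approximate: roughly 3.1 and 3.4 BLEU for Kk–En and about 12.2 for Ja–Id. The Ja–Id figure is especially sensitive, since a reported 0.9% could be anything from 0.85% to 0.95%.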

Summary

Introduction

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The appropriate symmetrization of the word alignment model needs to be investigated to improve translation performance for low-resource languages when phrase table combination is used in pivot approaches. Kholy and Habash [11] studied phrase table combination based on the symmetrization of the word alignment model in pivot approaches for the Hebrew–Arabic language pair. Unlike Kholy and Habash [11], in this study we propose a phrase table combination strategy that uses the symmetrization of word alignment that obtains the highest BLEU scores. The rest of the paper is organized as follows: Section 2 reviews related work on phrase table combination and current research on the low-resource language pairs used in this work, i.e., Kk–En and Ja–Id. Section 3 explains our proposed strategy.
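
Phrase table combination in pivot approaches is often implemented by triangulating a source–pivot table and a pivot–target table into a source–target table, p(t|s) = Σ_p p(t|p)·p(p|s), and then linearly interpolating the result with the direct source–target table. This summary does not spell out how the interpolation system approach combines tables, so the Python sketch here is one common instantiation under stated assumptions, not the authors' method. Each toy table carries a single score, λ = 0.5 is arbitrary, and the function names and placeholder phrases are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# Hypothetical single-score phrase table: (source phrase, target phrase) -> p(target | source).
PhraseTable = Dict[Tuple[str, str], float]


def triangulate(src_piv: PhraseTable, piv_tgt: PhraseTable) -> PhraseTable:
    """Pivot phrase table via triangulation: p(t|s) = sum over p of p(t|p) * p(p|s)."""
    by_pivot: DefaultDict[str, List[Tuple[str, float]]] = defaultdict(list)
    for (piv, tgt), p_t_given_p in piv_tgt.items():
        by_pivot[piv].append((tgt, p_t_given_p))

    result: DefaultDict[Tuple[str, str], float] = defaultdict(float)
    for (src, piv), p_p_given_s in src_piv.items():
        for tgt, p_t_given_p in by_pivot.get(piv, []):
            result[(src, tgt)] += p_t_given_p * p_p_given_s
    return dict(result)


def interpolate(direct: PhraseTable, pivot: PhraseTable, lam: float) -> PhraseTable:
    """Linear interpolation lam * p_direct + (1 - lam) * p_pivot; absent pairs count as 0."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    pairs = direct.keys() | pivot.keys()
    return {k: lam * direct.get(k, 0.0) + (1.0 - lam) * pivot.get(k, 0.0) for k in pairs}


# Toy tables with placeholder phrases s*, p*, t* (not real data).
src_piv = {("s1", "p1"): 0.6, ("s1", "p2"): 0.4}
piv_tgt = {("p1", "t1"): 0.5, ("p1", "t2"): 0.5, ("p2", "t1"): 1.0}
direct = {("s1", "t1"): 0.9, ("s1", "t3"): 0.1}

pivot_table = triangulate(src_piv, piv_tgt)       # s1->t1: 0.7, s1->t2: 0.3
combined = interpolate(direct, pivot_table, 0.5)  # lam = 0.5 is an arbitrary choice
for pair in sorted(combined):
    print(pair, round(combined[pair], 3))
```

Real Moses-style phrase tables carry several scores per phrase pair (direct and inverse phrase and lexical probabilities), each of which would need to be combined. Keeping λ in [0, 1] makes every combined score a weighted average of its inputs, so the interpolated table stays normalized whenever both input tables are.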

Phrase Table Combination
Kk–En and Ja–Id as Low-Resource Language Pairs
An Interpolation System Approach
Direct and Pivot Translation
Datasets and Pre-Processing
Results and Discussion
Conclusions and Future Work