Phrase Table Combination Based on Symmetrization of Word Alignment for Low-Resource Languages

Sari Dewi Budiwati, Tirana Noor Fatyanosa, Al Hafiz Akbar Maulana Siagian, Masayoshi Aritsugi

doi:10.3390/app11041868

Abstract

Phrase table combination in pivot approaches can be an effective method to deal with low-resource language pairs. The common practice to generate phrase tables in pivot approaches is to use standard symmetrization, i.e., grow-diag-final-and. Although some researchers found that non-standard symmetrization could improve bilingual evaluation understudy (BLEU) scores, it has not been commonly employed in pivot approaches. In this study, we propose a strategy that uses the non-standard symmetrization of word alignment in phrase table combination. The appropriate symmetrization is selected based on the highest BLEU scores in each direct translation of source–target, source–pivot, and pivot–target of Kazakh–English (Kk–En) and Japanese–Indonesian (Ja–Id). Our experiments show that our proposed strategy outperforms the direct translation in Kk–En with absolute improvements of 0.35 (an 11.3% relative improvement) and 0.22 (a 6.4% relative improvement) BLEU points for 3-gram and 5-gram, respectively. The proposed strategy shows an absolute gain of up to 0.11 (a 0.9% relative improvement) BLEU points compared to direct translation for 3-gram in Ja–Id. Our proposed strategy using a small phrase table obtains better BLEU scores than a strategy using a large phrase table. The size of the target monolingual data and the feature function weight of the language model (LM) could reduce perplexity scores.
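The relation between the absolute and relative improvements quoted above can be sketched as follows. This is only an illustration of the arithmetic: the function names are hypothetical, and the baseline values it derives are back-calculated from the rounded figures in the abstract, so they are approximations rather than scores reported here.

```python
def relative_improvement(absolute_gain: float, baseline: float) -> float:
    """Relative improvement in percent, given an absolute gain over a baseline score."""
    return absolute_gain / baseline * 100.0


def implied_baseline(absolute_gain: float, relative_pct: float) -> float:
    """Baseline score implied by an absolute gain and its relative improvement (percent)."""
    return absolute_gain / (relative_pct / 100.0)


# (absolute BLEU gain, relative improvement in %) pairs as quoted in the abstract.
quoted = {
    "Kk-En, 3-gram": (0.35, 11.3),
    "Kk-En, 5-gram": (0.22, 6.4),
    "Ja-Id, 3-gram": (0.11, 0.9),
}

for setting, (gain, pct) in quoted.items():
    # Back-calculated, hence approximate: the quoted figures are rounded.
    base = implied_baseline(gain, pct)
    print(f"{setting}: implied baseline BLEU of about {base:.2f}")
```

Because the quoted gains and percentages are rounded, the implied baselines are only indicative of the magnitude of the direct-translation scores.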

Highlights

Although prior studies have shown that non-standard symmetrization, i.e., intersection, could obtain higher bilingual evaluation understudy (BLEU) scores than the standard one [8,9,10], non-standard symmetrization has not been commonly used in pivot approaches.
For Kk–En, we found that the H-interpolation system approach (ISA) is competitive: it provided absolute improvements of 0.35 and 0.22 BLEU points over the baseline and Std-ISA in LM03 and LM05, respectively.
We investigated the effect of the symmetrization of word alignment on the translation quality of the Kk–En and Ja–Id language pairs in pivot approaches.

Summary

Introduction

The appropriate symmetrization of the word alignment model needs to be investigated to improve translation performance for low-resource languages when using phrase table combination in pivot approaches. Kholy and Habash [11] studied phrase table combinations based on the symmetrization of the word alignment model in pivot approaches for the Hebrew–Arabic language pair. Unlike Kholy and Habash [11], in this study we propose a phrase table combination strategy that uses the symmetrization of word alignment that obtains the highest BLEU scores. The rest of the paper is organized as follows: Section 2 reviews related work on phrase table combination and current research on the low-resource language pairs used in this work, i.e., Kk–En and Ja–Id. Section 3 explains our proposed strategy.
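To make the notion of symmetrization concrete, the following is a minimal sketch of the two simplest heuristics, intersection and union, applied to a pair of directional word alignments. It is not the authors' pipeline: the representation of an alignment as a set of (source index, target index) pairs, the function names, and the toy data are all assumptions made for illustration. The standard grow-diag-final-and heuristic, which starts from the intersection and adds selected points from the union, is not implemented here.

```python
from typing import Set, Tuple

# An alignment is modelled as a set of (source_index, target_index) pairs.
# Both directional alignments are assumed to be stored in this orientation.
Alignment = Set[Tuple[int, int]]


def intersection(src2tgt: Alignment, tgt2src: Alignment) -> Alignment:
    """Keep only the links found in both directional alignments."""
    return src2tgt & tgt2src


def union(src2tgt: Alignment, tgt2src: Alignment) -> Alignment:
    """Keep every link found in either directional alignment."""
    return src2tgt | tgt2src


# Hypothetical toy alignments for a three-word sentence pair.
src2tgt: Alignment = {(0, 0), (1, 1), (2, 1)}
tgt2src: Alignment = {(0, 0), (1, 1), (2, 2)}

print(sorted(intersection(src2tgt, tgt2src)))  # [(0, 0), (1, 1)]
print(sorted(union(src2tgt, tgt2src)))         # [(0, 0), (1, 1), (2, 1), (2, 2)]
```

Intersection yields fewer but more reliable links, while union yields more links with lower precision. The choice changes which phrase pairs can be extracted, and therefore the size and content of the resulting phrase table.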

Phrase Table Combination

Kk–En and Ja–Id as Low-Resource Language Pairs

An Interpolation System Approach

Direct and Pivot Translation

Datasets and Pre-Processing

Results and Discussion

Conclusions and Future Work

Journal: Applied sciences
Publication Date: Feb 20, 2021
License type: CC BY 4.0

Similar Papers

Does BLEU Score Work for Code Migration?
Ngoc Tran ... Son Nguyen
01 May 2019

Spelling Correction of Non-Word Errors in Uyghur–Chinese Machine Translation
Rui Dong ... Tonghai Jiang
Information | VOL. 10
06 Jun 2019

Phrase-Based Named Entity Transliteration on Myanmar-English Terminology Dictionary
Aye Myat Mon ... Khin Mar Soe
05 Nov 2020

Classification of Utterance Acceptability Based on BLEU Scores for Dialogue-Based CALL Systems
Reiko Kuwa ... Tsuneo Kato
01 Jan 2015
