Phrase table re-adjustment for statistical machine translation

Debajyoty Banik

doi:10.1007/s10772-020-09676-0

Abstract

Neither assigning similar priority to all phrases nor pruning out the incorrect phrases from the phrase table can improve the accuracy of machine translation. In this paper, we present a novel method for re-adjusting the weights of the phrase table in a statistical machine translation system. It learns the correct and incorrect phrases from bilingual corpora. Based on syntactic phrase-level information, the phrase table is updated with weights estimated using a probability distribution. Evaluation on English–Hindi technical domain corpora shows that our proposed method produces better output as measured by the BLEU, RIBES and NIST metrics. We show that the proposed method also works well for other language pairs such as Hindi–Konkani and Bengali–Hindi. Finally, we find that this minor probabilistic change can substantially improve the accuracy of the machine translation system.

Full Text