A word-to-phrase statistical translation model

Marcello Federico, Nicola Bertoldi

doi:10.1145/1115686.1115687

Abstract

This article addresses the development of statistical models for phrase-based machine translation (MT) that extend a popular word-alignment model proposed by IBM in the early 1990s. A novel decoding algorithm is derived directly from the optimization criterion that defines the statistical MT approach. Decoding is made efficient by applying dynamic programming, pruning strategies, and word-reordering constraints. Translation performance is known to improve when phrase (or multiword) translation pairs, automatically extracted from a parallel corpus, are exploited. New phrase-based models are obtained by introducing extra multiwords into the target-language vocabulary and by estimating the corresponding parameters in one of three ways: (i) from a word-based model, (ii) from phrase-based statistics computed on the parallel corpus, or (iii) by interpolating the two previous estimates. Word-based and phrase-based MT models are evaluated on a travel-domain task in two translation directions: Chinese-English (12k-word vocabulary) and Italian-English (16k-word vocabulary). The phrase-based models improve the BLEU score over the word-based model by 19% and 13% relative, respectively.
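Option (iii), interpolating the word-based and phrase-based estimates, can be illustrated with a generic linear mixture, p(f | e) = lam * p_word(f | e) + (1 - lam) * p_phrase(f | e). The sketch below is an assumption, not the authors' procedure: the function `interpolate`, the weight `lam`, the table layout (a target phrase e mapped to a distribution over source phrases f, following the noisy-channel convention), and the toy Italian-English entries are all hypothetical, and the article may estimate or tune the mixture differently.

```python
def interpolate(p_word, p_phrase, lam):
    """Linearly interpolate two conditional tables p(source | target).

    Hypothetical layout: each table maps a target phrase to a dict
    {source phrase: probability}. `lam` weights the word-based estimate
    and (1 - lam) weights the phrase-based estimate.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    mixed = {}
    for tgt in set(p_word) | set(p_phrase):
        w = p_word.get(tgt, {})
        p = p_phrase.get(tgt, {})
        # A source phrase missing from one table contributes 0 from that table.
        row = {
            src: lam * w.get(src, 0.0) + (1.0 - lam) * p.get(src, 0.0)
            for src in set(w) | set(p)
        }
        total = sum(row.values())
        # Renormalize so the row stays a distribution even when tgt
        # appears in only one table; drop rows that received zero mass.
        if total > 0.0:
            mixed[tgt] = {src: v / total for src, v in row.items()}
    return mixed


if __name__ == "__main__":
    # Hypothetical toy entries for the English target phrase "thank you".
    p_word = {"thank you": {"grazie": 0.6, "ringrazio": 0.4}}
    p_phrase = {"thank you": {"grazie": 0.9, "ringrazio": 0.1}}
    mixed = interpolate(p_word, p_phrase, lam=0.5)
    for src, prob in sorted(mixed["thank you"].items()):
        print(f"{src}: {prob:.2f}")  # prints grazie: 0.75 then ringrazio: 0.25
```

When both rows are already normalized, the mixture sums to one by itself and the renormalization step changes nothing; it only matters when a target phrase is present in one table but not the other.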

Journal: ACM Transactions on Speech and Language Processing
Publication Date: Dec 1, 2005
Citations: 17

Similar Papers

Multilingual Neural Translation
14 Feb 2020

Research for Uyghur-Chinese Neural Machine Translation
Jinying Kong ... Xiao Li
01 Jan 2015

Adaptation in Statistical Machine Translation for Low-resource Domains in English-Vietnamese Language
Nghia-Luan Pham ... Van-Vinh Nguyen
VNU Journal of Science: Computer Science and Communication Engineering | VOL. 36
30 May 2020

A phrase-based unigram model for statistical machine translation
Christoph Tillmann ... Fei Xia
01 Jan 2003
