Phrase-based Statistical Machine Translation System Research Articles

Phrase-Based Statistical Machine Translation (PB-SMT) is clearly the leading paradigm in the field today. Nevertheless, and this may come as some surprise to the PB-SMT community, most translators and, somewhat more surprisingly perhaps, many experienced MT protagonists find the basic model extremely difficult to understand. The main aim of this paper, therefore, is to discuss why this might be the case. Our basic thesis is that proponents of PB-SMT do not seek to address any community other than their own, for they do not feel any need to do so. We demonstrate that this was not always the case; on the contrary, when statistical models of translation were first presented, the language used to describe how such a model might work was very conciliatory and inclusive. Over the next five years, things changed considerably; once SMT achieved dominance, particularly over the rule-based paradigm, it had established a position where it did not need to bring the rest of the MT community along with it, and in our view, this has largely remained the case to this day. Having discussed these issues, we turn to three additional ones: the role of automatic MT evaluation metrics when describing PB-SMT systems; the recent syntactic embellishments of PB-SMT, noting especially that most of these contributions have come from researchers who have prior experience in fields other than statistical models of translation; and the relationship between PB-SMT and other models of translation, suggesting that there are many gains to be had if the SMT community were to open up more to the other MT paradigms.

An efficient, publicly available machine translation system is badly needed to reap the full benefits of Information and Communication Technology by removing language barriers in this era of globalization. In this study, we present a Phrase-Based Statistical Machine Translation (PBMT) system between English and Bangla in both directions. To the best of our knowledge, the system is trained on the largest dataset used for the English↔Bangla translation task, with more than three million tokens on each side. In the system, we perform data preprocessing and use optimized parameters to produce efficient system output. We analyze our system output from several viewpoints: overall results, comparisons with the available systems, the effect of sentence type and length, and the behaviour of two challenging linguistic properties: prepositional phrases and noun inflection. Our analysis shows that translating into a morphologically richer language is harder than translating from it, mainly because of the difficulty of translating noun inflections. Comparisons with the available systems show that our system outperforms the other systems significantly and gains 10.84 BLEU, 2.18 NIST and 19.02 TER points over the next best system. The analysis of sentence type and length shows that simple sentences are easier to translate and that sentences longer than 15 words are harder to translate for the English↔Bangla translation task. To foster English↔Bangla machine translation research, we have built development and test datasets that are representative in sentence length and balanced in genre; they are intended as a benchmark and are publicly available.

Articles published on Phrase-based Statistical Machine Translation System

A Critique of Statistical Machine Translation

Evaluation of English–Slovak Neural and Statistical Machine Translation

Semantically Smooth Bilingual Phrase Embeddings Based on Recursive Autoencoders

Deep learning-based techniques to enhance the precision of phrase-based statistical machine translation system for Indian languages

Neural Machine Translation for Low-resource English-Bangla

Google Translate Gets Voltaire: Literary Translation and the Age of Artificial Intelligence

Shu-torjoma: An English↔Bangla Statistical Machine Translation System

Developing Statistical Machine Translation System for English and Nigerian Languages

Statistical machine translation of Indian languages: a survey

Dependency-based Pre-ordering For English-Vietnamese Statistical Machine Translation

Toward Building a Comprehensive Phrase-based English-Arabic Statistical Machine Translation System

Detecting and Integrating Multiword Expression into English-Arabic Statistical Machine Translation

Template-Based Model for Mongolian-Chinese Machine Translation

Efficient Word Alignment with Markov Chain Monte Carlo

End-to-end statistical machine translation with zero or small parallel texts

A deep source-context feature for lexical selection in statistical machine translation

Integrating Rules and Dictionaries from Shallow-Transfer Machine Translation into Phrase-Based Statistical Machine Translation

Exploring Diverse Features for Statistical Machine Translation Model Pruning

Statistical Post Editing System (SPES) Applied to Hindi-Punjabi PB-SMT System
