Phrase Pairs Research Articles

Long-range word order differences are a well-known problem for machine translation. Unlike standard phrase-based models, which work with sequential and local phrase reordering, the hierarchical phrase-based model (Hiero) embeds the reordering of phrases within pairs of lexicalized context-free rules. This allows the model to handle long-range reordering recursively. However, the Hiero grammar works with a single nonterminal label, which means that the rules are combined together into derivations independently and without reference to context outside the rules themselves. Follow-up work explored remedies involving nonterminal labels obtained from monolingual parsers and taggers. As yet, no labeling mechanisms exist for the many languages for which there are no good quality parsers or taggers. In this paper we contribute a novel approach for acquiring reordering labels for Hiero grammars directly from the word-aligned parallel training corpus, without use of any taggers or parsers. The new labels represent types of alignment patterns in which a phrase pair is embedded within larger phrase pairs. In order to obtain alignment patterns that generalize well, we propose to decompose word alignments into trees over phrase pairs. Besides this labeling approach, we contribute coarse and sparse features for learning soft, weighted label-substitution as opposed to standard substitution. We report extensive experiments comparing our model to two baselines: Hiero and the well-known syntax-augmented machine translation (SAMT) variant, which labels Hiero rules with nonterminals extracted from monolingual syntactic parses. We also test a simplified labeling scheme based on inversion transduction grammar (ITG). For the Chinese-English task we obtain a performance improvement of up to 1 BLEU point, whereas for the German-English task, where morphology is an issue, a minor (but statistically significant) improvement of 0.2 BLEU points is reported over SAMT. While ITG labeling does give a performance improvement, it is sometimes suboptimal relative to our proposed labeling scheme.
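As background for the phrase pairs this abstract builds on, the following is a minimal sketch of the standard consistency criterion for extracting phrase pairs from a word alignment. It is not the labeling scheme of the paper. The function name `extract_phrase_pairs`, the `max_len` limit, and the toy alignment are assumptions made for illustration, and the sketch returns only the tightest target span for each source span (it does not extend spans over unaligned words).

```python
from typing import List, Set, Tuple

Alignment = Set[Tuple[int, int]]  # (source index, target index) links
Span = Tuple[int, int]            # inclusive (start, end) word positions


def extract_phrase_pairs(
    src_len: int, alignment: Alignment, max_len: int = 4
) -> List[Tuple[Span, Span]]:
    """Return (source span, target span) pairs consistent with the alignment.

    A pair is consistent when no word inside either span is linked to a
    word outside the other span, and at least one link lies inside it.
    """
    pairs = []
    for s_start in range(src_len):
        for s_end in range(s_start, min(s_start + max_len, src_len)):
            # Target positions linked to any word of the source span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue  # a phrase pair needs at least one alignment link
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Every link touching the target span must stay in the source span.
            consistent = all(
                s_start <= s <= s_end
                for (s, t) in alignment
                if t_start <= t <= t_end
            )
            if consistent:
                pairs.append(((s_start, s_end), (t_start, t_end)))
    return pairs


# Hypothetical toy alignment: word 0 stays in place, words 1 and 2 swap.
toy_alignment = {(0, 0), (1, 2), (2, 1)}
pairs = extract_phrase_pairs(3, toy_alignment)
for src_span, tgt_span in pairs:
    print(src_span, "<->", tgt_span)
```

In the toy example the source span (0, 1) yields no phrase pair, because its target span would also cover a word linked to source word 2. The swapped words 1 and 2 form a phrase pair of their own, (1, 2) with (1, 2), nested inside the sentence-level pair. This nesting of smaller phrase pairs inside larger ones is the kind of embedding the abstract refers to.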

Until quite recently, extending phrase-based statistical machine translation (PBSMT) with syntactic knowledge caused system performance to deteriorate. The most recent successful enrichments of PBSMT with hierarchical structure either employ nonlinguistically motivated syntax for capturing hierarchical reordering phenomena, or extend the phrase translation table with redundantly ambiguous syntactic structures over phrase pairs. In this paper, we present an extended, harmonized account of our previous work which showed that incorporating linguistically motivated lexical syntactic descriptions, called supertags, can yield significantly better PBSMT systems at insignificant extra computational cost. We describe a novel PBSMT model that integrates supertags into the target language model and the target side of the translation model. Two kinds of supertags are employed: those from lexicalized tree-adjoining grammar and combinatory categorial grammar. Despite the differences between the two sets of supertags, they give similar improvements. In addition to integrating the Markov supertagging approach in PBSMT, we explore the utility of a new surface grammaticality measure based on combinatory operators. We perform various experiments on the Arabic-to-English NIST 2005 test set addressing the issues of sparseness, scalability, and the utility of system subcomponents. We show that even when the parallel training data grows very large, the supertagged system retains a relatively stable absolute performance advantage over the unadorned PBSMT system. Arguably, this hints at a performance gap that cannot be bridged by acquiring more phrase pairs. Our best result shows a relative improvement of 6.1% over a state-of-the-art PBSMT model, which compares favorably with the leading systems on the NIST 2005 task. We also demonstrate that the advantages of a supertag-based system carry over to German-English, where improvements of up to 8.9% relative to the baseline system are observed.

Articles published on Phrase Pairs

Phrase-boundary model for statistical machine translation

Labeling hierarchical phrase-based models without linguistic resources

From Paraphrase Database to Compositional Paraphrase Model and Back

Extraction of Potentially Useful Phrase Pairs for Statistical Machine Translation

Learning relational facts from the web: A tolerance rough set approach

Online adaptation to post-edits for phrase-based statistical machine translation

Topic-aware pivot language approach for statistical machine translation

Improvement of the Results of Statistical Machine Translation System using Anusaaraka

A Phrase Table Filtering Model Based on Binary Classification for Uyghur-Chinese Machine Translation

A relationship: word alignment, phrase table, and translation quality.

Maximum-entropy word alignment and posterior-based phrase extraction for machine translation

Clarification Question Generation for Speech Recognition Error Recovery Using Monolingual SMT

An Investigation of the Sampling-Based Alignment Method and its Contributions

Decorated Phrase Model and Syntax-Based Reordering Model for Statistical Machine Translation

Eppex: Epochal Phrase Table Extraction for Statistical Machine Translation

Improving Phrase-based Statistical Myanmar to English Machine Translation with Morphological Analysis

Roles of syntax information in directing song development in white-crowned sparrows (Zonotrichia leucophrys).

Automatically generated parallel treebanks and their exploitability in machine translation

Tutor model syntax influences the syntactical and phonological structure of crystallized songs of white-crowned sparrows

Syntactically Lexicalized Phrase-Based SMT
