Syntax-based Statistical Machine Translation Research Articles

The expressive power of regularity-preserving ε-free weighted linear multi bottom-up tree transducers is investigated. These models have very attractive theoretical and algorithmic properties, but (especially in the weighted setting) their expressive power is not well understood. Despite the regularity-preserving restriction, their power still exceeds that of composition chains of ε-free weighted linear extended top-down tree transducers with regular look-ahead. The latter devices are a natural super-class of weighted synchronous tree substitution grammars, which are commonly used in syntax-based statistical machine translation. In particular, the linguistically motivated discontinuous transformation of topicalization can be modeled by such multi bottom-up tree transducers, whereas the mentioned composition chains cannot implement it. On the negative side, the inverse of topicalization cannot be implemented by any such multi bottom-up tree transducer, which confirms their bottom-up nature (and non-closure under inverses). An interesting, promising, and widely applicable proof technique is used to prove these statements.

Unknown words are one of the key factors affecting translation quality. Traditionally, nearly all related research has focused on obtaining translations of the unknown words. However, these approaches have two disadvantages. On the one hand, they usually rely on many additional resources such as bilingual web data; on the other hand, they cannot guarantee good reordering and lexical selection of surrounding words. This paper gives a new perspective on handling unknown words in statistical machine translation (SMT). Instead of making great efforts to find the translation of unknown words, we focus on determining the semantic function of the unknown word in the test sentence and keeping that semantic function unchanged during translation. In this way, unknown words can help the phrase reordering and lexical selection of their surrounding words even though they remain untranslated. To determine the semantic function of an unknown word, we employ a distributional semantic model and a bidirectional language model. Extensive experiments on both phrase-based and linguistically syntax-based SMT models in Chinese-to-English translation show that our method can substantially improve translation quality.
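
The idea of keeping an unknown word's semantic function fixed can be illustrated with a toy sketch. It assumes a simple count-based context vector and cosine similarity as a stand-in for the distributional semantic model; the bidirectional language model is omitted, and the function names (`context_vector`, `pick_substitute`, `substitute`) are hypothetical, not taken from the paper.

```python
from collections import Counter
from math import sqrt


def context_vector(word, sentences, window=2):
    """Count the tokens that occur within `window` positions of `word`."""
    counts = Counter()
    for sent in sentences:
        for i, tok in enumerate(sent):
            if tok != word:
                continue
            lo = max(0, i - window)
            hi = min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[sent[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pick_substitute(unknown, test_sentence, corpus, vocabulary, window=2):
    """Pick the in-vocabulary word whose corpus contexts best match the
    context of `unknown` in the test sentence (assumed toy criterion)."""
    target = context_vector(unknown, [test_sentence], window)
    return max(
        vocabulary,
        key=lambda w: cosine(target, context_vector(w, corpus, window)),
    )


def substitute(sentence, unknown, replacement):
    """Replace the unknown word before translation; a real system would
    restore the original word in the output afterwards."""
    return [replacement if tok == unknown else tok for tok in sentence]
```

In this sketch the substitute only needs to behave like the unknown word in context, so that reordering and lexical selection of the neighbouring words proceed as if a known word were present.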

Articles published on Syntax-based Statistical Machine Translation

Incorporating target language semantic roles into a string-to-tree translation model

Combining translation memories and statistical machine translation using sparse features

Composition Closure of Linear Extended Top-down Tree Transducers

The Power of Weighted Regularity-Preserving Multi Bottom-Up Tree Transducers

A Substitution-Translation-Restoration Framework for Handling Unknown Words in Statistical Machine Translation

Syntax-Based Post-Ordering for Efficient Japanese-to-English Translation

Language Modeling for Syntax-Based Machine Translation Using Tree Substitution Grammars

Re-structuring, Re-labeling, and Re-aligning for Syntax-Based Machine Translation
