Abstract

Decoding of phrase-based translation models in the general case is known to be NP-complete, by a reduction from the traveling salesman problem (Knight, 1999). In practice, phrase-based systems often impose a hard distortion limit that restricts the movement of phrases during translation. However, the impact of such a constraint on the complexity of decoding has not been well studied. In this paper, we describe a dynamic programming algorithm for phrase-based decoding with a fixed distortion limit. The runtime of the algorithm is $O(n \cdot d! \cdot l \cdot h^{d+1})$, where $n$ is the sentence length, $d$ is the distortion limit, $l$ is a bound on the number of phrases starting at any position in the sentence, and $h$ is related to the maximum number of target-language translations for any source word. The algorithm makes use of a novel representation that gives a new perspective on decoding of phrase-based models.
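To see how the bound behaves, the snippet below simply evaluates the expression inside the $O(\cdot)$ for a few illustrative parameter values. The values of $n$, $d$, $l$ and $h$ are arbitrary and hypothetical, and the constants hidden by the $O(\cdot)$ are ignored. It shows the two properties stressed above: the value grows linearly in $n$, while each increase of $d$ multiplies it by a factor that itself grows with $d$.

```python
import math


def decoding_cost_bound(n: int, d: int, l: int, h: int) -> int:
    """Return n * d! * l * h**(d + 1), the expression inside the O(.) bound.

    Constants hidden by the O(.) are ignored, so this is only a rough,
    relative measure of how the bound scales with each parameter.
    """
    return n * math.factorial(d) * l * h ** (d + 1)


if __name__ == "__main__":
    # Illustrative (hypothetical) values: l = 5 phrases per position, h = 3.
    print(decoding_cost_bound(n=20, d=4, l=5, h=3))  # 583200
    # Doubling n doubles the value: the bound is linear in sentence length.
    print(decoding_cost_bound(n=40, d=4, l=5, h=3))  # 1166400
    # Raising d from 4 to 5 multiplies the value by (4 + 1) * 3 = 15.
    print(decoding_cost_bound(n=20, d=5, l=5, h=3))  # 8748000
```

In general, going from $d$ to $d + 1$ multiplies the expression by $(d + 1) \cdot h$, which is why the algorithm is practical only when the distortion limit is small.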

Highlights

  • Phrase-based translation models (Koehn et al, 2003; Och and Ney, 2004) are widely used in statistical machine translation

  • This paper describes an algorithm for phrase-based decoding with a fixed distortion limit whose runtime is linear in the length of the sentence and, for a fixed distortion limit, polynomial in the other factors

  • The algorithm builds on the insight that decoding with a hard distortion limit is related to the bandwidth-limited traveling salesman problem (BTSP) (Lawler et al, 1985)
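To make the hard distortion limit concrete, the sketch below restricts attention to single-word phrases, so a derivation is just the order in which source positions $1, \dots, n$ are translated: loosely, a path over source positions whose individual jumps are bounded, which is the intuition behind the connection to the bandwidth-limited TSP. It assumes the common definition of the limit, $|t(p_k) + 1 - s(p_{k+1})| \le d$ with $t(p_0) = 0$, where $s$ and $t$ are the start and end positions of a phrase, and places no extra constraint on the final phrase; these conventions and the helper names are assumptions, not details taken from the paper. The enumeration is exhaustive and exponential in $n$, so it only illustrates which orders are allowed, not how the paper's algorithm searches them.

```python
from itertools import permutations
from typing import Sequence


def satisfies_distortion_limit(order: Sequence[int], d: int) -> bool:
    """Check a single-word-phrase translation order against a hard limit d.

    `order` lists 1-based source positions in the order they are translated.
    Assumed constraint: |prev_end + 1 - pos| <= d, with prev_end = 0 before
    the first word; no extra constraint is placed on the last word.
    """
    prev_end = 0
    for pos in order:
        if abs(prev_end + 1 - pos) > d:
            return False
        prev_end = pos
    return True


def count_valid_orders(n: int, d: int) -> int:
    """Brute-force count of the orders of n source words allowed under d."""
    return sum(
        satisfies_distortion_limit(order, d)
        for order in permutations(range(1, n + 1))
    )


if __name__ == "__main__":
    # Prints "0 1", "1 1", "2 4", "3 6" (one pair per line).
    for d in range(4):
        print(d, count_valid_orders(3, d))
```

Under this definition $d = 1$ still forces a monotone order, because any backward jump costs at least 2; for $n = 3$, the limit only starts to admit reorderings at $d = 2$.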


Summary

Introduction

Phrase-based translation models (Koehn et al, 2003; Och and Ney, 2004) are widely used in statistical machine translation. In practice, phrase-based systems often impose a hard distortion limit that restricts how far phrases can move during translation. The complexity of decoding with such a distortion limit is an open question: the NP-hardness result from Knight (1999) applies to a phrase-based model with no distortion limit. This paper describes an algorithm for phrase-based decoding with a fixed distortion limit whose runtime is linear in the length of the sentence and, for a fixed distortion limit, polynomial in the other factors. For a hard distortion limit $d$ and sentence length $n$, the runtime is $O(n \cdot d! \cdot l \cdot h^{d+1})$, where $l$ is a bound on the number of phrases starting at any point in the sentence, and $h$ is related to the maximum number of translations for any word in the source-language sentence. The algorithm builds on the insight that decoding with a hard distortion limit is related to the bandwidth-limited traveling salesman problem (BTSP) (Lawler et al, 1985). The algorithm is amenable to beam search. It is quite different from previous methods for decoding of phrase-based models, potentially opening up a very different way of thinking about decoding algorithms for phrase-based models, or more generally for models in statistical NLP that involve reordering.
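The summary only states that the algorithm is amenable to beam search. The sketch below is not the paper's dynamic program; it is a generic left-to-right beam search over orders of single-word phrases under the same hard distortion limit, meant only to show what beam pruning does in this setting. It assumes the definition $|t(p_k) + 1 - s(p_{k+1})| \le d$ with $t(p_0) = 0$, and the transition function `score(prev, pos)` is a hypothetical stand-in for real language-model and translation-model scores.

```python
import heapq
from typing import Callable, List, Optional, Tuple

Hypothesis = Tuple[float, List[int]]  # (total score, source positions in order)


def beam_decode(
    n: int,
    d: int,
    score: Callable[[int, int], float],
    beam: int,
) -> Optional[Hypothesis]:
    """Generic left-to-right beam search over orders of n single-word phrases.

    Each step extends every hypothesis by one untranslated source position
    that respects the hard distortion limit |prev_end + 1 - pos| <= d
    (prev_end = 0 before the first word), then keeps the `beam` best.
    `score(prev, pos)` is a hypothetical stand-in for model scores.
    Returns the best complete hypothesis, or None if every hypothesis
    was pruned or reached a dead end.
    """
    hyps: List[Hypothesis] = [(0.0, [])]
    for _ in range(n):
        expanded: List[Hypothesis] = []
        for total, order in hyps:
            prev = order[-1] if order else 0
            covered = set(order)
            for pos in range(1, n + 1):
                if pos not in covered and abs(prev + 1 - pos) <= d:
                    expanded.append((total + score(prev, pos), order + [pos]))
        # Histogram pruning: keep only the `beam` highest-scoring hypotheses.
        hyps = heapq.nlargest(beam, expanded, key=lambda hyp: hyp[0])
        if not hyps:
            return None
    # nlargest returns hypotheses sorted best-first.
    return hyps[0]


if __name__ == "__main__":
    # Hypothetical score that rewards moving backwards in the source.
    def reward_reordering(prev: int, pos: int) -> float:
        return 1.0 if pos < prev else 0.0

    print(beam_decode(3, 2, reward_reordering, beam=10))  # (2.0, [3, 2, 1])
    print(beam_decode(3, 0, reward_reordering, beam=10))  # (0.0, [1, 2, 3])
```

With `beam=10` this toy example is small enough to be searched exhaustively; because pruning is by score alone, a smaller `beam` can discard a partial order that would have led to the best complete one.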

Related Work
Background
Bandwidth-Limited TSPPs
An Algorithm for Bandwidth-Limited TSPPs
Basic Definitions
Discussion
Beam Search
Complexity of Decoding with Bit-string Representations
Conclusion
A Proof of Lemma 4
B Proof of Lemma 5

