Complexity of alignment and decoding problems: restrictions and approximations

Noah Fleming,Antonina Kolokolova,Renesa Nizamee

doi:10.1007/s10590-015-9172-5

Abstract

We study the computational complexity of the Viterbi alignment and relaxed decoding problems for IBM model 3, focusing on the problem of finding a solution which has significant overlap with an optimal. That is, an approximate solution is considered good if it looks like some optimal solution with a few mistakes, where mistakes can be wrong values (such as a word aligned incorrectly or a wrong word in decoding), as well as insertions and deletions (spurious/missing words in decoding). In this setting, we show that it is computationally hard to find a solution which is correct on more than half (plus an inverse polynomial fraction) of the words. More precisely, if there is a polynomial-time algorithm computing an alignment for IBM model 3 which agrees with some Viterbi alignment on $$l/2+l^\epsilon $$l/2+l∈ words, where l is the length of the English sentence, or producing a decoding with $$l/2+l^\epsilon $$l/2+l∈ correct words, then P $$=$$= NP. We also present a similar structure inapproximability result for phrase-based alignment. As these strong lower bounds are for the general definitions of the Viterbi alignment and decoding problems, we also consider, from a parameterized complexity perspective, which properties of the input make these problems intractable. As a first step in this direction, we show that Viterbi alignment has a fixed-parameter tractable algorithm with respect to limiting the range of words in the target sentence to which a source word can be aligned. We note that by comparison, limiting maximal fertility--even to three--does not affect NP-hardness of the result.

Full Text