Abstract

We study the classical approximate string matching problem: given strings P and Q and an error threshold k, find all ending positions of substrings of Q whose edit distance to P is at most k. Let P and Q have lengths m and n, respectively. On a standard unit-cost word RAM with word size w ≥ log n, we present an algorithm using time $$O\biggl(nk \cdot \min\biggl(\frac{\log^2 m}{\log n},\frac{\log^2 m\log w}{w}\biggr) + n\biggr).$$ When P is short, namely $m = 2^{o(\sqrt{\log n}\,)}$ or $m = 2^{o(\sqrt{w/\log w}\,)}$, this improves on the best previously known time bounds for the problem. The result is achieved using a novel implementation of the Landau-Vishkin algorithm based on tabulation and word-level parallelism.
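
To make the problem statement concrete, the following is a minimal sketch of the textbook O(nm) dynamic-programming baseline for approximate string matching. It is not the algorithm of this paper and uses neither the Landau-Vishkin diagonal technique, tabulation, nor word-level parallelism. The function name `approx_match_ends` and the choice to report 1-based ending positions are assumptions made for illustration.

```python
def approx_match_ends(P, Q, k):
    """Return the 1-based ending positions j in Q such that some substring
    of Q ending at j has edit distance at most k to P.

    Baseline column-by-column dynamic program, O(n*m) time and O(m) space.
    Illustrative only: this is NOT the paper's algorithm.
    """
    m = len(P)
    # col[i] = minimum edit distance between P[:i] and any substring of Q
    # ending at the current text position. Before reading any text, matching
    # P[:i] against the empty string costs i.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(Q, start=1):
        prev_diag = col[0]  # value of the previous column at row i-1
        # col[0] stays 0 because a match may start at any text position.
        for i in range(1, m + 1):
            old = col[i]  # previous column, row i
            col[i] = min(
                prev_diag + (P[i - 1] != c),  # match or substitution
                old + 1,                      # extra text character c
                col[i - 1] + 1,               # pattern character P[i-1] unmatched
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approx_match_ends("abc", "xxabcxx", 1))
```

With k = 0 the function reduces to exact matching and reports only the positions where an occurrence of P ends. The bound stated above improves on this kind of baseline when P is short relative to n or w.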
