Approximate pattern matching with k-mismatches in packed text

Emanuele Giaquinta,Szymon Grabowski,Kimmo Fredriksson

doi:10.1016/j.ipl.2013.07.002

Abstract

Given strings P of length m and T of length n over an alphabet of size σ, the string matching with k-mismatches problem is to find the positions of all the substrings in T that are at Hamming distance at most k from P. If T can be read only one character at the time the best known bounds are O(nklogk) and O(n+nk/wlogk) in the word-RAM model with word length w. In the RAM models (including AC0 and word-RAM) it is possible to read up to ⌊w/logσ⌋ characters in constant time if the characters of T are encoded using ⌈logσ⌉ bits. The only solution for k-mismatches in packed text works in O((nlogσ/logn)⌈mlog(k+logn/logσ)/w⌉+nε) time, for any ε>0. We present an algorithm that runs in time O(n⌊w/(mlogσ)⌋(1+logmin(k,σ)logm/logσ)) in the AC0 model if m=O(w/logσ) and T is given packed. We also describe a simpler variant that runs in time O(n⌊w/(mlogσ)⌋logmin(m,logw/logσ)) in the word-RAM model. The algorithms improve the existing bound for w=Ω(log1+ϵn), for any ϵ>0. Based on the introduced technique, we present algorithms for several other approximate matching problems.

Full Text