Abstract

Pattern matching with wildcards is a string matching problem whose goal is to find all factors of a text $t$ of length $n$ that match a pattern $x$ of length $m$, where wildcards (characters that match everything) may be present. In this paper we present a number of complexity results and fast average-case algorithms for pattern matching where wildcards are allowed in the pattern; however, the results are easily adapted to the case where wildcards are allowed in the text as well. We analyse the average-case complexity of these algorithms and derive non-trivial time bounds. These are the first results on the average-case complexity of pattern matching with wildcards which provide a provable separation in time complexity between exact pattern matching and pattern matching with wildcards. We introduce the wc-period of a string, which is the period of the binary mask $x_b$, where $x_b[i] = a$ iff $x[i] \neq \phi$ and $x_b[i] = b$ otherwise. We denote the length of the wc-period of a string $x$ by ▪. We show the following results for constant $0 < \epsilon < 1$ and a pattern $x$ of length $m$ and $g$ wildcards with ▪ the prefix of length $p$ contains $g_p$ wildcards:
• If $\lim_{m \to \infty} \frac{g_p}{p} = 0$, there is an optimal algorithm running in $O\left(\frac{n \log_\sigma m}{m}\right)$ time on average.
• If $\lim_{m \to \infty} \frac{g_p}{p} = 1 - \epsilon$, there is an algorithm running in $O\left(\frac{n \log_\sigma m \log_2 p}{m}\right)$ time on average.
• If $\lim_{m \to \infty} \frac{g}{m} = \lim_{m \to \infty} 1 - f(m) = 1$, any algorithm takes at least $\Omega\left(\frac{n \log_\sigma m}{f(m)}\right)$ time on average.
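To make the definitions above concrete, the following minimal Python sketch builds the binary mask of a pattern, computes the length of its period, and counts the wildcards in the prefix of that length. It rests on assumptions that the abstract does not state: the character `?` stands in for the wildcard symbol, "period" is read as the smallest period of the mask, and all function names are hypothetical. The `naive_match` function is only a quadratic-time baseline that illustrates the matching problem; it is not one of the paper's average-case algorithms.

```python
WILDCARD = "?"  # assumption: stand-in for the paper's wildcard symbol


def binary_mask(x, wildcard=WILDCARD):
    """Return the mask of x: 'a' for a regular letter, 'b' for a wildcard."""
    return "".join("b" if c == wildcard else "a" for c in x)


def smallest_period(s):
    """Length of the smallest period of s, via the KMP border array."""
    if not s:
        return 0
    border = [0] * len(s)  # border[i] = longest proper border of s[:i + 1]
    k = 0
    for i in range(1, len(s)):
        while k > 0 and s[i] != s[k]:
            k = border[k - 1]
        if s[i] == s[k]:
            k += 1
        border[i] = k
    return len(s) - border[-1]


def wc_period_length(x, wildcard=WILDCARD):
    """Length p of the wc-period, assuming it is the mask's smallest period."""
    return smallest_period(binary_mask(x, wildcard))


def wildcards_in_period_prefix(x, wildcard=WILDCARD):
    """Number g_p of wildcards in the prefix of x of length p."""
    p = wc_period_length(x, wildcard)
    return x[:p].count(wildcard)


def naive_match(t, x, wildcard=WILDCARD):
    """Baseline O(nm) scan: start positions in t where x matches a factor."""
    n, m = len(t), len(x)
    return [
        i
        for i in range(n - m + 1)
        if all(c == wildcard or c == t[i + j] for j, c in enumerate(x))
    ]
```

For example, the pattern `ab?cd?ef?` has mask `aabaabaab`, so under these assumptions its wc-period has length 3 and the prefix `ab?` contains one wildcard, giving a ratio of wildcards to period length of 1/3.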
