Abstract

We present an algorithm that searches a preprocessed text for a pattern containing a sequence of don't care symbols. This problem models proximity searching in text searching systems, as well as special searching problems in biological sequences. The main result is that the suffix array data structure of Manber and Myers can be used to reduce this problem to the two-dimensional orthogonal range query problem. Using known results for the latter problem, we obtain a data structure that uses O(n(k + m)) space, where n is the size of the text, k is an upper bound on the number of don't care symbols, and m is an upper bound on the number of pattern characters that appear before the don't care symbols. The number of occurrences of the pattern can be found in O(log n) time, and a list of all occurrences can be found in O(n^{1/4} + R) time, where R is the number of occurrences.
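The reduction can be illustrated with a minimal Python sketch, built on assumptions of our own. The pattern is written as p1, then gap don't care symbols, then p2, with both p1 and p2 non-empty. For every offset d = 1, ..., m + k, the sketch stores one point per suffix: x is the suffix's rank, and y is the rank of the suffix that starts d characters later. An occurrence at text position i is then exactly a point for d = len(p1) + gap whose x lies in the suffix-array interval of p1 and whose y lies in the interval of p2. This turns the search into an orthogonal rectangle query. The names DontCareIndex and MergeSortTree are hypothetical. The suffix array is built by naive sorting rather than with Manber and Myers's algorithm. A merge-sort tree stands in for the range-query structures the abstract cites, so this sketch uses O(n log n) space per offset and O(log^2 n) counting time, not the bounds stated above.

```python
from bisect import bisect_left


class MergeSortTree:
    """Hypothetical stand-in for a 2D range structure over points (x, ys[x])."""

    def __init__(self, ys):
        self.n = len(ys)
        self.tree = [[] for _ in range(2 * self.n)]
        for x, y in enumerate(ys):
            self.tree[self.n + x] = [(y, x)]
        # Each internal node keeps the sorted (y, x) pairs of its two children.
        for node in range(self.n - 1, 0, -1):
            self.tree[node] = sorted(self.tree[2 * node] + self.tree[2 * node + 1])

    def _cover(self, lo, hi):
        # Yield the O(log n) nodes whose leaves exactly cover x in [lo, hi).
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:
                yield lo
                lo += 1
            if hi & 1:
                hi -= 1
                yield hi
            lo >>= 1
            hi >>= 1

    def _bounds(self, node, ylo, yhi):
        # Every stored x is >= 0, so (y, -1) sorts before all pairs with that y.
        lst = self.tree[node]
        return bisect_left(lst, (ylo, -1)), bisect_left(lst, (yhi, -1))

    def count(self, lo, hi, ylo, yhi):
        """Number of points with lo <= x < hi and ylo <= y < yhi."""
        return sum(b - a for a, b in (self._bounds(v, ylo, yhi) for v in self._cover(lo, hi)))

    def report(self, lo, hi, ylo, yhi):
        """x-coordinates of the points counted by count()."""
        out = []
        for v in self._cover(lo, hi):
            a, b = self._bounds(v, ylo, yhi)
            out.extend(x for _, x in self.tree[v][a:b])
        return out


class DontCareIndex:
    """Hypothetical index for p1 + gap*'?' + p2, len(p1) <= max_prefix, gap <= max_gap."""

    def __init__(self, text, max_prefix, max_gap):
        self.text = text
        n = len(text)
        # Naive suffix sort; Manber and Myers build the same array in O(n log n).
        self.sa = sorted(range(n), key=lambda i: text[i:])
        rank = [0] * n
        for r, i in enumerate(self.sa):
            rank[i] = r
        # For offset d, the point at x pairs suffix sa[x] with suffix sa[x] + d
        # (y = -1 when that suffix would start past the end of the text).
        self.trees = {
            d: MergeSortTree([rank[i + d] if i + d < n else -1 for i in self.sa])
            for d in range(1, max_prefix + max_gap + 1)
        }

    def _interval(self, p):
        # Half-open range of suffix-array positions whose suffix starts with p.
        sa, t, m = self.sa, self.text, len(p)
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            if t[sa[mid]:sa[mid] + m] < p:
                lo = mid + 1
            else:
                hi = mid
        start, hi = lo, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            if t[sa[mid]:sa[mid] + m] <= p:
                lo = mid + 1
            else:
                hi = mid
        return start, lo

    def _query(self, p1, gap, p2):
        if not p1 or not p2:
            raise ValueError("p1 and p2 must be non-empty")
        l1, r1 = self._interval(p1)
        l2, r2 = self._interval(p2)
        # Rectangle [l1, r1) x [l2, r2) in the point set for d = len(p1) + gap.
        return self.trees[len(p1) + gap], l1, r1, l2, r2

    def count(self, p1, gap, p2):
        tree, *box = self._query(p1, gap, p2)
        return tree.count(*box)

    def find(self, p1, gap, p2):
        tree, *box = self._query(p1, gap, p2)
        return sorted(self.sa[x] for x in tree.report(*box))
```

For example, DontCareIndex("abracadabra", 2, 3).find("a", 1, "a") returns [3, 5], the two positions where an "a" is followed two characters later by another "a". Keeping one point set per offset is what makes the space grow linearly with k + m.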
