Abstract

We study the position restricted substring searching (PRSS) problem, where the task is to index a text T[0...n-1] of n characters over an alphabet Σ of size σ in order to answer the following query: given a pattern P (of length p) and two indices ℓ and r, report all occ_{ℓ,r} occurrences of P in T[ℓ...r]. Known indexes take O(n log n) bits or O(n log^{1+ε} n) bits of space and answer this query in O(p + log n + occ_{ℓ,r} log n) time or in optimal O(p + occ_{ℓ,r}) time, respectively, where ε is any positive constant. The main drawback of these indexes is their space requirement of Ω(n log n) bits, which can be much more than the optimal n log σ bits needed to store the text T. This paper addresses an open question asked by Mäkinen and Navarro [LATIN, 2006]: is it possible to design a succinct index that answers PRSS queries efficiently? We first study the hardness of this problem and prove the following result: a succinct (or a compact) index cannot answer PRSS queries efficiently in the pointer machine model, and also not in the RAM model unless bounds on the well-researched orthogonal range query problem improve. However, for the special case of sufficiently long query patterns, that is, for p = Ω(log^{2+ε} n), we derive an index of |CSA_f| + |CSA_r| + o(n) bits with optimal query time, where |CSA_f| and |CSA_r| are the space (in bits) of the compressed suffix arrays (with O(p) time for pattern search) of T and of the reverse of T, respectively. The space can be reduced further to |CSA_f| + o(n) bits, with a resulting query time of O(p + occ_{ℓ,r} + log^{3+ε} n). For the general case, where there is no restriction on pattern length, we obtain an index of O((1/ε^3) n log σ) bits with O(p + occ_{ℓ,r} + n^ε) query time. We use suffix sampling techniques to achieve these space-efficient indexes.
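To make the query definition concrete, the following is a minimal baseline sketch, not the index described in the abstract. It assumes that an occurrence counts only if it lies entirely within T[ℓ...r], builds an uncompressed suffix array naively, and filters the matching suffixes by position. The function names `build_suffix_array` and `prss_query` are hypothetical, chosen for illustration only.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text):
    """Naive suffix array: start positions sorted by the suffix they begin.

    Illustration only; this takes far more time and space than a
    compressed suffix array.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def prss_query(text, sa, pattern, left, right):
    """Report starts i of `pattern` with left <= i and i + p - 1 <= right.

    Assumption: an occurrence must lie entirely inside text[left..right].
    """
    p = len(pattern)
    if p == 0:
        return []
    # Length-p prefixes of the suffixes, in suffix-array order, are sorted,
    # so the suffixes starting with `pattern` form one contiguous range.
    prefixes = [text[i:i + p] for i in sa]
    lo = bisect_left(prefixes, pattern)
    hi = bisect_right(prefixes, pattern)
    # Filter every occurrence by position. This costs time proportional to
    # the total number of occurrences in the text, not to occ_{l,r}.
    return sorted(i for i in sa[lo:hi] if left <= i and i + p - 1 <= right)


text = "abracadabra"
sa = build_suffix_array(text)
print(prss_query(text, sa, "abra", 0, 10))  # [0, 7]
print(prss_query(text, sa, "abra", 1, 10))  # [7]
print(prss_query(text, sa, "a", 3, 7))      # [3, 5, 7]
```

The filtering step shows why the problem is nontrivial: a pattern may occur many times in T while occ_{ℓ,r} is small, so an efficient index has to avoid touching occurrences outside the queried range.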
