Let $P$ be a collection of $d$ patterns $\{P_1, P_2, \ldots, P_d\}$ of total length $n$ characters, drawn from an alphabet $\Sigma$ of size $\sigma$. Given a text $T$ over $\Sigma$, the dictionary indexing problem is to build a data structure that reports all positions $j$ (called occurrences) at which at least one pattern $P_i \in P$ matches the substring of $T$ of the same length starting at $j$. We consider this problem under the following definitions of matching.
• Parameterized Matching: The characters of $\Sigma$ are partitioned into static characters and parameterized characters. Two equal-length strings $S$ and $S'$ are a parameterized match iff their static characters match exactly and there exists a one-to-one function that renames the parameterized characters in $S$ to those in $S'$.
• Order-Preserving Matching: The alphabet $\Sigma$ is ordered. Two equal-length strings $S$ and $S'$ are an order-preserving match iff for any two integers $i, j \in [1, |S|]$, $S[i] \prec S[j] \Leftrightarrow S'[i] \prec S'[j]$, where $\prec$ denotes the precedence order in $\Sigma$.
Let $\epsilon > 0$ be an arbitrarily small constant. For parameterized matching, we first present a compact $O(n \log \sigma + d \log n)$-bit index that reports all $occ$ occurrences in $O(|T|(\log \sigma + \log_\sigma n) + occ)$ time, and then a succinct $n \log \sigma + o(n \log \sigma) + O(d \log n)$-bit index that reports all $occ$ occurrences in $O(|T|(\log \sigma + \log^\epsilon n \log_\sigma n) + occ)$ time. For order-preserving matching, we present indexes of the same sizes, but with slightly increased query time.
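To make the two matching definitions concrete, here is a minimal brute-force sketch that checks them directly and scans a text for dictionary occurrences. It is only an illustration of the problem statement, not the compact or succinct indexes described above: it runs in time proportional to $|T|$ times the total pattern length (quadratic per window for order-preserving matching). The function names (`p_match`, `op_match`, `dictionary_occurrences`), the representation of the parameterized characters as a set `params`, and the use of 0-based positions are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Sequence, Set


def p_match(s: str, t: str, params: Set[str]) -> bool:
    """Parameterized match (hypothetical helper): static characters must be
    equal, and parameterized characters of s must be renamed to those of t
    by a one-to-one function."""
    if len(s) != len(t):
        return False
    fwd: Dict[str, str] = {}  # renaming s-char -> t-char
    bwd: Dict[str, str] = {}  # inverse, to enforce one-to-one
    for a, b in zip(s, t):
        a_param, b_param = a in params, b in params
        if a_param != b_param:
            return False  # a static char can never align with a parameterized one
        if not a_param:
            if a != b:
                return False  # static characters must match exactly
            continue
        # Both parameterized: the renaming must be consistent in both directions.
        if fwd.setdefault(a, b) != b or bwd.setdefault(b, a) != a:
            return False
    return True


def op_match(s: Sequence, t: Sequence) -> bool:
    """Order-preserving match (hypothetical helper): for all i, j,
    s[i] < s[j] iff t[i] < t[j]. Checked naively in O(m^2) time."""
    if len(s) != len(t):
        return False
    m = len(s)
    return all((s[i] < s[j]) == (t[i] < t[j]) for i in range(m) for j in range(m))


def dictionary_occurrences(
    patterns: List[Sequence],
    text: Sequence,
    match: Callable[[Sequence, Sequence], bool],
) -> List[int]:
    """Report every (0-based) start position j where at least one pattern
    matches the same-length window of text starting at j."""
    occ: List[int] = []
    for j in range(len(text)):
        for p in patterns:
            if j + len(p) <= len(text) and match(p, text[j:j + len(p)]):
                occ.append(j)
                break  # report each position once, even if several patterns match
    return occ


if __name__ == "__main__":
    params = {"x", "y", "z"}  # assumed parameterized characters; others are static
    print(dictionary_occurrences(["xax", "bx"], "yaybzaz",
                                 lambda a, b: p_match(a, b, params)))
    print(dictionary_occurrences([[1, 3, 2]], [5, 1, 3, 2, 8, 4, 9], op_match))
```

With these inputs the first call prints `[0, 3, 4]` ("yay" and "zaz" parameterized-match "xax", and "bz" matches "bx"), and the second prints `[1, 3]`, since `[1, 3, 2]` and `[2, 8, 4]` both follow the smallest, largest, middle shape of the pattern. Note that the renaming in `p_match` is checked in both directions, so "xax" does not match "xay", and "xy" does not match "zz".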