We introduce efficient indexes for a problem in non-standard stringology: jumbled pattern matching. An index is a data structure constructed for a text of length $n$ over an alphabet of size $\sigma$ that can answer queries asking if the text contains a fragment which is jumbled (Abelian) equivalent to a pattern, specified by its so-called Parikh vector. We denote the length of the pattern by $m$. Moosa and Rahman (J Discrete Algorithms 10:5–9, 2012) gave an index for the case of binary alphabets with $\mathcal{O}\left(\frac{n^2}{(\log n)^2}\right)$-time construction in the word-RAM model. Several earlier papers stated as an open problem the existence of an efficient solution for larger alphabets. In this paper we develop an index for any constant-sized alphabet. The construction involves a trade-off parameter, which in particular lets us achieve the following complexities: $\mathcal{O}(n^{2-\delta})$ space and $\mathcal{O}(m^{(2\sigma-1)\delta})$ query time for any $0<\delta<1$, or $\mathcal{O}\left(\frac{n^2 (\log \log n)^2}{\log n}\right)$ space and polylogarithmic, $o(\log^{2\sigma-1} m)$, query time. The construction time in both cases is subquadratic: $\mathcal{O}\left(\frac{n^2 (\log \log n)^2}{\log n}\right)$ in the word-RAM model (using bit-parallelism). Our construction algorithms are randomized (Las Vegas, running time w.h.p.), which is due to the usage of perfect hashing. On the other hand, all queries are answered deterministically. A preliminary version of this work appeared at ESA 2013 (Kociumaka et al. in Algorithms, ESA 2013. LNCS, vol 8125. Springer, Berlin, pp. 625–636, 2013). Here we improve it in several ways. We achieve $\mathcal{O}(n^2)$-time construction of the index with $\mathcal{O}(n^{2-\delta})$ space and $\mathcal{O}(m^{(2\sigma-1)\delta})$ query time, which was not present in the preliminary version.
We also extend the index so that the position of the leftmost occurrence of the query pattern is provided at no additional cost in the complexity; this required rather nontrivial changes in the construction algorithm.
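
To make the query concrete, the following is a minimal sketch of the naive, index-free baseline: a sliding window that compares Parikh vectors and reports the leftmost jumbled occurrence. It is not the index developed in the paper. It scans the whole text on every query, which is the per-query cost an index is meant to avoid. The function names `parikh_vector` and `leftmost_jumbled_occurrence` are hypothetical, and the sketch assumes every character of the text belongs to the given alphabet.

```python
from collections import Counter


def parikh_vector(s, alphabet):
    """Return the Parikh vector of s: letter counts listed in alphabet order."""
    counts = Counter(s)
    return tuple(counts[c] for c in alphabet)


def leftmost_jumbled_occurrence(text, pattern_vector, alphabet):
    """Return the start of the leftmost fragment of text whose Parikh vector
    equals pattern_vector, or None if no such fragment exists.

    Naive baseline: one pass over the text with a window of length m,
    where m is the sum of the entries of pattern_vector.
    Assumes every character of text occurs in alphabet.
    """
    m = sum(pattern_vector)
    if m == 0:
        return 0  # the empty pattern occurs at position 0
    if m > len(text):
        return None

    index_of = {c: i for i, c in enumerate(alphabet)}
    target = list(pattern_vector)
    window = [0] * len(alphabet)

    for i, ch in enumerate(text):
        window[index_of[ch]] += 1  # letter entering the window
        if i >= m:
            window[index_of[text[i - m]]] -= 1  # letter leaving the window
        if i >= m - 1 and window == target:
            return i - m + 1
    return None
```

For example, with the text `abacbb` over the alphabet `abc`, the pattern `bca` has Parikh vector `(1, 1, 1)`, and the call `leftmost_jumbled_occurrence("abacbb", (1, 1, 1), "abc")` finds the fragment `bac` starting at position 1.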