Abstract

In this paper we study lower bounds for the fundamental problem of text indexing with mismatches and differences. In this problem we are given a long string of length n, the "text", and the task is to preprocess it into a data structure such that, given a query string Q, one can quickly identify substrings of the text that are within Hamming or edit distance at most k from Q. This problem is at the core of various problems arising in biology and text processing.

While exact text indexing admits linear-size data structures with linear query time, text indexing with k mismatches (or k differences) appears to be much harder: all known data structures have an exponential dependency on k either in the space or in the time bound. We provide conditional and pointer-machine lower bounds that make a step toward explaining this phenomenon.

We start by demonstrating lower bounds for k = Θ(log n). We show that, assuming the Strong Exponential Time Hypothesis, any data structure for text indexing that can be constructed in polynomial time cannot have O(n^{1-δ}) query time, for any δ > 0. This bound also extends to the setting where we only ask for (1+ε)-approximate solutions for text indexing.

However, in many applications the value of k is rather small, and one might hope to develop more efficient solutions for small k. We show that this would require a radically new approach, as with the current methods one cannot avoid an exponential dependency on k either in the space or in the time bound for all even [MATH HERE]. Our lower bounds also apply to the dictionary look-up problem, where instead of a text one is given a set of strings.
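To make the query problem concrete, the following is a minimal, illustrative sketch (not from the paper) of the k-mismatch problem without any indexing: it scans every length-|Q| window of the text and reports those within Hamming distance at most k of Q, taking O(n·|Q|) time per query. The point of the data structures discussed above is to answer such queries much faster after preprocessing; the identifiers (text, query, k) are chosen here purely for illustration.

def k_mismatch_occurrences(text: str, query: str, k: int) -> list[int]:
    """Return all positions i such that text[i:i+len(query)] is within
    Hamming distance at most k of query (naive baseline, no preprocessing)."""
    n, m = len(text), len(query)
    result = []
    for i in range(n - m + 1):
        mismatches = 0
        for a, b in zip(text[i:i + m], query):
            if a != b:
                mismatches += 1
                if mismatches > k:
                    break  # window already exceeds the mismatch budget
        if mismatches <= k:
            result.append(i)
    return result


if __name__ == "__main__":
    # Example: with k = 1, "acgt" occurs at positions 0 and 4 of "acgtaagt".
    print(k_mismatch_occurrences("acgtaagt", "acgt", 1))  # [0, 4]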
