Abstract

This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text T of length n and a positive integer k , we want to construct an index of T such that for any input pattern P , we can find all its k -error matches in T efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies O ( ( n log k n ) / B ) disk pages and finds all k -error matches with O ( ( | P | + o c c ) / B + log k n log log B n ) I/Os, where B denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require Ω ( | P | + o c c + poly ( log n ) ) I/Os. The second index reduces the space to O ( ( n log n ) / B ) disk pages, and the I/O complexity is O ( ( | P | + o c c ) / B + log k ( k + 1 ) n log log n ) .

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call