Abstract

Let $T$ be a text of length $n$ and $P$ be a pattern of length $m$, both strings over a fixed finite alphabet $A$. The $k$-difference ($k$-mismatch, respectively) problem is to find all occurrences of $P$ in $T$ that have edit distance (Hamming distance, respectively) at most $k$ from $P$. In this paper we investigate a well-studied case in which $T$ is fixed and preprocessed into an indexing data structure so that any pattern query can be answered faster. We give a solution using an $O(n \log n)$-bit indexing data structure with $O(|A|^k m^k \cdot \max(k, \log n) + occ)$ query time, where $occ$ is the number of occurrences. The best previous result requires an $O(n \log n)$-bit indexing data structure and gives $O(|A|^k m^{k+2} + occ)$ query time. Our solution also allows us to exploit compressed suffix arrays to reduce the indexing space to $O(n)$ bits, while increasing the query time by only an $O(\log n)$ factor.
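To make the two problem definitions concrete, the sketch below gives brute-force, index-free reference implementations. It is not the indexed method described in the abstract; it rescans $T$ for every query in $O(nm)$ time, which is exactly the per-query cost an index over a fixed $T$ is meant to avoid. The function names are hypothetical. For the $k$-difference case it swaps in the classic Sellers dynamic program and, as an assumed convention, reports occurrences by their exclusive end position in $T$, since approximate occurrences under edit distance can have different lengths.

```python
def k_mismatch_occurrences(text: str, pattern: str, k: int) -> list[int]:
    """Start positions i where text[i:i+m] has Hamming distance <= k from pattern."""
    m = len(pattern)
    result = []
    for i in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[i:i + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > k:  # stop early once this window is ruled out
                    break
        if mismatches <= k:
            result.append(i)
    return result


def k_difference_end_positions(text: str, pattern: str, k: int) -> list[int]:
    """Exclusive end positions j such that some substring text[s:j] has edit
    distance <= k from pattern (Sellers' dynamic programming, O(nm) time)."""
    m = len(pattern)
    # prev[i] = smallest edit distance between pattern[:i] and any substring
    # of text ending at the previous column; column 0 is the empty text prefix.
    prev = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        cur = [0] * (m + 1)  # cur[0] = 0: an occurrence may start anywhere in text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra character in text
                         cur[i - 1] + 1)      # pattern character missing from text
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends


if __name__ == "__main__":
    print(k_mismatch_occurrences("abcabd", "abd", 1))      # [0, 3]
    print(k_difference_end_positions("abcabd", "abd", 1))  # [2, 3, 5, 6]
```

With $k = 1$, the $k$-mismatch query finds the windows "abc" and "abd", which start at positions 0 and 3. The $k$-difference query also accepts the shorter substring "ab" (one deletion), so it reports more end positions.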
