Improved Approximate String Matching Using Compressed Suffix Data Structures

Tak-Wah Lam, Wing-Kin Sung, Swee-Seong Wong

doi:10.1007/11602613_35

Abstract

Approximate string matching is the problem of finding a given string pattern in a text while allowing some number of errors. In this paper we present a space-efficient data structure for the 1-mismatch and 1-difference problems. Given a text T of length n over a fixed alphabet A, we can preprocess T into an \(O(n\sqrt{\log n})\)-bit data structure so that, for any query pattern P of length m, we can find all 1-mismatch (or 1-difference) occurrences of P in \(O(m \log\log n + occ)\) time, where occ is the number of occurrences. This is the fastest known query time given that the space of the data structure is \(o(n \log^2 n)\) bits. The space of our data structure can be further reduced to \(O(n)\) bits if we can afford a slowdown factor of \(\log^{\varepsilon} n\), for \(0 < \varepsilon \le 1\). Furthermore, our solution can be generalized to solve the k-mismatch (and the k-difference) problem in \(O(|A|^k m^k (k + \log\log n) + occ)\) and \(O(\log^{\varepsilon} n\,(|A|^k m^k (k + \log\log n) + occ))\) query time using indexing data structures of \(O(n\sqrt{\log n})\) bits and \(O(n)\) bits, respectively.
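To make the two problem definitions concrete, the following is a minimal brute-force sketch in Python. It is not the indexed data structure described in the abstract and does not achieve its time or space bounds; it only illustrates what a 1-mismatch occurrence (at most one substituted character) and a 1-difference relation (at most one substitution, insertion, or deletion) mean. The function names are hypothetical and chosen for illustration.

```python
def one_mismatch_occurrences(text, pattern):
    """Start positions i where text[i:i+m] differs from pattern
    in at most one position (Hamming distance <= 1).
    Brute force, O(n*m) time; for illustration only."""
    m = len(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[i:i + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > 1:
                    break
        if mismatches <= 1:
            hits.append(i)
    return hits


def within_one_edit(a, b):
    """True if a can be turned into b with at most one substitution,
    insertion, or deletion (edit distance <= 1)."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # ensure a is the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1  # skip the common prefix
    if len(a) == len(b):
        # at most one substitution: the rest after position i must agree
        return a[i + 1:] == b[i + 1:]
    # lengths differ by one: b has one extra character at position i
    return a[i:] == b[i + 1:]
```

For example, `one_mismatch_occurrences("abracadabra", "acra")` reports positions 0 and 7, where the text substring "abra" differs from the pattern only in its second character. An index such as the one in the paper answers the same query without scanning the whole text.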

Algorithmica, vol. 51, 01 Nov 2007
