Improved Approximate String Matching Using Compressed Suffix Data Structures

Tak-Wah Lam, Wing-Kin Sung, Swee-Seong Wong

doi:10.1007/s00453-007-9104-8

Abstract

Approximate string matching is about finding a given string pattern in a text while allowing some degree of error. In this paper we present a space-efficient data structure to solve the 1-mismatch and 1-difference problems. Given a text T of length n over an alphabet A, we can preprocess T and give an $O(n\sqrt{\log n}\log |A|)$-bit space data structure so that, for any query pattern P of length m, we can find all 1-mismatch (or 1-difference) occurrences of P in $O(|A| m \log\log n + occ)$ time, where occ is the number of occurrences. This is the fastest known query time given that the space of the data structure is $o(n\log^2 n)$ bits. The space of our data structure can be further reduced to $O(n\log |A|)$ bits with the query time increasing by a factor of $\log^{\epsilon} n$, for $0<\epsilon\le 1$. Furthermore, our solution can be generalized to solve the k-mismatch (and the k-difference) problem in $O(|A|^k m^k (k+\log\log n)+occ)$ and $O(\log^{\epsilon} n\,(|A|^k m^k (k+\log\log n)+occ))$ time using an $O(n\sqrt{\log n}\log |A|)$-bit and an $O(n\log |A|)$-bit indexing data structure, respectively. We assume that the alphabet size |A| is bounded by $O(2^{\sqrt{\log n}})$ for the $O(n\sqrt{\log n}\log |A|)$-bit space data structure.
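
To make the k-mismatch problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the indexed data structure described in the abstract: it scans the text directly in O(nm) time and serves only as a reference for what a "k-mismatch occurrence" is (a start position where the pattern matches with at most k substituted characters). The function names `hamming_within` and `k_mismatch_occurrences` are illustrative assumptions, and the k-difference variant (which also allows insertions and deletions) is not covered.

```python
def hamming_within(a: str, b: str, k: int) -> bool:
    """Return True if equal-length strings a and b differ in at most k positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                # Stop early once the error budget is exceeded.
                return False
    return True


def k_mismatch_occurrences(text: str, pattern: str, k: int = 1) -> list[int]:
    """Brute-force baseline (hypothetical helper, not the paper's index).

    Returns every 0-based start position i such that text[i:i+m] and
    pattern differ in at most k positions, where m = len(pattern).
    """
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if hamming_within(text[i:i + m], pattern, k)
    ]
```

For example, `k_mismatch_occurrences("abracadabra", "acra", 1)` reports the two positions where "abra" occurs, since "abra" differs from "acra" in exactly one character. A baseline like this can be useful for checking the output of an index-based implementation on small inputs.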


Journal: Algorithmica
Publication Date: Nov 1, 2007
Citations: 42

Similar Papers

Improved Approximate String Matching Using Compressed Suffix Data Structures
Tak-Wah Lam ... Swee-Seong Wong
01 Jan 2004

Ranked Document Retrieval in External Memory
Rahul Shah ... Cheng Sheng
ACM Transactions on Algorithms | VOL. 19
31 Jan 2023

Approximate Chinese String Matching Techniques Based on Pinyin Input Method
Bing Liu ... Dan Han
Applied Mechanics and Materials | VOL. 513-517
06 Feb 2014

I/O-efficient data structures for non-overlapping indexing
Sahar Hooshmand ... Sharma V Thankachan
Theoretical Computer Science | VOL. 857
10 Dec 2020
