Abstract
In this paper, we discuss an efficient and effective indexing mechanism for string matching with k mismatches: given a target string s and a pattern string r, we find all substrings of s that differ from r in at most k positions. The main idea is to use the Burrows–Wheeler transform of s, denoted BWT(s), as an index against which r is searched. During the search, precomputed mismatch information about r is used to speed up the navigation of BWT(s). In this way, the time complexity is reduced to $O(kn' + n + m\log m)$, where $m = |r|$, $n = |s|$, and $n'$ is the number of leaf nodes of a tree structure, called a mismatching tree, produced during a search of BWT(s). When $m \geq 2(k+1)$, the average value of $n'$ is bounded by $O\big((1 + 1/|\Sigma|)^{k+1}\big)$, where $\Sigma$ is the alphabet from which the symbols of the target and pattern strings are drawn. Extensive experiments show that our method is promising for this problem.
Categories and Subject Descriptors: F.2.2 [Analysis of Algorithms and Problem Complexity]: Non-numerical Algorithms and Problems: Pattern matching; computation on discrete structures
General Terms: Databases, Algorithms, Performance
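To make the problem statement and the role of BWT(s) as an index concrete, here is a minimal sketch, assuming a small in-memory string and a naive suffix sort. It builds an FM-index (suffix array, C table, occurrence counts) from BWT(s) and then runs a plain backtracking backward search that allows at most k substitutions, checked against a brute-force definition of k-mismatch matching. The function names `bwt_index`, `k_mismatch_search` and `naive_k_mismatch` are hypothetical. This is not the authors' algorithm: it does not use precomputed mismatch information about r, it branches on every symbol at every pattern position, and it does not achieve the complexity bound stated above.

```python
def naive_k_mismatch(s, r, k):
    """Brute-force reference: start positions i where s[i:i+m] has <= k mismatches with r."""
    m = len(r)
    return [i for i in range(len(s) - m + 1)
            if sum(a != b for a, b in zip(s[i:i + m], r)) <= k]


def bwt_index(s):
    """Naive FM-index of s: suffix array, C table and occurrence counts over BWT(s)."""
    t = s + "$"  # assumption: '$' does not occur in s and sorts before every symbol of s
    sa = sorted(range(len(t)), key=lambda i: t[i:])  # naive suffix sort, fine for a sketch
    bwt = "".join(t[i - 1] for i in sa)  # t[-1] is '$' when the suffix starts at 0
    alphabet = sorted(set(t))
    # C[c] = number of symbols in t that are strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += t.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ, alphabet


def k_mismatch_search(s, r, k):
    """Backward search of r over BWT(s), branching on substitutions (at most k)."""
    sa, C, occ, alphabet = bwt_index(s)
    symbols = [c for c in alphabet if c != "$"]
    hits = []

    def extend(i, lo, hi, mism):
        # [lo, hi) is the suffix-array interval of suffixes that start with the
        # string matched so far against r[i+1:], using `mism` substitutions.
        if lo >= hi:
            return
        if i < 0:  # the whole pattern has been aligned
            hits.extend(sa[lo:hi])
            return
        for c in symbols:
            cost = mism + (c != r[i])
            if cost <= k:
                # LF-mapping step: prepend c to the matched string
                extend(i - 1, C[c] + occ[c][lo], C[c] + occ[c][hi], cost)

    extend(len(r) - 1, 0, len(sa), 0)
    return sorted(hits)


print(k_mismatch_search("acagaca", "aca", 1))
```

Each distinct substring of s is reached along exactly one branch of the recursion, so every reported start position is distinct; for s = "acagaca", r = "aca" and k = 1 the sketch reports positions 0, 2 and 4 ("aca", "aga", "aca").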