Abstract
In the traditional version of the problem of approximate pattern matching, a pattern symbol is considered to match a text symbol if the two symbols are equal. Such a notion of exact equality is not suitable for situations where the text and pattern symbols are imprecise, e.g., obtained from an analog source, distorted by additive noise, etc. In such situations it is more appropriate to consider two alphabet symbols to match even if they are not equal, as long as they do not differ by more than a given threshold θ. The goal is then to compute the number of matches of the length-M pattern with all length-M substrings of the length-N text, i.e., to compute a vector of N−M+1 scores, where the ith score is the number of matches between the pattern and the substring that begins at text position i. The main result of this paper is to show that this threshold version of the problem can be solved by recursively solving 3+2logθ instances of the traditional (i.e., zero-threshold) version of the problem, which is much-studied in the literature and for which there are many efficient (typically randomized) solutions of time complexity close to O(NlogM). This paper's result therefore implies the first randomized O(NlogM(logθ+1)) solution for the threshold version of the problem. It also implies that any future improvement to the traditional (zero-threshold) version of the problem automatically translates into a similar improvement to the arbitrary-threshold case. Furthermore, we show that the factor Ω(logθ) is tight if use our recursive framework.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.