Abstract

The study and comparison of strings of symbols from a finite or an infinite alphabet is relevant to various areas of science, notably molecular biology, speech recognition, and computer science. In particular, the problem of finding the minimum “distance” between two strings (in general, two blocks of data) is of a practical importance. In this article we investigate the (string) pattern matching problem in a probabilistic framework, namely, it is assumed that both strings form an independent sequences of i.i.d. symbols. Given a text string a of length n and a pattern string b of length m, let Mm,n be the maximum number of matches between b and all m-substrings of a. Our main probabilistic result shows that for a wide range of input parameters in probability (pr.) provided m, n ∞ such that log n = o(m), where P is the probability of a match between any two symbols of these strings, and T is the probability of a match between two positions in the text string and a given position of the pattern string. We also prove that Mm,n/mP almost surely (a.s.) for log n = o(m). © 1993 John Wiley & Sons. Inc.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.