Abstract

Motivated by the comparison of DNA sequences, a generalization is given of the result of Erdős and Rényi on the length $R_n$ of the longest run of heads in the first $n$ tosses of a coin. Consider two sequences, $X_1 X_2 \dots X_n$ and $Y_1 Y_2 \dots Y_n$. The length of the longest matching consecutive subsequence, allowing shifts, is $M_n \equiv \max\{m : X_{i+k} = Y_{j+k} \text{ for } k = 1 \text{ to } m, \text{ for some } 0 \le i, j \le n - m\}$. Suppose that all the “letters” are independent and identically distributed. The length of the longest match without shifts has the same distribution as $R_n$, the length of the longest head run for a biased coin with $p = P(X_i = Y_i)$, described by the Erdős-Rényi law: $P\left(\lim_{n \to \infty} R_n / \log_{1/p}(n) = 1\right) = 1$. For matching with shifts, our result is: $P\left(\lim_{n \to \infty} M_n / \log_{1/p}(n) = 2\right) = 1$. Loosely speaking, allowing shifts doubles the length of the longest match. The case of Markov chains is also handled.
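
The two quantities compared in the abstract can be computed directly for simulated sequences. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and the uniform four-letter alphabet (so that p = 1/4) is an assumption made only for illustration. The match with shifts is computed with a standard longest-common-substring dynamic program, and both lengths are divided by the logarithm of n to base 1/p so they can be compared with the limits 1 and 2.

```python
import math
import random


def longest_match_no_shift(x, y):
    """Longest run of positions i with x[i] == y[i] (the analogue of R_n)."""
    best = run = 0
    for a, b in zip(x, y):
        run = run + 1 if a == b else 0
        best = max(best, run)
    return best


def longest_match_with_shifts(x, y):
    """Longest common contiguous block of x and y (M_n), by dynamic programming."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: match length ending at the previous x letter and y[j-1]
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def simulate(n, alphabet="ACGT", seed=0):
    """Return (R_n / log_{1/p} n, M_n / log_{1/p} n) for one simulated pair.

    Assumption: letters are i.i.d. uniform on `alphabet`, so p = 1/len(alphabet).
    """
    rng = random.Random(seed)
    x = [rng.choice(alphabet) for _ in range(n)]
    y = [rng.choice(alphabet) for _ in range(n)]
    p = 1.0 / len(alphabet)
    scale = math.log(n) / math.log(1.0 / p)  # logarithm of n to base 1/p
    return (longest_match_no_shift(x, y) / scale,
            longest_match_with_shifts(x, y) / scale)


if __name__ == "__main__":
    for n in (250, 500, 1000, 2000):
        ratio_no_shift, ratio_shift = simulate(n)
        print(n, round(ratio_no_shift, 2), round(ratio_shift, 2))
```

The dynamic program takes on the order of n² steps, so this sketch is only practical for moderate n. The stated limits are almost-sure limits as n grows, so the ratios from a single finite simulation should be expected to fluctuate around 1 and 2 rather than equal them.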
