Abstract

Developing algorithms for finding quasiperiodic regions in sequences is at the core of many problems in biological sequence analysis. We solve an important problem in this area. Let $A$ be an alphabet of size $n$ and let $A^l$ denote the set of sequences of length $l$ over $A$. Given a sequence $S = s_1 s_2 \ldots s_l \in A^l$, a positive integer $p$ is called a period of $S$ if $s_i = s_{i+p}$ for $1 \le i \le l - p$. $S$ is called $p$-periodic if its minimum period is $p$. Let $\Omega_l(p)$ denote the set of $p$-periodic sequences in $A^l$. A natural measure of nearness to $p$-periodicity for $S$ is the average Hamming distance to the nearest $p$-periodic sequence: $D(S) = \min_{T \in \Omega_l(p)} D(S, T)$. If $T \in \Omega_l(p)$ is a sequence such that $D(S, T) = D(S)$, then $T$ is called a nearest $p$-periodic sequence of $S$, and $S$ is called $p$-quasiperiodic with score $D(S)$. This paper develops an efficient algorithm for finding a nearest $p$-periodic sequence of $S$ by means of its modulo-$p$ incidence matrix. Let $\lambda = (\lambda_1, \ldots, \lambda_n)$ be a partition of $l$, so that $l = \lambda_1 + \lambda_2 + \cdots + \lambda_n$, and let $\sigma = (q+1, \ldots, q+1, q, \ldots, q)$, with $r$ entries equal to $q+1$ and $p - r$ entries equal to $q$, where $q$ is the quotient and $r$ is the remainder when $l$ is divided by $p$. This paper shows that there exists a sequence in $A^l$ whose modulo-$p$ incidence matrix has row sum vector $\lambda$ and column sum vector $\sigma$.
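To make the objects in the abstract concrete, the following Python sketch builds a modulo-p incidence matrix and uses it to find a closest sequence that has p as a period. It is an illustration under stated assumptions, not the paper's algorithm. The function names (`incidence_matrix`, `nearest_with_period`, `average_hamming`) are hypothetical, positions are 0-based, and the matrix is taken to have one row per letter and one column per residue class of positions modulo p. The sketch picks a most frequent letter in each column, which minimizes the Hamming distance among sequences having p as a period, but it does not enforce that p is the minimum period, as the definition of p-periodic requires.

```python
from collections import Counter


def incidence_matrix(seq, p, alphabet):
    """Entry [a][j] counts how often alphabet[a] occurs at positions
    congruent to j modulo p (0-based positions; an assumed convention)."""
    counts = [Counter(seq[j::p]) for j in range(p)]
    return [[counts[j][letter] for j in range(p)] for letter in alphabet]


def nearest_with_period(seq, p, alphabet):
    """Return a sequence with period p at minimum Hamming distance from seq.

    Assumption: the minimum-period condition is ignored, so the result
    may have a proper divisor of p as its minimum period.
    """
    m = incidence_matrix(seq, p, alphabet)
    # In each column, choose a letter with the largest count
    # (ties go to the letter that comes first in the alphabet).
    pattern = [max(range(len(alphabet)), key=lambda a: m[a][j]) for j in range(p)]
    return "".join(alphabet[pattern[i % p]] for i in range(len(seq)))


def average_hamming(s, t):
    """Fraction of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(s, t)) / len(s)


# Small example: length 14 and p = 4, so q = 3 and r = 2.
S = "ACGTACGAACGTAC"
M = incidence_matrix(S, 4, "ACGT")
T = nearest_with_period(S, 4, "ACGT")
score = average_hamming(S, T)
```

In this example the row sums of `M` are the letter counts of `S`, and the column sums are q+1 for the first r columns and q for the remaining p - r columns, matching the shape of the column sum vector described in the abstract.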
