Abstract

Tandem repeats play many important roles in biological research. However, accurate characterization of their properties is limited by the inability to easily detect them. For this reason, much work has been devoted to developing detection algorithms. A widely used algorithm for detecting tandem repeats is the "tandem repeats finder" (Benson, G., Nucleic Acids Res. 27, 573-580, 1999). In that algorithm, tandem repeats are modeled by percent matches and frequency of indels between adjacent pattern copies, and statistical criteria are used to recognize them. We give a method for computing the exact joint distribution of a pair of statistics that are used in the testing procedures of the "tandem repeats finder": the total number of matches in matching tuples of length k or longer, and the total number of observations from the beginning of the first such matching tuple to the end of the last one. This allows the computation of the conditional distribution of the latter statistic given the former, a conditional distribution that is used to test for tandem repeats as opposed to non-tandem direct repeats. The setting is a Markovian sequence of a general order. Current approaches to this distributional problem deal only with independent trials and are based on approximations via simulation.
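
To make the two statistics concrete, the following minimal sketch computes them for a single observed sequence of match indicators (1 for a match, 0 for a mismatch). It is only an illustration and not the exact-distribution method of the paper. It assumes that a "matching tuple of length k or longer" can be read as a maximal run of at least k consecutive matches; the function name `tuple_match_stats` and its parameters are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def tuple_match_stats(matches: Sequence[int], k: int) -> Tuple[int, int]:
    """Return (total_matches, span) for a 0/1 match-indicator sequence.

    Assumption (not taken from the paper): a "matching tuple of length k
    or longer" is a maximal run of at least k consecutive 1s.

    total_matches: number of 1s inside qualifying runs.
    span: number of observations from the start of the first qualifying
          run to the end of the last one (0 if no run qualifies).
    """
    if k < 1:
        raise ValueError("k must be a positive integer")

    total_matches = 0
    first_start: Optional[int] = None
    last_end: Optional[int] = None
    run_start: Optional[int] = None

    # A trailing 0 acts as a sentinel so a run ending at the last
    # position is closed inside the loop.
    for i, m in enumerate(list(matches) + [0]):
        if m:
            if run_start is None:
                run_start = i  # a new run of matches begins here
        elif run_start is not None:
            run_len = i - run_start
            if run_len >= k:
                total_matches += run_len
                if first_start is None:
                    first_start = run_start
                last_end = i - 1
            run_start = None

    if first_start is None or last_end is None:
        return 0, 0
    return total_matches, last_end - first_start + 1


if __name__ == "__main__":
    # Runs of matches have lengths 2, 3, 4 and 1.
    observed = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1]
    print(tuple_match_stats(observed, k=3))
```

With k = 3, only the runs of lengths 3 and 4 qualify in this example, so the first statistic counts 7 matches and the second counts the 9 observations from the start of the first qualifying run to the end of the last. Repeating such a computation over many simulated sequences would give the kind of simulation-based approximation that the abstract contrasts with an exact computation.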
