Abstract

We consider a class of random variables that includes scoring functions arising in computational molecular biology, such as sequence alignment and folding. We characterize the class by a set of properties, and show that, under certain conditions, such random variables follow an Erdős-Rényi law of large numbers. That is, $$\lim_{n \to \infty} \frac{T_n}{\log n} = sd \quad \text{a.s.},$$ where $T_n$ is the maximum score over contiguous regions from each of $s$ independent sequences, and $d$ is a function of the large deviation rate of the scoring function. This result unifies several others, and applies to more general scoring systems on any number of sequences. We show how the theorem can be applied to a recently introduced scoring function. Finally, we conjecture that a modified form of this function behaves similarly, and support the conjecture with simulations.
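The scaling $T_n \approx sd \log n$ can be checked numerically in the simplest special case. The sketch below assumes one particular scoring function: +1 per matching position and an effectively infinite penalty for mismatches, with $s = 2$ sequences. It is not the scoring function studied in the paper. Under this scoring, $T_n$ is the length of the longest common substring. A heuristic expected-count argument gives $d = 1/\log(1/p)$ with $p = \sum_i p_i^s$, where $p_i$ are the letter probabilities. The function names (`longest_common_substring`, `predicted_limit`, `simulate`) are hypothetical and chosen for illustration only.

```python
import math
import random


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous block shared by a and b.

    With score +1 per match and -infinity per mismatch, this is the maximum
    score over contiguous, equal-length regions of a and b (the s = 2 case).
    """
    best = 0
    prev = [0] * (len(b) + 1)  # prev[j]: common suffix length of a[:i-1], b[:j]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def predicted_limit(probs, s=2):
    """Assumed limit s*d for exact matching: d = 1/log(1/p), p = sum_i p_i**s."""
    p = sum(q ** s for q in probs)
    return s / math.log(1.0 / p)


def simulate(n, probs, alphabet="ACGT", seed=0):
    """Return T_n / log n for two i.i.d. random sequences of length n."""
    rng = random.Random(seed)
    a = "".join(rng.choices(alphabet, weights=probs, k=n))
    b = "".join(rng.choices(alphabet, weights=probs, k=n))
    return longest_common_substring(a, b) / math.log(n)


if __name__ == "__main__":
    probs = [0.25] * 4  # uniform DNA-like alphabet (an assumption)
    print("predicted s*d:", round(predicted_limit(probs), 4))
    for n in (100, 400, 1000):
        print(n, round(simulate(n, probs), 4))
```

For a uniform four-letter alphabet, $p = 1/4$, and the predicted constant is $2/\log 4 = 1/\log 2 \approx 1.44$. The convergence is only logarithmic, so finite-$n$ ratios from a single run may differ noticeably from this value.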
