Abstract

We calculate the probability distribution for the number of occurrences n of a given l-letter word x inside a random string of k letters, whose letters have been generated by a known stationary stochastic process. Denoting by p(x) the probability of occurrence of the word, it is well known that the distribution of occurrences in the asymptotic regime k → ∞ with kp(x) ≫ 1 is Gaussian, while in the limit k → ∞ and p(x) → 0, such that kp(x) remains finite, the distribution is compound Poisson. It is also known that these limiting forms do not work well in the intermediate regime, when kp(x) ≳ 1 and k is finite. We show that the problem of calculating the probability of occurrences is equivalent to determining the configurational partition function of a one-dimensional lattice gas of interacting particles, with the probability distribution given by the n-particle terms of the grand partition function and the number of particles corresponding to the number of occurrences on the string. Using this equivalence, we obtain the probability distribution from the equation of state of the lattice gas. Our result reproduces rather well the behavior of the distribution in the asymptotic as well as the intermediate regimes. Within the lattice gas description, the asymptotic forms of the distribution emerge naturally as certain low-density approximations. Thus our approach, which is based on statistical mechanics, also provides an alternative to the usual statistics-based treatments employing the central limit and Chen–Stein theorems.
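For orientation, the quantity discussed above can be computed exactly for small k by brute-force dynamic programming. The sketch below is not the lattice gas method of this work; it is a baseline under a simplifying assumption, namely that the letters are independent and identically distributed (a special case of a stationary process). Overlapping occurrences are counted, and the function names `kmp_automaton` and `occurrence_distribution` are hypothetical.

```python
from collections import defaultdict


def kmp_automaton(word, alphabet):
    """Build delta[state][letter] -> next state.

    A state is the length of the longest prefix of `word` that is a
    suffix of the text read so far; state len(word) means a match.
    """
    l = len(word)
    delta = [dict() for _ in range(l + 1)]
    for state in range(l + 1):
        for a in alphabet:
            s = word[:state] + a
            t = min(l, len(s))
            # Shrink t until the last t letters of s equal a prefix of word.
            while t > 0 and s[-t:] != word[:t]:
                t -= 1
            delta[state][a] = t
    return delta


def occurrence_distribution(word, k, probs):
    """Return {n: P(word occurs n times in k i.i.d. letters)}.

    `probs` maps each letter to its probability (assumed i.i.d. letters).
    Overlapping occurrences are counted separately.
    """
    l = len(word)
    delta = kmp_automaton(word, list(probs))
    # dist[(state, n)] = probability of automaton state with n matches so far
    dist = {(0, 0): 1.0}
    for _ in range(k):
        new = defaultdict(float)
        for (state, n), pr in dist.items():
            for a, pa in probs.items():
                nxt = delta[state][a]
                new[(nxt, n + (nxt == l))] += pr * pa
        dist = new
    out = defaultdict(float)
    for (_, n), pr in dist.items():
        out[n] += pr
    return dict(out)
```

As a check, for the word "aa" in k = 3 letters drawn uniformly from {a, b}, the eight strings give probabilities 5/8, 2/8 and 1/8 for n = 0, 1 and 2, and the mean equals (k − l + 1) p(x) = 0.5. A table produced this way can serve as a reference against which Gaussian or compound Poisson approximations are compared in the intermediate regime.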

