Abstract

We study the size distributions of coding and non-coding regions in DNA sequences. For most organisms, the size distribution $P_c(S)$ of coding regions of size $S$ is short-ranged, whereas the size distribution of non-coding regions follows a power-law decay, $P_{nc}(S) \sim S^{-1-\mu}$, with exponents $\mu$ that indicate clear long-range behavior. Using the Generalized Central Limit Theorem, we argue that the long-range distributions observed in the non-coding regions are related to the lower-level clustering of purines and pyrimidines (1D islands), which follow similar long-range laws. We also address the clustering of coding segments in the two complementary strands of DNA. We observe short-range clustering of coding regions in both strands, expressed by an exponential decay of the cluster size distribution. The decay exponent quantifies the degree of short-range correlation and the deviation from random clustering.
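As an illustration of how an exponent $\mu$ in $P_{nc}(S) \sim S^{-1-\mu}$ could be estimated from a list of non-coding segment sizes, the following is a minimal sketch. It is not the authors' procedure. It assumes a continuous power-law tail above a cutoff `s_min` and uses the standard maximum-likelihood (Hill) estimator. The function names, the cutoff, and the synthetic data are all hypothetical, and real segment sizes are integers, so the continuous form is only an approximation.

```python
import math
import random


def estimate_mu(sizes, s_min):
    """Maximum-likelihood estimate of mu for a tail P(S) ~ S^(-1-mu), S >= s_min.

    Assumes a continuous Pareto tail; s_min is a user-chosen cutoff
    (hypothetical here, not taken from the paper).
    """
    tail = [s for s in sizes if s >= s_min]
    if not tail:
        raise ValueError("no sizes at or above s_min")
    log_sum = sum(math.log(s / s_min) for s in tail)
    if log_sum == 0.0:
        raise ValueError("all tail sizes equal s_min; mu is undefined")
    return len(tail) / log_sum


def sample_power_law(n, mu, s_min, seed=0):
    """Draw n synthetic sizes with density proportional to S^(-1-mu) for S >= s_min.

    Uses inverse-transform sampling: S = s_min * U^(-1/mu), with U in (0, 1].
    """
    rng = random.Random(seed)
    return [s_min * (1.0 - rng.random()) ** (-1.0 / mu) for _ in range(n)]


if __name__ == "__main__":
    # Synthetic check only: these are not genomic data.
    sizes = sample_power_law(n=200_000, mu=1.5, s_min=10.0)
    print(f"estimated mu: {estimate_mu(sizes, s_min=10.0):.3f}")
```

On real annotations the estimate depends on the choice of `s_min`, so it would be sensible to check how stable the fitted exponent is across several cutoffs before interpreting it.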
