Abstract

This paper investigates the statistical properties of CG clusters in coding and noncoding DNA sequences by calculating two quantities: the cluster-size distribution P(S), and the breadth σ_m of the distribution of the root-mean-square size of CG clusters in consecutive, non-overlapping blocks of m bases. Coding and noncoding sequences differ in several respects. For both types of sequence, P(S) decays exponentially, P(S) ∝ e^(−αS). For coding sequences the value of α depends on the percentage of C–G content and is well described by a linear fit, whereas noncoding sequences show no such regular dependence. The quantity ξ(m), defined from σ_m and m, follows a power-law decay ξ(m) ∝ m^(−γ) in both coding and noncoding sequences, with γ = 0.949 ± 0.014 for coding sequences and γ = 0.826 ± 0.011 for noncoding sequences. The value of γ can therefore be used to distinguish coding from noncoding sequences. We also examine the same power law for random sequences and find γ very close to 1.00. The value of γ for coding sequences is thus closer to the random-sequence value, from which we conclude that coding sequences behave more like random sequences than noncoding sequences do. These results provide some insight into the statistical structure of DNA sequences.
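The two measurements described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' procedure: a CG cluster is taken to be a maximal run of consecutive C or G bases, clusters are cut at block boundaries, σ_m is taken as the standard deviation across blocks of the per-block root-mean-square cluster size, and the `min_count` cutoff in the exponential fit is a hypothetical choice to suppress noise in the tail. The exact normalisation that turns σ_m into ξ(m) is not reproduced here; `power_law_exponent` simply fits whatever series it is given.

```python
import math
import random
import re


def cg_cluster_sizes(seq):
    """Lengths of maximal runs of C or G (assumed definition of a CG cluster)."""
    return [len(run) for run in re.findall(r"[CG]+", seq.upper())]


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def decay_rate(sizes, min_count=10):
    """Estimate alpha in P(S) ~ exp(-alpha * S) by a log-linear fit.

    Sizes seen fewer than min_count times are ignored (an assumption made
    here to keep the noisy tail from dominating the fit).
    """
    counts = {}
    for s in sizes:
        counts[s] = counts.get(s, 0) + 1
    xs = sorted(s for s, c in counts.items() if c >= min_count)
    if len(xs) < 2:
        raise ValueError("not enough distinct cluster sizes to fit")
    ys = [math.log(counts[s] / len(sizes)) for s in xs]
    slope, _ = fit_line(xs, ys)
    return -slope


def sigma_m(seq, m):
    """Std. dev., over non-overlapping blocks of m bases, of per-block RMS cluster size."""
    rms_values = []
    for start in range(0, len(seq) - m + 1, m):
        sizes = cg_cluster_sizes(seq[start:start + m])
        if sizes:  # blocks without any C or G contribute nothing
            rms_values.append(math.sqrt(sum(s * s for s in sizes) / len(sizes)))
    if not rms_values:
        raise ValueError("no block contains a CG cluster")
    mean = sum(rms_values) / len(rms_values)
    return math.sqrt(sum((r - mean) ** 2 for r in rms_values) / len(rms_values))


def power_law_exponent(ms, values):
    """Estimate gamma in value ~ m**(-gamma) by a log-log fit."""
    slope, _ = fit_line([math.log(m) for m in ms], [math.log(v) for v in values])
    return -slope


if __name__ == "__main__":
    # Synthetic uniform random sequence, used only to exercise the functions.
    random.seed(0)
    seq = "".join(random.choice("ACGT") for _ in range(200_000))
    print("alpha:", decay_rate(cg_cluster_sizes(seq)))
    block_sizes = [64, 128, 256, 512, 1024]
    print("sigma_m:", [sigma_m(seq, m) for m in block_sizes])
```

For a uniform random sequence, in which each base is C or G with probability 1/2, runs of length S occur with probability proportional to 2^(−S), so the fitted α should come out near ln 2. Real coding and noncoding sequences would be passed to the same functions in place of the synthetic string.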