Abstract
This paper investigates the statistical properties of CG clusters in coding and non-coding DNA sequences by calculating two quantities: the cluster-size distribution P(S) of CG clusters, and the breadth σ_m of the distribution of the root-mean-square size of CG clusters in consecutive, non-overlapping blocks of m bases. The two kinds of sequence differ in several respects. For both coding and non-coding sequences, P(S) follows an exponential decay, P(S) ∝ e^(−αS). For coding sequences, the value of α depends on the percentage of C–G content and is well fitted by a straight line; for non-coding sequences no such regular linear fit is found. The quantity ξ(m) = σ_m/m obeys a power-law decay, ξ(m) ∝ m^(−γ), in both coding and non-coding sequences, with γ = 0.949 ± 0.014 for coding sequences and γ = 0.826 ± 0.011 for non-coding sequences. Coding and non-coding sequences can therefore be distinguished on the basis of the value of γ. We also examine the power law ξ(m) ∝ m^(−γ) for random sequences and find that the value of γ is very close to 1.00. The value of γ for coding sequences is thus closer to that of random sequences, from which we conclude that coding sequences behave more like random sequences than non-coding sequences do. This investigation provides some insight into DNA sequences.
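As an illustration of the two fits described above, the following is a minimal sketch rather than the authors' procedure. It assumes that a CG cluster is a maximal run of consecutive C or G bases, which the abstract does not state, and all function names are hypothetical. The exponent α is estimated by a least-squares fit of ln P(S) against S, and γ by a least-squares fit of ln ξ(m) against ln m. The computation of σ_m itself is left out because its exact definition is not given here.

```python
import math
import re
from collections import Counter


def cg_cluster_sizes(seq):
    """Sizes of CG clusters, ASSUMED here to be maximal runs of C or G."""
    return [len(run) for run in re.findall(r"[CG]+", seq.upper())]


def cluster_size_distribution(sizes):
    """Normalized histogram P(S) as a dict {S: probability}, sorted by S."""
    counts = Counter(sizes)
    total = sum(counts.values())
    return {s: c / total for s, c in sorted(counts.items())}


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # requires at least two distinct x values
    return slope, mean_y - slope * mean_x


def fit_exponential_decay(p_of_s):
    """Estimate alpha in P(S) ∝ exp(-alpha * S) from a log-linear fit."""
    sizes = list(p_of_s)
    log_p = [math.log(p) for p in p_of_s.values()]
    slope, _ = least_squares_line(sizes, log_p)
    return -slope


def fit_power_law_decay(block_sizes, xi_values):
    """Estimate gamma in xi(m) ∝ m**(-gamma) from a log-log fit."""
    log_m = [math.log(m) for m in block_sizes]
    log_xi = [math.log(x) for x in xi_values]
    slope, _ = least_squares_line(log_m, log_xi)
    return -slope
```

For a sequence string `seq`, `fit_exponential_decay(cluster_size_distribution(cg_cluster_sizes(seq)))` gives an estimate of α under these assumptions. Real sequences are usually fitted over a restricted range of S, since the sparse tail of P(S) is noisy.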