Abstract

The size distributions of all known coding and noncoding DNA sequences are studied in all human chromosomes. In a unified approach, both introns and intergenic regions are treated as noncoding regions. The distributions of noncoding segments $P_{nc}(S)$ of size $S$ present long tails, $P_{nc}(S) \sim S^{-1-\mu_{nc}}$, with exponents $\mu_{nc}$ ranging between 0.71 (for chromosome 13) and 1.2 (for chromosome 19). In contrast, exponential, short-range decay terms dominate the distributions of coding (exon) segments $P_c(S)$ in all chromosomes. To address the emergence of these statistical features, minimal, stochastic, mean-field models are proposed, based on randomly aggregating DNA strings with duplication, influx and outflux of genomic segments. These minimal models produce both the short-range statistics of the coding DNA and the observed power-law and fractal statistics of the noncoding DNA. The minimal models also demonstrate that although the two systems (coding and noncoding) coexist, alternating on the same linear chain, they act independently: the coding as a closed, equilibrium system and the noncoding as an open, out-of-equilibrium one.
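As a rough illustration of what a tail of the form $P(S) \sim S^{-1-\mu}$ means in practice, the sketch below draws synthetic segment sizes from a continuous power law and recovers the exponent with the standard maximum-likelihood (Hill) estimator. This is not the fitting procedure of the paper, which the abstract does not specify. The data are synthetic, and the lower cutoff `s_min`, the sample size, and the function names are assumptions made only for this example.

```python
import math
import random


def sample_power_law_sizes(mu, s_min, n, rng):
    """Draw n sizes whose density decays as S^(-1-mu) for S >= s_min."""
    # Inverse-transform sampling: the survival function is (S / s_min)^(-mu).
    # 1 - rng.random() lies in (0, 1], so the base is never zero.
    return [s_min * (1.0 - rng.random()) ** (-1.0 / mu) for _ in range(n)]


def hill_estimate(sizes, s_min):
    """Maximum-likelihood (Hill) estimate of mu using sizes >= s_min."""
    tail = [s for s in sizes if s >= s_min]
    log_sum = sum(math.log(s / s_min) for s in tail)
    return len(tail) / log_sum


# Hypothetical parameters: mu = 0.71 is the value quoted for chromosome 13;
# s_min = 100 and n = 50,000 are arbitrary choices for the illustration.
rng = random.Random(0)
sizes = sample_power_law_sizes(mu=0.71, s_min=100.0, n=50_000, rng=rng)
mu_hat = hill_estimate(sizes, s_min=100.0)
print(f"estimated mu: {mu_hat:.3f}")
```

With real segment sizes the choice of `s_min` matters, because the power law describes only the tail; the estimate should be checked for stability across several cutoffs.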

