Abstract

Background: A better understanding of the size and abundance of open reading frames (ORFs) in whole genomes may shed light on the factors that control genome complexity. Here we examine the statistical distributions of open reading frames (i.e., the distribution of start and stop codons) in the fully sequenced genomes of 297 prokaryotes and 14 eukaryotes.

Methodology/Principal Findings: By fitting mixture models to data from whole genome sequences we show that the size-frequency distributions for ORFs are strikingly similar across prokaryotic and eukaryotic genomes. Moreover, we show that (i) a large fraction (60–80%) of ORF size-frequency distributions can be predicted a priori with a stochastic assembly model based on GC content, and that (ii) size-frequency distributions of the remaining “non-random” ORFs are well-fitted by log-normal or gamma distributions, and similar to the size distributions of annotated proteins.

Conclusions/Significance: Our findings suggest stochastic processes have played a primary role in the evolution of genome complexity, and that common processes govern the conservation and loss of functional genomics units in both prokaryotes and eukaryotes.
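To make the idea of a GC-based stochastic assembly model concrete, the sketch below shows one simple way such a prediction could be set up. It is not the authors' model, whose details are not given in this summary. It assumes that nucleotides are drawn independently, that G and C each occur with frequency GC/2 and A and T each with frequency (1 − GC)/2, and that a random ORF ends at the first in-frame stop codon (TAA, TAG or TGA). Under those assumptions the number of codons before a stop follows a geometric distribution. All function names are hypothetical.

```python
def stop_codon_probability(gc):
    """Probability that a random codon is TAA, TAG or TGA.

    Assumes independent nucleotides with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2 (an illustrative assumption).
    """
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must lie in [0, 1]")
    a = t = (1.0 - gc) / 2.0
    g = gc / 2.0
    # TAA + TAG + TGA
    return t * a * a + t * a * g + t * g * a


def random_orf_length_pmf(gc, max_codons):
    """P(k codons occur before the first stop), for k = 0..max_codons."""
    p = stop_codon_probability(gc)
    return [(1.0 - p) ** k * p for k in range(max_codons + 1)]


def mean_random_orf_length(gc):
    """Mean number of codons before the first stop (geometric mean length)."""
    p = stop_codon_probability(gc)
    if p == 0.0:
        raise ValueError("no stop codons are possible when gc == 1")
    return (1.0 - p) / p
```

At GC = 0.5 the stop probability in this sketch reduces to the familiar 3/64, and the expected random ORF length grows as GC content rises because all three stop codons are AT-rich. A predicted size-frequency curve of this kind could then be compared with the observed ORF counts to separate a "random" component from the remainder.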

Highlights

  • Understanding the origins of genome complexity remains a central challenge in evolutionary biology

  • Our findings suggest stochastic processes have played a primary role in the evolution of genome complexity, and that common processes govern the conservation and loss of functional genomics units in both prokaryotes and eukaryotes

  • Our results show that the vast majority of the heterogeneity in the size distributions of open reading frames (ORFs) can be predicted based on random assembly, and that much of the remaining, non-random variation shows a size distribution similar to that of proteins

Introduction

Understanding the origins of genome complexity remains a central challenge in evolutionary biology. Although larger genomes generally have larger genes and more introns, most of the increase in genome size has been attributed to an increase in what appears to be non-coding DNA [4,6,7,10,11,12,13]. This observation has led some to hypothesize about the possible adaptive significance of non-coding DNA [e.g., the skeletal-DNA hypothesis, 14; the buffering-DNA hypothesis, 11], and others to suggest a primary role for neutral processes owing to the generally smaller effective population sizes of more derived organisms [2]. Here we examine the statistical distributions of open reading frames (i.e., the distribution of start and stop codons) in the fully sequenced genomes of 297 prokaryotes and 14 eukaryotes.
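As an illustration of how an ORF size-frequency distribution can be tabulated from a sequence, the following minimal sketch scans the three forward reading frames of a DNA string. It assumes that an ORF runs from an ATG start codon to the next in-frame stop codon and that only the forward strand is scanned. The paper may define ORFs differently, and the function names are hypothetical.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def orf_lengths(seq):
    """Return ORF lengths in codons (start included, stop excluded).

    Scans the three forward frames only; an ORF is taken to run from
    an ATG to the next in-frame stop codon (an illustrative definition).
    """
    seq = seq.upper()
    lengths = []
    for frame in range(3):
        start = None  # index of the current open start codon, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == START_CODON:
                    start = i
            elif codon in STOP_CODONS:
                lengths.append((i - start) // 3)
                start = None
    return lengths


def size_frequency(seq):
    """Map each ORF length (in codons) to the number of ORFs of that length."""
    return Counter(orf_lengths(seq))
```

Applied to a whole genome (and, in a fuller version, to the reverse complement as well), the resulting counts give the empirical size-frequency distribution to which mixture models can be fitted.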
