Abstract

Background: Among microbial genomes, genetic information is frequently compressed, exploiting redundancies in the genetic code to store information in overlapping genes. We investigate the length, phase and orientation properties of overlap in 58 prokaryotic species, evaluating neutral and selective mechanisms of evolution.

Results: Using a variety of statistical null models, we find patterns of compressive coding that cannot be explained purely in terms of selective processes favoring genome minimization or translational coupling. The distribution of overlap lengths is fat-tailed: a significant proportion of overlaps exceed 100 base pairs in length. The phase of overlap – the pairing of codon positions in complementary reading frames – is strongly predicted by the translation orientation of each gene. We find that as overlapping genes become longer, they tend to alternate among alternative overlap phases. Some phases seem to reflect codon pairings that reduce the probability of non-synonymous substitution. We analyze the lineage-dependent features of overlapping genes by tracing a number of different continuous characters through the prokaryotic phylogeny using squared-change parsimony, and observe both clade-specific and species-specific patterns.

Conclusion: Overlapping reading frames preserve in their structure features relating to the mutational origination of new genes, but have undergone modification both for immediate benefits and for variational buffering and amplification. Genomes come under a variety of different mutational and selective pressures, and the structure of redundancies in overlapping genes can be used to detect these pressures. No single mechanism is able to account for all the variability observed among the set of prokaryotic overlapping genes, but a three-fold analysis of evolutionary events provides a more integrative framework.

Reviewers: This article was reviewed by Eugene Koonin, Marten Huynem, and Han Liang.

Highlights

  • 3.2 Phase Statistics: In order to test for variational benefits, above and beyond biases resulting from mutation, we investigate the statistical properties of the different phases in which a given overlap can be found.


Summary

Introduction

One of the remarkable discoveries arising from the study of microbial genomes is that a single sequence of nucleic acid bases can encode multiple different genes in overlapping reading frames [1,2,3,4,5]. This represents a compression of genetic information, much like the compression of sound. We analyze the length, polarity and phase properties of genetic overlap in prokaryotes, adopting a common framework of genome compression [8]. The framework considers three classes of evolutionary event. To clarify the exposition we call these: (1) mutational origins, (2) immediate benefits, and (3) variational benefits.
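To make the three properties concrete, the following is a minimal Python sketch of how the length, polarity and phase of a single overlap might be computed from gene coordinates. It is an illustration, not the authors' pipeline. The `Gene` class, the function names, the 1-based inclusive coordinate convention, and the labels "unidirectional", "convergent" and "divergent" are all assumptions introduced here. Phase is reported simply as the pair of codon positions (1, 2 or 3) that the two genes assign to the first overlapping base; the paper's own phase nomenclature may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    start: int   # leftmost genomic coordinate
    end: int     # rightmost genomic coordinate
    strand: str  # "+" or "-"


def overlap_length(a: Gene, b: Gene) -> int:
    """Number of bases shared by the two genes (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def orientation(a: Gene, b: Gene) -> str:
    """Relative polarity of two partially overlapping genes (assumed labels)."""
    left, right = sorted((a, b), key=lambda g: g.start)
    if left.strand == right.strand:
        return "unidirectional"
    # Left gene on "+" and right gene on "-": the 3' ends overlap.
    return "convergent" if left.strand == "+" else "divergent"


def codon_position(gene: Gene, pos: int) -> int:
    """Codon position (1, 2 or 3) of genomic coordinate `pos` within `gene`."""
    if not gene.start <= pos <= gene.end:
        raise ValueError("position lies outside the gene")
    # Count from the 5' end, which is `end` for a gene on the "-" strand.
    offset = pos - gene.start if gene.strand == "+" else gene.end - pos
    return offset % 3 + 1


def overlap_phase(a: Gene, b: Gene):
    """Codon positions of a and b at the first overlapping base, or None."""
    if overlap_length(a, b) == 0:
        return None
    first = max(a.start, b.start)
    return codon_position(a, first), codon_position(b, first)


# A same-strand 4-base overlap, as when a start codon overlaps a stop codon.
upstream = Gene(1, 300, "+")
downstream = Gene(297, 599, "+")
print(overlap_length(upstream, downstream),
      orientation(upstream, downstream),
      overlap_phase(upstream, downstream))
```

Because both reading frames have a period of three bases, the codon pairing at one overlapping position fixes the pairing at every other position in the same overlap, so reporting it at the first shared base is sufficient for this sketch.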
