Abstract

Background: Genomic GC content varies both within and, substantially, between microbial genomes. While some of this variation can be explained by evolutionary divergence and environmental factors, a notable portion is not understood. To investigate further, we explore a non-linear mathematical model (gcMOD) of single-nucleotide polymorphism (SNP) GC content (sbGC, the GC content of substituted bases) as a function of core genome GC content (cgGC). We estimate the model's parameters using Bayesian inference on empirical genetic data from the microbial core genomes of 35 bacterial species, each of which contains at least 10 representative strains. We utilize 716 bacterial genomes in total. We also explore some possible implications that result from the mathematical properties of gcMOD.

Results: We find that the median GC → AT substitution rates (β) are almost always considerably higher than the corresponding AT → GC substitution rates (α) across the 35 core genomes. The distribution of β is also noticeably more concentrated (i.e. narrower) than the corresponding distribution of α for almost all species, excepting the bacteria with the most GC-rich genomes. We also demonstrate that at the singularity point of gcMOD (where α = β), the model is reduced to a linear equation. By analyzing the linear model, we show that, due to the constraints on gcMOD, the mutation rates can have a profound influence on both cgGC and sbGC. Moreover, by examining the mathematical properties of gcMOD's inverse function, we find that change in cgGC, and hence in genomic GC content, can potentially occur quite rapidly.

Conclusions: Examining the distributions of the GC → AT and AT → GC substitution rates for 35 bacterial species, we demonstrate that the former (β) are remarkably similar for all species examined. In addition, GC → AT substitution rate distributions were considerably more concentrated for all species, with the mode consistently peaking at higher rates than for AT → GC substitution rates.
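
To give some intuition for how the two substitution rates can pull GC content in one direction, the following is a minimal sketch of the textbook two-state mutation model, in which the GC fraction G changes as dG/dt = α(1 − G) − βG. This is not gcMOD, whose functional form is not given in this abstract; the function names and the numeric rates used below are illustrative assumptions only.

```python
import math


def equilibrium_gc(alpha: float, beta: float) -> float:
    """Equilibrium GC fraction of a simple two-state mutation model.

    alpha: AT -> GC substitution rate (assumed constant)
    beta:  GC -> AT substitution rate (assumed constant)
    NOTE: this is a generic textbook model, not the paper's gcMOD.
    """
    if alpha < 0 or beta < 0 or alpha + beta == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return alpha / (alpha + beta)


def gc_after(gc0: float, alpha: float, beta: float, t: float) -> float:
    """GC fraction at time t, starting from gc0.

    Closed-form solution of dG/dt = alpha * (1 - G) - beta * G:
    G relaxes exponentially toward the equilibrium at rate (alpha + beta).
    """
    gc_eq = equilibrium_gc(alpha, beta)
    return gc_eq + (gc0 - gc_eq) * math.exp(-(alpha + beta) * t)


if __name__ == "__main__":
    # Hypothetical rates with beta > alpha, i.e. an AT-biased mutation pressure.
    alpha, beta = 1.0, 3.0
    print(equilibrium_gc(alpha, beta))
    for t in (0.0, 0.5, 1.0, 5.0):
        print(t, gc_after(0.6, alpha, beta, t))
```

In this simplified setting, β > α places the mutational equilibrium below 50% GC, and the speed of approach depends on the sum α + β, so larger rates mean faster change in GC content.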

Highlights

  • Genomic GC content varies both within and, substantially, between microbial genomes

  • Examining the distributions of the GC → AT and AT → GC substitution rates for 35 bacterial species, we demonstrate that the former (β) are remarkably similar for all species examined

  • Core genome single-nucleotide polymorphism (SNP) GC content is assumed to be a function of total core genome GC content

Introduction

Genomic GC content varies both within and, substantially, between microbial genomes. While some of this variation can be explained by evolutionary divergence and environmental factors, a notable portion is not understood. Microbial communities co-inhabiting similar environments tend to have similar %GC regardless of taxa [5]. Factors such as nitrogen abundance [6], AT-biased mutations due to loss of DNA repair genes [7], population density [8] and selective pressures [9,10,11] may explain some of the variance [2, 3, 12, 13], which spans from 13.5% GC in the intracellular symbiont Candidatus Zinderia insecticola to 75% GC in the soil bacterium Anaeromyxobacter dehalogenans [14].
