Abstract
Background
The purpose of the present study was to examine the GC content of substituted bases (sbGC) in the core genomes of 35 bacterial species. Each species, or core genome, comprised genomes from at least 10 strains. We also wanted to explore whether sbGC for each strain was associated with the GC content of the corresponding species' core genome (cgGC). We present a simple mathematical model that estimates sbGC from cgGC. The model assumes only that the estimated sbGC is a function of cgGC proportional to fixed AT → GC (α) and GC → AT (β) mutation rates. Non-linear regression was used to estimate the parameters α and β from the empirical data described above.
Results
We found that sbGC for each strain showed a non-linear association with the corresponding cgGC, with a bias towards higher GC content for most core genomes (66.3% of the strains), assuming as a null hypothesis that sbGC should be approximately equal to cgGC. The most GC-rich core genomes (i.e. those with GC content above approximately 60%), on the other hand, exhibited slightly less GC-biased sbGC than expected. The best-fitting regression model indicates that GC → AT mutation rates, β = 1.91 ± 0.13 (p < 0.001), are on average approximately 1.91/0.79 = 2.42 times as high as AT → GC mutation rates, α = −0.79 ± 0.25 (p < 0.001). Whether the observed GC bias in sbGC for all but the most GC-rich prokaryotic species is due to selection compensating for the GC → AT mutation bias, to selectively neutral processes, or to both is currently debated. The residual standard error was σ = 0.076, indicating that estimation errors of sbGC are approximately within ±15.2% GC (95% confidence interval) for the strains of all species in the study.
Conclusion
Not only did our mathematical model give reasonable estimates of sbGC, it also provides further support for previous observations that mutation rates in prokaryotes exhibit a universal GC → AT bias that appears to be remarkably consistent between taxa.
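The two derived figures quoted in the Results can be checked with a few lines of arithmetic. The sketch below is only an illustration, not part of gcMOD: the function names are hypothetical, the ratio uses the magnitude of α as the abstract does, and the ±15.2% bound is assumed to be the common two-standard-error approximation (2σ) to a 95% interval.

```python
# Values quoted in the abstract.
BETA = 1.91             # GC -> AT parameter estimate
ALPHA_MAGNITUDE = 0.79  # magnitude of the AT -> GC estimate (reported as -0.79)
SIGMA = 0.076           # residual standard error of the regression


def rate_ratio(beta: float, alpha_magnitude: float) -> float:
    """Ratio of the GC -> AT estimate to the magnitude of the AT -> GC estimate."""
    return beta / alpha_magnitude


def approx_95_half_width(sigma: float, z: float = 2.0) -> float:
    """Half-width of an approximate 95% interval, assuming a 2-sigma rule."""
    return z * sigma


ratio = rate_ratio(BETA, ALPHA_MAGNITUDE)
half_width = approx_95_half_width(SIGMA)

print(f"GC->AT / AT->GC ratio: {ratio:.2f}")
print(f"Approximate 95% bound: +/-{100 * half_width:.1f}% GC")
```

Using z = 1.96 instead of 2 would give a slightly narrower bound of about ±14.9% GC, so the quoted ±15.2% is consistent with a rounded two-sigma rule.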
Highlights
The purpose of the present study was to examine the GC content of substituted bases in the core genomes of 35 bacterial species
Additional files 3 and 4 contain more information regarding the species and data used in the present study. If the parameters α and β are re-estimated, gcMOD can also predict sbGC at the species level, with each sbGC and core genome GC content (cgGC) value based on all strains and all core genomes of that species.
The model takes only core genome GC content as an independent variable and returns estimates of sbGC. gcMOD sbGC predictions are based on parameter estimates of GC → AT and AT → GC substitutions (β and α, respectively) obtained using non-linear regression on empirical data.
Summary
The purpose of the present study was to examine the GC content of substituted bases (sbGC) in the core genomes of 35 bacterial species. GC content in bacterial genomes varies greatly, from, for instance, 13.5% in the intracellular symbiont Candidatus Zinderia insecticola [1] to more than 75% in the soil-dwelling Anaeromyxobacter dehalogenans [2]. This variance in base composition has been found to be driven by phylogeny [3], environment [4], selection [5] and selectively neutral processes [6, 7], as well as drift due to a general AT mutation bias [8,9,10]. Here, sbGC was explored in the core genomes of strains from diverse microbial species. The parameters α and β of gcMOD were subsequently estimated from these empirical data using non-linear regression.