Abstract

BackgroundDNA word frequencies, normalized for genomic AT content, are remarkably stable within prokaryotic genomes and are therefore said to reflect a “genomic signature.” The genomic signatures can be used to phylogenetically classify organisms from arbitrary sampled DNA. Genomic signatures can also be used to search for horizontally transferred DNA or DNA regions subjected to special selection forces. Thus, the stability of the genomic signature can be used as a measure of genomic homogeneity. The factors associated with the stability of the genomic signatures are not known, and this motivated us to investigate further. We analyzed the intra-genomic variance of genomic signatures based on AT content normalization (0th order Markov model) as well as genomic signatures normalized by smaller DNA words (1st and 2nd order Markov models) for 636 sequenced prokaryotic genomes. Regression models were fitted, with intra-genomic signature variance as the response variable, to a set of factors representing genomic properties such as genomic AT content, genome size, habitat, phylum, oxygen requirement, optimal growth temperature and oligonucleotide usage variance (OUV, a measure of oligonucleotide usage bias), measured as the variance between genomic tetranucleotide frequencies and Markov chain approximated tetranucleotide frequencies, as predictors.Principal FindingsRegression analysis revealed that OUV was the most important factor (p<0.001) determining intra-genomic homogeneity as measured using genomic signatures. This means that the less random the oligonucleotide usage is in the sense of higher OUV, the more homogeneous the genome is in terms of the genomic signature. The other factors influencing variance in the genomic signature (p<0.001) were genomic AT content, phylum and oxygen requirement.ConclusionsGenomic homogeneity in prokaryotes is intimately linked to genomic GC content, oligonucleotide usage bias (OUV) and aerobiosis, while oligonucleotide usage bias (OUV) is associated with genomic GC content, aerobiosis and habitat.

Highlights

  • Analyses of the DNA composition in prokaryotes and eukaryotes have revealed important differences

  • Oxygen requirement was associated with genomic base composition as measured by the oligonucleotide usage variance (OUV) measure (p,0.001) for both first order Markov chain (FOM) and second order Markov chain (SOM) models

  • The results from the regression model indicate that aerobic organisms have a more biased genome compared to the FOM and SOM based random walk models

Read more

Summary

Introduction

Analyses of the DNA composition in prokaryotes and eukaryotes have revealed important differences. Small scale genomic DNA (i.e. genetic sections covering 200 bp) has a Brownian motion, or random walk reminiscent composition, in other words, the long-range correlation effects described above for eukaryotes are absent in microbial genomes [3]. Markov chains can be extended to be made dependent on additional events, or time steps, allowing for short range correlation effects, i.e. short term memory, in the random walk process [4]. Regression models were fitted, with intra-genomic signature variance as the response variable, to a set of factors representing genomic properties such as genomic AT content, genome size, habitat, phylum, oxygen requirement, optimal growth temperature and oligonucleotide usage variance (OUV, a measure of oligonucleotide usage bias), measured as the variance between genomic tetranucleotide frequencies and Markov chain approximated tetranucleotide frequencies, as predictors

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call