Abstract

The arrangement of nucleotides within a bacterial chromosome is influenced by numerous factors. The degeneracy of the third codon position within each reading frame allows some flexibility of nucleotide selection; however, the third nucleotide in the triplet of each codon is at least partly determined by the preceding two. This is most evident in organisms with a strong G + C bias, as the degenerate position must contribute disproportionately to maintaining that bias. Therefore, a correlation exists between the first two nucleotides and the third in all open reading frames. If the arrangement of nucleotides in a bacterial chromosome is represented as a Markov process, we would expect this correlation to be completely captured by a second-order Markov model, and an increase in the order of the model (e.g., third-, fourth-…order) would not capture any additional uncertainty in the process. In this manuscript, we present the results of a comprehensive study of the Markov property that exists in the DNA sequences of 906 bacterial chromosomes. All 906 bacterial chromosomes studied exhibit a statistically significant Markov property that extends beyond second order and therefore cannot be fully explained by codon usage. An unrooted tree containing all 906 bacterial chromosomes, based on their third-order transition probability matrices, shares ∼25% similarity with a tree based on sequence homologies of 16S rRNA sequences. This congruence with the 16S rRNA tree is greater than for trees based on lower-order models (e.g., second-order), and higher-order models yield diminishing improvements in congruence. A nucleotide correlation that extends past three nucleotides most likely exists within every bacterial chromosome. This correlation places significant limits on the number of nucleotide sequences that can represent probable bacterial chromosomes. Transition matrix usage is largely conserved by taxa, indicating that this property is likely inherited; however, some important exceptions exist that may indicate the convergent evolution of some bacteria.
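
The idea of a k-th order Markov model of a DNA sequence can be illustrated with a minimal sketch. It estimates transition probabilities by counting which nucleotide follows each k-nucleotide context, then computes the conditional entropy of the next nucleotide given that context. The function names `transition_probabilities` and `conditional_entropy` are hypothetical, and the plug-in entropy estimate is an assumption made for illustration; it is not the statistical test used in the study, which is not described in this excerpt.

```python
from collections import Counter, defaultdict
from math import log2


def transition_probabilities(seq, order):
    """Estimate P(next nucleotide | preceding `order` nucleotides) by counting.

    Returns (probs, counts), both keyed by the context string.
    """
    counts = defaultdict(Counter)
    for i in range(len(seq) - order):
        context = seq[i:i + order]   # the preceding `order` nucleotides
        nxt = seq[i + order]         # the nucleotide that follows them
        counts[context][nxt] += 1
    probs = {}
    for context, c in counts.items():
        total = sum(c.values())
        probs[context] = {nt: n / total for nt, n in c.items()}
    return probs, counts


def conditional_entropy(seq, order):
    """Plug-in estimate (in bits) of H(next nucleotide | preceding context)."""
    probs, counts = transition_probabilities(seq, order)
    n = sum(sum(c.values()) for c in counts.values())
    h = 0.0
    for context, c in counts.items():
        for nt, k in c.items():
            # Weight each transition by its joint frequency k / n.
            h -= (k / n) * log2(probs[context][nt])
    return h
```

Calling `conditional_entropy(seq, k)` for k = 0, 1, 2, 3, … on the same sequence shows how much uncertainty each additional nucleotide of context removes. If codon structure alone explained the correlation, the estimate would be expected to level off after k = 2. On finite sequences the plug-in estimate is biased downward as k grows, so any apparent drop still needs a significance test before it is interpreted.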

Introduction

For more than twenty years, the nucleotide composition of bacterial genomes has been the focus of many studies attempting to identify patterns in nucleic acid sequences. These studies found that correlations exist between neighboring nucleotides (dinucleotides) in bacteria, and that dinucleotide frequencies can be used as a genomic signature, which may result from: (1) the chemistry of dinucleotide stacking; (2) DNA conformational tendencies; (3) species-specific properties of DNA replication and repair mechanisms; (4) the selection of restriction endonucleases (Karlin, Campbell & Mrazek, 1998); and (5) codon usage, as it affects translational efficiency (Gouy & Gautier, 1982; Grantham et al., 1981; Sharp et al., 1993). These and other pioneering studies were narrow in scope because, at that time, available data were limited to single gene sequences, partial chromosomes, and the complete genomes of a small number of model organisms, such as Escherichia coli K-12 (Blattner et al., 1997), Haemophilus influenzae (Fleischmann et al., 1995) and Bacillus subtilis (Kunst et al., 1997). We can now begin to identify sequence features that may constrain most bacterial genomes, and thereby describe a set of heuristics that may eventually help define the statistical boundaries of what constitutes a bacterium.
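
The dinucleotide signature mentioned above is commonly expressed as an odds ratio: the observed frequency of each dinucleotide divided by the product of its two mononucleotide frequencies. The sketch below is a simplified, single-strand version. The function name `dinucleotide_odds_ratios` is hypothetical, and published signatures of this kind are typically computed on the sequence combined with its reverse complement, which is omitted here for brevity.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Return rho[XY] = f(XY) / (f(X) * f(Y)) for all 16 dinucleotides.

    Simplifying assumptions: single strand only, overlapping dinucleotides,
    and any non-ACGT symbols still count toward the sequence length.
    """
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono = len(seq)
    n_di = len(seq) - 1
    ratios = {}
    for x in "ACGT":
        for y in "ACGT":
            fx = mono[x] / n_mono
            fy = mono[y] / n_mono
            fxy = di[x + y] / n_di
            # A nucleotide absent from the sequence makes the ratio undefined.
            ratios[x + y] = fxy / (fx * fy) if fx and fy else float("nan")
    return ratios
```

A ratio near 1 means the dinucleotide occurs about as often as independent nucleotide choice would predict. Values well above or below 1 indicate over- or under-representation, which is the kind of neighbor correlation the early studies reported.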
