Abstract

Oligonucleotide usage in archaeal and bacterial genomes can be linked to a number of properties, including codon usage (trinucleotides), DNA base-stacking energy (dinucleotides), and DNA structural conformation (di- to tetranucleotides). We wanted to assess the statistical information potential of different DNA ‘word-sizes’ and explore how oligonucleotide frequencies differ in coding and non-coding regions. In addition, we used oligonucleotide frequencies to investigate DNA composition and how DNA sequence patterns change within and between prokaryotic organisms. Among our findings, prokaryotic chromosomes could be described by hexanucleotide frequencies, suggesting that prokaryotic DNA is predominantly short-range correlated, i.e., information in prokaryotic genomes is encoded in short oligonucleotides. Oligonucleotide usage varied more within AT-rich and host-associated genomes than within GC-rich and free-living genomes, and this variation was mainly located in non-coding regions. Bias (selectional pressure) in tetranucleotide usage correlated with GC content, and coding regions were more biased than non-coding regions. Non-coding regions were also found to be, on average, approximately 5.5% more AT-rich than coding regions in the 402 chromosomes examined. Pronounced DNA compositional differences were found both within and between AT-rich and GC-rich genomes. In non-coding regions, GC-rich genomes were more similar to one another and more biased in tetranucleotide usage than AT-rich genomes. The differences found between AT-rich and GC-rich genomes may be attributable to lifestyle, since tetranucleotide usage within host-associated bacteria was, on average, more dissimilar and less biased than within free-living archaea and bacteria.

Highlights

  • Prokaryotic DNA can be considered as a long chain of overlapping oligonucleotides, and frequencies of differently sized oligonucleotides can reveal different properties and patterns of bacterial and archaeal genomes

  • Comparing observed to expected tetranucleotide usage (OUV), we found that coding regions are, in general, more biased than non-coding regions, and more homogeneous according to the OUD test

  • AT content increased in non-coding regions in both AT-rich and GC-rich genomes; this increase did not appear to be a consequence of tetranucleotide preference
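
To make the idea of a genome as a chain of overlapping oligonucleotides concrete, the following is a minimal Python sketch, not the authors' implementation. The function names (`kmer_counts`, `kmer_frequencies`, `at_content`) and the choice to skip windows containing non-ACGT symbols are assumptions made for illustration only. It counts overlapping words of a given size and reports AT content, the two quantities the highlights above refer to.

```python
from collections import Counter

BASES = set("ACGT")


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers in seq.

    Windows containing symbols other than A, C, G, T are skipped
    (an assumption of this sketch, not a rule taken from the paper).
    """
    seq = seq.upper()
    counts = Counter()
    # Slide a window of width k one base at a time, so words overlap.
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= BASES:
            counts[word] += 1
    return counts


def kmer_frequencies(seq: str, k: int) -> dict:
    """Relative frequency of each observed k-mer (empty dict if none)."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def at_content(seq: str) -> float:
    """Fraction of A and T among the unambiguous bases of seq."""
    seq = seq.upper()
    unambiguous = sum(seq.count(b) for b in "ACGT")
    if unambiguous == 0:
        return 0.0
    return (seq.count("A") + seq.count("T")) / unambiguous


if __name__ == "__main__":
    toy = "ATGCGATATATGC"  # hypothetical toy sequence, not real genomic data
    print(kmer_counts(toy, 4).most_common(3))
    print(round(at_content(toy), 3))
```

Applied separately to the coding and non-coding portions of a chromosome, counts of this kind would be the raw input for comparing tetranucleotide usage and AT content between the two region types.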

Introduction

Prokaryotic DNA can be considered as a long chain of overlapping oligonucleotides, and frequencies of differently sized oligonucleotides can reveal different properties and patterns of bacterial and archaeal genomes. The structures of A, B and Z type DNA helices are largely determined by 11-, 10- and 12-mers, respectively [2,6]. Another advantage of considering genomic DNA as a set of fixed-size oligonucleotide frequencies is that bias and pattern preference, i.e., the randomness, or lack thereof, inherent in the complete DNA sequence, can be detected [3]. The examples above illustrate some of the properties that can be extracted from DNA sequences by examining oligonucleotide usage variance. This motivated us to explore how oligonucleotide distributions change within and between prokaryotes in coding and non-coding regions, how biased oligonucleotide frequencies are, and whether any particular trends could be detected. We found that tetranucleotide frequencies carried considerable genomic information potential, and were used in
