Kullback Leibler divergence in complete bacterial and phage genomes.

Sajia Akhter, Robert A Edwards, Eslam S Ibrahim, Mona T Kashef, Ramy K Aziz, Barbara Bailey

doi:10.7717/peerj.4026

Abstract

The amino acid content of the proteins encoded by a genome may predict the coding potential of that genome and may reflect lifestyle restrictions of the organism. Here, we calculated the Kullback–Leibler divergence from the mean amino acid content as a metric to compare the amino acid composition for a large set of bacterial and phage genome sequences. Using these data, we demonstrate that (i) there is a significant difference between amino acid utilization in different phylogenetic groups of bacteria and phages; (ii) many of the bacteria with the most skewed amino acid utilization profiles, or the bacteria that host phages with the most skewed profiles, are endosymbionts or parasites; (iii) the skews in the distribution are not restricted to certain metabolic processes but are common across all bacterial genomic subsystems; (iv) amino acid utilization profiles strongly correlate with GC content in bacterial genomes but very weakly correlate with the G+C percent in phage genomes. These findings might be exploited to distinguish coding from non-coding sequences in large data sets, such as metagenomic sequence libraries, to help in prioritizing subsequent analyses.
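As a minimal sketch of the metric described in the abstract, the Python below computes the Kullback–Leibler divergence of each genome's amino acid frequencies from the mean frequencies across all genomes, using the standard definition D(P‖Q) = Σ p_i log2(p_i / q_i). The function names, the toy protein sequences, the unweighted mean across genomes, and the choice of log base 2 are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes); other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(proteins):
    """Return the relative frequency of each amino acid over all proteins of one genome."""
    counts = Counter()
    for seq in proteins:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def mean_frequencies(profiles):
    """Unweighted mean of several frequency profiles (assumption: each genome counts equally)."""
    n = len(profiles)
    return {aa: sum(p[aa] for p in profiles) / n for aa in AMINO_ACIDS}


def kl_divergence(p, q):
    """D(P || Q) in bits; terms with p_i == 0 contribute nothing."""
    total = 0.0
    for aa in AMINO_ACIDS:
        if p[aa] > 0:
            if q[aa] == 0:
                return math.inf  # P uses a residue that Q never uses
            total += p[aa] * math.log2(p[aa] / q[aa])
    return total


# Hypothetical toy data: two "genomes", each a list of protein sequences.
genomes = {
    "genome_a": ["MKKLLA", "MAAK"],
    "genome_b": ["MGGGGS"],
}
profiles = {name: aa_frequencies(seqs) for name, seqs in genomes.items()}
mean = mean_frequencies(list(profiles.values()))
klds = {name: kl_divergence(p, mean) for name, p in profiles.items()}
```

A genome whose composition matches the mean scores 0, and larger values indicate a more skewed amino acid utilization profile.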

Highlights

The central dogma of molecular biology describes the irreversible flow of information in biological systems from nucleic acids to amino acids, whose combinations make up the main cellular components: proteins
We demonstrate that Kullback–Leibler divergence (KLD) correlates well with an organism’s phylogeny and amino acid utilization profile, in addition to correlating with the GC content of bacterial genomes
KLD was calculated for all predicted proteins encoded by 372 bacterial genomes and 835 phage genomes

Summary

Introduction

The central dogma of molecular biology describes the irreversible flow of information in biological systems from nucleic acids to amino acids, whose combinations make up the main cellular components: proteins. In principle, such flow of information is no different from other data storage and communication systems, and can be studied using information theory (Shannon, 1948). Shannon’s index is increasingly being used as a bioinformatics tool to solve problems related to either network or genome context, e.g., comparative genomics, resolution-free metrics, motif classification, and sequence-independent correlations (De Domenico & Biamonte, 2016; Vinga, 2014). Von Neumann entropy, which originated from Shannon’s classical information theory, is used as a divergence parameter in applications ranging from spectral data to human microbiome networks (De Domenico & Biamonte, 2016).
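For readers unfamiliar with Shannon's index, the following is a minimal sketch of the standard entropy formula H = −Σ p_i log2(p_i) applied to the symbols of a single sequence. The function name and the example strings are hypothetical, and this is not presented as the procedure used in the cited works.

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """Shannon entropy (in bits per symbol) of the symbol distribution in a sequence."""
    if not sequence:
        raise ValueError("sequence must be non-empty")
    counts = Counter(sequence)
    n = len(sequence)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical examples: a uniform 4-letter alphabet versus a single repeated symbol.
uniform = shannon_entropy("ACGT")
constant = shannon_entropy("AAAA")
```

A sequence that uses all symbols equally often reaches the maximum entropy for its alphabet, while a sequence made of one repeated symbol has zero entropy.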



Journal: PeerJ
Publication Date: Nov 30, 2017
Citations: 4
License type: CC BY 4.0
