Abstract
Background: Rapid accumulation of vertebrate genome sequences renders comparative genomics a powerful approach to study macro-evolutionary events. The assessment of phylogenetic relationships between species routinely depends on the analysis of sequence homology at the nucleotide or protein level.
Results: We analyzed mRNA GC content, codon usage and divergence of orthologous proteins in 55 vertebrate genomes. Data were visualized in genome-wide landscapes using a sliding window approach. Landscapes of GC content reveal both evolutionary conservation of clustered genes and lineage-specific changes, so that it was possible to construct a phylogenetic tree that closely matched the classic “tree of life”. Landscapes of GC content also correlated strongly with landscapes of amino acid usage: positively with glycine, alanine, arginine and proline, and negatively with phenylalanine, tyrosine, methionine, isoleucine, asparagine and lysine. Peaks of GC content correlated strongly with increased protein divergence.
Conclusions: Landscapes of base and amino acid composition of the coding genome open a new approach in comparative genomics, allowing identification of discrete regions in which protein evolution accelerated over deep evolutionary time. Insight into the evolution of genome structure may spur novel studies assessing the evolutionary benefit of genes in particular genomic regions.
Highlights
Rapid accumulation of vertebrate genome sequences renders comparative genomics a powerful approach to study macro-evolutionary events
Construction of genome-wide exome landscapes of GC content: we performed a comparative analysis among 55 different vertebrates
We verified the minor impact of reducing the total number of 19,971 human protein-encoding genes to a core common vertebrate set of 15,824 protein-encoding genes (Additional file 4: Figure S3)
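The sliding window approach mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each gene is represented by its mRNA sequence, that genes are already sorted by chromosomal position, and that the landscape value at each position is the mean GC fraction over a fixed number of consecutive genes. The function names and the window size are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def gc_content(seq: str) -> float:
    """Return the fraction of G and C among the A/C/G/T/U bases of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    total = gc + at
    # Ambiguous bases (e.g. N) are ignored; an empty sequence yields 0.0.
    return gc / total if total else 0.0


def sliding_window_mean(values: Sequence[float], window: int) -> List[float]:
    """Average each run of `window` consecutive values.

    `values` is assumed to hold one number per gene, in chromosomal order.
    The window size is an illustrative parameter, not the paper's setting.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    if len(values) < window:
        return []
    running = sum(values[:window])
    means = [running / window]
    for i in range(window, len(values)):
        # Slide by one gene: add the incoming value, drop the outgoing one.
        running += values[i] - values[i - window]
        means.append(running / window)
    return means


if __name__ == "__main__":
    # Toy mRNA sequences standing in for genes ordered along a chromosome.
    genes = ["ATGGCGCC", "ATGAAATT", "ATGGGGCC", "ATGTTTAA"]
    per_gene = [gc_content(s) for s in genes]
    print(sliding_window_mean(per_gene, window=2))
```

Averaging over a fixed number of neighbouring genes, rather than a fixed number of base pairs, keeps every window equally populated regardless of local gene density; whether the authors made the same choice is not stated in this section.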
Summary
Rapid accumulation of vertebrate genome sequences renders comparative genomics a powerful approach to study macro-evolutionary events. Base composition of vertebrate genomes varies with regional position along the chromosomes [1]. Regions of continuous DNA sequence that differ in GC content, defined without discriminating between introns, exons and intergenic sequence, are called isochores [2, 3]. The function of this structural variability has been debated for decades. Recombination hot spots that accumulate GC bases are mostly seen in the subtelomeric regions of chromosomes [8].