Abstract

BackgroundHarnessing vast amounts of genomic data in phylogenetic context stemming from massive sequencing of multiple closely related genomes requires new tools and approaches. We present a tool for the genome-wide analysis of frequencies and patterns of amino acid substitutions in multiple alignments of genes’ coding regions, and a database of amino acid substitutions in the phylogeny of 12 Drosophila genomes. We illustrate the use of these resources to address three types of evolutionary genomics questions: about fluxes in amino acid composition in proteins, about asymmetries in amino acid substitutions and about patterns of molecular evolution in duplicated genes.ResultsWe demonstrate that amino acid composition of Drosophila proteins underwent a significant shift over the last 70 million years encompassed by the studied phylogeny, with less common amino acids (Cys, Met, His) increasing in frequency and more common ones (Ala, Leu, Glu) becoming less frequent. These fluxes are strongly correlated with polarity of source and destination amino acids, resulting in overall systematic decrease of mean polarity of amino acids found in Drosophila proteins. Frequency and radicality of amino acid substitutions are higher in paralogs than in orthologous single-copy genes and are higher in gene families with paralogs than in gene families without surviving duplications. Rate and radicality of substitutions, as expected, are negatively correlated with overall level and uniformity of gene expression. However, these correlations are not observed for substitutions occurring in duplicated genes, indicating a different selective constraint on the evolution of paralogous sequences. Clades resulting from duplications show a marked asymmetry in rate and radicality of amino acid substitutions, possibly a signal of widespread neofunctionalization. These patterns differ among protein families of different functionality, with genes coding for RNA-binding proteins differing from most other functional groups in terms of amino acid substitution patterns in duplicated and single-copy genes.ConclusionsWe demonstrate that deep phylogenetic analysis of amino acid substitutions can reveal interesting genome-wide patterns. Amino acid composition of drosophilid proteins is shaped by fluxes similar to those previously observed in prokaryotic, yeast and mammalian genomes, indicating globally present patterns. Increased frequency and radicality of amino acid substitutions in duplicated genes and the presence of asymmetry of these parameters between paralogous clades indicate widespread neofunctionalization among paralogs as the mechanism of duplication retention.

Highlights

  • Harnessing vast amounts of genomic data in phylogenetic context stemming from massive sequencing of multiple closely related genomes requires new tools and approaches

  • The raw data for the analysis reported here was a set of multiple amino acids alignments from 12 completely sequenced Drosophila genomes ([4,19]; see Methods)

  • Systematic loss/gain asymmetry in amino acid composition in 12 Drosophila genomes corroborates patterns previously observed in a variety of taxonomically diverse triplets of genomes [3]. This pattern does not become less pronounced as more and more distant genomes are included into consideration, indicating that it is not caused by the effect of polymorphisms reflecting mutation-selection balance influenced by mutational asymmetries [7,9]. We demonstrate that this net loss/gain asymmetry is strongly correlated with source and destination amino acid polarities: substitutions of polar amino acids by non-polar ones have a higher net rate than the reciprocal substitutions

Read more

Summary

Introduction

Harnessing vast amounts of genomic data in phylogenetic context stemming from massive sequencing of multiple closely related genomes requires new tools and approaches. As the strategy of genome sequencing shifts from broad taxonomic coverage to sequencing multiple closely related genomes [4], a need arises in a set of tools to accomplish a phylogenetic analysis of amino acid substitutions in coding portions of a large number of protein families simultaneously and to address the question of generality of patterns observed in limited and possibly biased set of select gene families. There may be substantial differences in selection pressures on reciprocal amino acid substitutions [10]: changes from polar to non-polar amino acids in human proteins are more permissive than vice-versa Such asymmetry and the degree to which is can contribute to the large-scale changes in amino acid composition has not yet been measured on the scale of several genomes

Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call