Abstract

Background: Protein evolution is shaped in particular by the conservation of the amino acids' physico-chemical properties and by the structure of the genetic code. While conservation results from negative selection against proteins with reduced functionality, the codon sequences determine the stochastic aspect of amino acid exchanges. It is known so far that the genetic code is the dominant factor when little time has elapsed since the divergence of one gene into two, and that physico-chemical forces gain importance at greater evolutionary distances. Further details on how the influence of these factors varies with time, however, are unknown to date.

Methodology/Principal Findings: Here, we derive 10,000 divergence-specific substitution matrices each for orthologues and for paralogues from the Pfam collection of multiple protein alignments, and we quantify the action of three physico-chemical forces and of the structure of the genetic code at high resolution using correlation analysis. For closely related proteins, codon sequence similarity is the most influential factor controlling protein evolution, but its influence decreases rapidly as divergence grows. From a protein sequence divergence of about 20 percent onward, maintenance of the hydrophobic character of an amino acid is the most influential factor. All factors lose importance from about 40 percent divergence onward. This suggests that the original protein structure often no longer represents a constraint on the protein sequence. The proteins then become free to adopt new functions. We furthermore show that the constraints exerted by physico-chemical forces and by the genetic code are quite comparable for orthologues and paralogues, although somewhat weaker for paralogues than for orthologues in weakly or moderately diverged proteins.

Conclusion/Significance: Our analysis substantiates earlier findings that protein evolution is mainly governed by the structure of the genetic code in the early phase after divergence and by the conservation of physico-chemical properties in the later phase. We determine the level of sequence divergence at which conservation of the hydrophobic character gains importance over the genetic code to be 20 percent. The evolution of orthologues and paralogues is shaped by evolutionary forces in quite comparable ways.

Highlights

  • The evolution of proteins can be seen as a succession of replacements of amino acids by other amino acids

  • We show that the constraints exerted by both physicochemical forces and by the genetic code are quite comparable for orthologues and paralogues, somewhat weaker for paralogues than for orthologues in weakly or moderately diverged proteins

  • The sequence divergence is caused by replacements of amino acids by other amino acids

Introduction

The evolution of proteins can be seen as a succession of replacements of amino acids by other amino acids. To quantify the rates at which amino acids are replaced by other amino acids, so-called substitution matrices (or exchange matrices) are built from multiple sequence alignments of homologous proteins [1]. Substitution matrices are of particular importance for sequence database searches with protein or DNA sequences of unknown function, and many attempts have been made to refine them [2,3,4]. Such a substitution matrix is, strictly speaking, specific to the protein it is derived from, because not all positions in a protein are of equal importance. Further details on how the influence of these factors varies with time are unknown to date.
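
To make the idea of a substitution matrix concrete, the following is a minimal sketch of how exchange counts and log-odds scores could be tallied from a small gap-aware multiple alignment. It uses a generic BLOSUM-like log-odds scheme as an assumption; it is not the procedure used in the paper, and the function names `divergence`, `pair_counts` and `log_odds` are hypothetical, as is the toy alignment.

```python
from collections import Counter
from itertools import combinations
import math


def divergence(seq_a, seq_b):
    """Fraction of aligned, gap-free positions at which two sequences differ."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free positions to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def pair_counts(alignment):
    """Count unordered residue pairs observed within each alignment column."""
    counts = Counter()
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]  # ignore gaps
        for a, b in combinations(residues, 2):
            counts[tuple(sorted((a, b)))] += 1
    return counts


def log_odds(counts):
    """Convert pair counts to log2(observed / expected) scores.

    Expected pair frequencies assume residues pair independently,
    with residue frequencies taken from the counted pairs themselves.
    """
    total = sum(counts.values())
    freq = Counter()
    for (a, b), n in counts.items():
        freq[a] += n
        freq[b] += n
    n_residues = 2 * total  # every pair contributes two residues
    scores = {}
    for (a, b), n in counts.items():
        expected = (freq[a] / n_residues) * (freq[b] / n_residues)
        if a != b:
            expected *= 2  # unordered pair: (a, b) and (b, a) are the same event
        scores[(a, b)] = math.log2((n / total) / expected)
    return scores


# Toy alignment (hypothetical data, three sequences of length three)
alignment = ["ACD", "ACE", "GCD"]
counts = pair_counts(alignment)
scores = log_odds(counts)
```

A divergence-specific matrix in this spirit would restrict the counting to sequence pairs whose `divergence` falls within a chosen window, so that one matrix is obtained per divergence level.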
