Abstract

Similarities and differences between amino acids define the rates at which they substitute for one another within protein sequences and the patterns by which these sequences form protein structures. However, there exist many ways to measure similarity, whether one considers the molecular attributes of individual amino acids, the roles that they play within proteins, or some nuanced contribution of each. One popular approach to representing these relationships is to divide the 20 amino acids of the standard genetic code into groups, thereby forming a simplified amino acid alphabet. Here, we develop a method to compare or combine different simplified alphabets, and apply it to 34 simplified alphabets from the scientific literature. We use this method to show that while different suggestions vary and agree in non-intuitive ways, they combine to reveal a consensus view of amino acid similarity that is clearly rooted in physico-chemistry.

Electronic supplementary material: The online version of this article (doi:10.1007/s00239-013-9565-0) contains supplementary material, which is available to authorized users.

Highlights

  • The relationships between the 20 amino acids of the standard genetic code are fundamentally important to the [...]

  • Similarity measurements which focus on the chemistry and physics of individual amino acid molecules (Mahler and Cordes 1966; Lehninger 1970; Dickerson and Geis 1983; Taylor 1986; Weathers et al 2004) are likely to be different from those which analyze the roles played by amino acid residues within protein sequences (e.g., Dayhoff et al 1978; Risler et al 1988; Riddle et al 1997; Murphy et al 2000; Etchebest et al 2007) because biology’s genetic code defines how many point mutations are required to interconvert two different amino acids during protein sequence evolution (Fitch 1966)

  • In order to investigate the variability between different simplified alphabets, we identified a comprehensive dataset consisting of 34 amino acid simplifications published within peer-reviewed literature and characterized them according to their method of derivation (Table 1)
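To make the idea of comparing and combining simplified alphabets concrete, the following is a minimal sketch that treats each alphabet as a partition of the 20 amino acids and scores two alphabets by the fraction of amino acid pairs they treat the same way (a Rand-index-style measure). It is not the method developed in the article; the two example alphabets and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical alphabets for illustration only (not taken from the article's dataset).
ALPHABET_1 = ["AVLIMFWC", "GPSTNQY", "DE", "KRH"]
ALPHABET_2 = ["AVLIMC", "FWYH", "GPST", "NQDE", "KR"]


def co_grouped_pairs(alphabet):
    """Return the set of amino acid pairs that share a group in one alphabet."""
    pairs = set()
    for group in alphabet:
        pairs.update(frozenset(p) for p in combinations(sorted(group), 2))
    return pairs


def pair_agreement(alphabet_a, alphabet_b):
    """Fraction of all amino acid pairs that both alphabets either group or separate."""
    a = co_grouped_pairs(alphabet_a)
    b = co_grouped_pairs(alphabet_b)
    all_pairs = [frozenset(p) for p in combinations(AMINO_ACIDS, 2)]
    agree = sum((p in a) == (p in b) for p in all_pairs)
    return agree / len(all_pairs)


def consensus_counts(alphabets):
    """Count, for each pair, how many alphabets place the two amino acids together."""
    counts = Counter()
    for alphabet in alphabets:
        counts.update(co_grouped_pairs(alphabet))
    return counts


if __name__ == "__main__":
    print(pair_agreement(ALPHABET_1, ALPHABET_2))
    print(consensus_counts([ALPHABET_1, ALPHABET_2]).most_common(5))
```

Pairs with a high count in `consensus_counts` are those that many alphabets agree belong together, which is one simple way to picture a consensus view across alphabets.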

Introduction

The relationships between the 20 amino acids of the standard genetic code are fundamentally important to the [...] This range of practical uses has led amino acid similarity to be defined in many ways, such that simplified alphabets may reflect the purpose for which they were constructed or the methods by which they were derived. Similarity measurements which focus on the chemistry and physics of individual amino acid molecules (Mahler and Cordes 1966; Lehninger 1970; Dickerson and Geis 1983; Taylor 1986; Weathers et al 2004) are likely to be different from those which analyze the roles played by amino acid residues within protein sequences (e.g., Dayhoff et al 1978; Risler et al 1988; Riddle et al 1997; Murphy et al 2000; Etchebest et al 2007) because biology’s genetic code defines how many point mutations are required to interconvert two different amino acids during protein sequence evolution (Fitch 1966). For example, cysteine pairs with itself to form disulfide bridges that stabilize protein structures (Haber and Anfinsen 1962), and proline’s unique structure causes natural selection to favor its use in proteins for interrupting alpha helices. These behaviors are difficult to identify, without the benefit of hindsight, when considering only the chemical properties of a single amino acid molecule. Any specific effort to calculate an amino acid simplification scheme will contain some degree of experimental error.
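The point about the genetic code can be illustrated with a short sketch that computes the minimum number of single-nucleotide changes needed to turn a codon for one amino acid into a codon for another under the standard genetic code. This is an illustration of the idea only, not a reproduction of the analysis in Fitch (1966) or of this article's method; the names `CODON_TABLE`, `codons_for`, and `min_point_mutations` are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (first base varies slowest, third base fastest); '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS_BY_CODON = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS_BY_CODON)
}


def codons_for(amino_acid):
    """Return every codon that encodes the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def min_point_mutations(aa1, aa2):
    """Smallest number of differing bases between any codon of aa1 and any codon of aa2."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa1)
        for c2 in codons_for(aa2)
    )


if __name__ == "__main__":
    for pair in [("D", "E"), ("M", "W"), ("M", "C")]:
        print(pair, min_point_mutations(*pair))
```

A distance of this kind depends only on codon assignments, so it can differ from a ranking based on the chemistry of the amino acid molecules themselves.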
