Abstract
The phylogenetic analysis of proteins conventionally relies on the evaluation of amino acid sequences or coding sequences. Individual amino acids have measurable features that allow the translation from strings of letters (amino acids or bases) into strings of numbers (physico-chemical properties). When the letters are converted to measurable properties, such numerical strings can be evaluated quantitatively with various tools of complex systems research. We build on our prior phylogenetic analysis of the cytokine Osteopontin to validate the quantitative approach toward the study of protein evolution. Phylogenetic trees constructed from the number strings differentiate among all sequences. In pairwise comparisons, autocorrelation, average mutual information and box counting dimension yield one number each for the overall relatedness between sequences. We also find that bivariate wavelet analysis distinguishes hypermutable regions from conserved regions of the protein. The investigation of protein evolution via quantitative study of the physico-chemical characteristics pertaining to the amino acid building blocks broadens the spectrum of applicable research tools, accounts for mutation as well as selection, gives access to multiple vistas depending on the property evaluated, discriminates more accurately among sequences, and renders the analysis more quantitative than utilizing strings of letters as starting points.
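The translation from letters to numbers described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the Kyte-Doolittle hydropathy index is assumed here as the physico-chemical property (the abstract does not name one), and the function names `to_numbers` and `autocorrelation`, as well as the short example peptide, are hypothetical.

```python
# Kyte-Doolittle hydropathy index per amino acid.
# ASSUMPTION: this scale is only an example property; the abstract does not specify one.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_numbers(sequence, scale=KYTE_DOOLITTLE):
    """Translate a string of amino acid letters into a list of property values."""
    return [scale[residue] for residue in sequence.upper()]


def autocorrelation(values, lag):
    """Normalized sample autocorrelation of a numeric series at a given lag."""
    n = len(values)
    mean = sum(values) / n
    variance_sum = sum((v - mean) ** 2 for v in values)
    if variance_sum == 0 or lag >= n:
        return 0.0  # constant series or lag too large: no usable signal
    covariance_sum = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return covariance_sum / variance_sum


# Hypothetical short peptide, used only to exercise the functions.
peptide = "MRIAVICFCLLGITCA"
profile = to_numbers(peptide)
lag_one = autocorrelation(profile, lag=1)
```

Swapping in a different scale (for example charge or side-chain volume) would give a different numerical view of the same sequence, which is the sense in which the abstract speaks of multiple vistas depending on the property evaluated.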
Highlights
The analysis of amino acid sequences or coding sequences for proteins is central to the study of molecular evolution
For each taxon under investigation, the consensus Osteopontin sequence was generated by choosing the most common amino acid for every polymorphic site, and the sequences were aligned with Clustal Omega (Supplementary Table 1)
Accounting for gaps is often problematic: in a prior phylogenetic analysis of Osteopontin using a large set of sequences (Weber, 2018), the exclusion of gaps caused a substantial reduction in the number of residues that contributed to the calculations
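The consensus step mentioned in the highlights (choosing the most common amino acid at every polymorphic site) might look like the sketch below. It assumes the variants for one taxon have equal length, and it treats any gap character as an ordinary symbol; the function name `consensus` and the toy variants are hypothetical and do not come from the paper.

```python
from collections import Counter


def consensus(variants):
    """Return the most common residue at each position of equal-length sequences.

    ASSUMPTION: ties are resolved in favor of the residue seen first,
    which is how Counter.most_common orders equal counts.
    """
    if len({len(v) for v in variants}) != 1:
        raise ValueError("all variants must have the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*variants)
    )


# Toy variants for a single taxon (hypothetical data).
variants = ["MKALV", "MRALV", "MKALI"]
taxon_consensus = consensus(variants)
```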
Summary
The analysis of amino acid sequences or coding sequences for proteins is central to the study of molecular evolution. Among the main tools is the construction of phylogenetic trees, which are assembled based on algorithms that consider the numbers of mismatched amino acids or bases between aligned sequences (Charleston, 2013; He, 2019). In this strategy, multiple comparisons may be characterized by identical numbers of differences and are then placed on the same evolutionary level. The rationale for basing evolutionary studies of proteins on their differences in amino acid sequences is rooted in the mechanism of mutation, which alters bases in the coding sequence and, in turn, results in alterations of individual amino acids. Selection, the second major driver of evolution, favors certain mutations over others, and this preference is not captured
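The mismatch counting that underlies conventional tree building, and the ties it can produce, can be illustrated with a small sketch. The taxon labels and aligned toy sequences are hypothetical, and the function `mismatches` is only a plain per-position comparison, not the specific algorithms cited in the summary.

```python
from itertools import combinations


def mismatches(a, b):
    """Count positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Toy aligned sequences (hypothetical data).
aligned = {
    "taxon_A": "MKALV",
    "taxon_B": "MKAIV",
    "taxon_C": "MRALV",
}

# Pairwise mismatch counts for every pair of taxa.
distances = {
    (p, q): mismatches(aligned[p], aligned[q])
    for p, q in combinations(aligned, 2)
}
```

In this toy set, taxon_A differs from taxon_B and from taxon_C by exactly one residue each, so a letter-based count cannot rank the two comparisons, even though the substitutions involve different amino acids.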