Abstract

Many phylogenetic methods require a substitution model of evolution that mimics the real substitution process as accurately as possible. At the protein level, empirical substitution models have traditionally been based on a large number of different proteins from particular taxonomic levels. However, these models assume that all of the proteins of a taxonomic level evolve under the same substitution patterns. We believe that this assumption is highly unrealistic and should be relaxed by considering protein-specific substitution models that account for protein-specific selection processes. To test this hypothesis, we inferred and evaluated four new empirical substitution models for the protease and integrase of HIV and other viruses. We found that these models fit the evolutionary process of these proteins more accurately than any of the currently available empirical substitution models. We conclude that evolutionary inferences from protein sequences are more accurate if they are based on protein-specific substitution models rather than taxonomic-specific (generalist) substitution models. The four new empirical substitution models of protein evolution presented here could be useful for phylogenetic inferences of viral protease and integrase.

Highlights

  • Substitution models of molecular evolution are well established in a variety of phylogenetic methods to obtain accurate inferences of past evolutionary processes [1]

  • Among the currently available empirical substitution models, the HIVb substitution model provided the best fit to all of the test datasets except the viral IN test dataset, for which the selected model was WAG

Summary

Introduction

Academic Editor: Domenico Lio

Substitution models of molecular evolution are well established in a variety of phylogenetic methods to obtain accurate inferences of past evolutionary processes [1]. One category comprises parametric or structure-based substitution models, which consider structural constraints to model selection on protein folding stability and function [9,10,11,12,13]. These models have provided accurate inferences of protein evolution [10,12], and some of them have been implemented in useful evolutionary frameworks [14,15], but their mathematical complexity (most of them account for site-dependent evolution) and large computational requirements have so far prevented their establishment in phylogenetics. The other category includes empirical substitution models of protein evolution [1,8,16]. These substitution models consist of a 20 × 20 matrix of relative rates of change among amino acids (hereafter, an exchangeability matrix) and 20 amino acid frequencies, which are estimated from large protein databases. These models assume that all protein sites evolve under the same substitution process, although this assumption is often unrealistic.
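
The description of an empirical model as an exchangeability matrix plus 20 amino acid frequencies can be made concrete with a minimal sketch. It assumes the standard time-reversible construction, in which the instantaneous rate from amino acid i to j is the exchangeability r_ij multiplied by the frequency of j, and the matrix is scaled to one expected substitution per unit time. The function names are hypothetical, and the exchangeabilities and frequencies are random placeholders, not the values of HIVb, WAG, or any other published model.

```python
import numpy as np

def build_rate_matrix(exch, freqs):
    """Combine symmetric exchangeabilities and frequencies into a rate matrix Q."""
    exch = np.asarray(exch, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    freqs = freqs / freqs.sum()               # make sure frequencies sum to 1
    q = exch * freqs[np.newaxis, :]           # Q_ij = r_ij * pi_j for i != j
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))       # each row of Q sums to zero
    scale = -np.dot(freqs, np.diag(q))        # expected substitutions per unit time
    return q / scale, freqs

def transition_probabilities(q, freqs, t):
    """Compute P(t) = exp(Q t) for a reversible Q (all frequencies must be > 0)."""
    root = np.sqrt(freqs)
    # For a reversible model, diag(sqrt(pi)) Q diag(1/sqrt(pi)) is symmetric.
    sym = (root[:, np.newaxis] * q) / root[np.newaxis, :]
    sym = 0.5 * (sym + sym.T)                 # remove floating-point asymmetry
    evals, evecs = np.linalg.eigh(sym)
    exp_sym = (evecs * np.exp(evals * t)) @ evecs.T
    return (exp_sym / root[:, np.newaxis]) * root[np.newaxis, :]

# Placeholder inputs (NOT a real model): random symmetric exchangeabilities
# and random frequencies for the 20 amino acids.
rng = np.random.default_rng(0)
upper = np.triu(rng.uniform(0.1, 1.0, size=(20, 20)), k=1)
placeholder_exch = upper + upper.T
placeholder_freqs = rng.dirichlet(np.full(20, 5.0))

q, pi = build_rate_matrix(placeholder_exch, placeholder_freqs)
p = transition_probabilities(q, pi, t=0.5)    # branch length of 0.5 substitutions/site
```

Each row of `p` gives the probabilities that one amino acid is replaced by each of the 20 amino acids along a branch of the given length. Because a single `q` is applied to every site, the sketch also makes the shared-process assumption explicit.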

