Abstract

Codon models of evolution have facilitated the interpretation of selective forces operating on genomes. These models, however, assume a single rate of non-synonymous substitution irrespective of the nature of the amino acids being exchanged. Recent developments have shown that models which allow amino acid pairs to have independent rates of substitution offer an improved fit over single-rate models. However, these approaches have been limited by the large alignments needed to estimate them. An alternative approach is to assume that substitution rates between amino acid pairs can be subdivided into rate classes, with the number of classes dependent on the information content of the alignment. However, given the combinatorially large number of such models, an efficient model search strategy is needed. Here we develop a Genetic Algorithm (GA) method for the estimation of such models. The GA assigns amino acid substitution pairs to a series of rate classes, where the number of classes is estimated from the alignment. Other parameters of the phylogenetic Markov model, including substitution rates, character frequencies and branch lengths, are estimated using standard maximum likelihood optimization procedures. We apply the GA to empirical alignments and show improved model fit over existing models of codon evolution. Our results suggest that current models are poor approximations of protein evolution, and that gene- and organism-specific multi-rate models that incorporate amino acid substitution biases are therefore preferred. We further anticipate that the clustering of amino acid substitution rates into classes will be biologically informative, such that genes with similar functions exhibit similar clustering, and hence that this clustering will be useful for the evolutionary fingerprinting of genes.
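The model search described above can be sketched as follows. This is a minimal illustration under loudly stated assumptions, not the authors' implementation: each candidate model is a vector assigning every amino acid pair to one of `k` classes; `score` is a hypothetical toy stand-in (a least-squares, AIC-style criterion on synthetic per-pair rates) for the likelihood-based fit that full maximum likelihood optimization of the phylogenetic model would provide; and the operators (tournament selection, uniform crossover, random reassignment, elitism) are generic GA choices. The helper `stirling2` counts the ways to partition `n` pairs into exactly `k` non-empty classes, which shows why exhaustive enumeration quickly becomes impractical.

```python
import math
import random


def stirling2(n: int, k: int) -> int:
    """Ways to partition n labelled items into exactly k non-empty classes."""
    row = [1] + [0] * k  # S(0, j) for j = 0..k
    for i in range(1, n + 1):
        new = [0] * (k + 1)
        for j in range(1, min(i, k) + 1):
            # Recurrence: S(i, j) = j * S(i-1, j) + S(i-1, j-1)
            new[j] = j * row[j] + row[j - 1]
        row = new
    return row[k]


def score(assign, rates, k):
    """Hypothetical toy AIC-style score (lower is better).

    Stand-in for a likelihood-based criterion: each class rate is the mean of
    its members (the least-squares estimate), and only non-empty classes are
    counted as parameters.
    """
    groups = [[] for _ in range(k)]
    for pair_idx, cls in enumerate(assign):
        groups[cls].append(rates[pair_idx])
    rss, used = 0.0, 0
    for g in groups:
        if g:
            used += 1
            mean = sum(g) / len(g)
            rss += sum((r - mean) ** 2 for r in g)
    n = len(rates)
    return n * math.log(rss / n + 1e-12) + 2 * used


def run_ga(rates, k, pop_size=30, generations=150, p_mut=0.02, seed=0):
    """Evolve class assignments for len(rates) pairs; returns (best, score)."""
    rng = random.Random(seed)
    n = len(rates)
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((score(ind, rates, k), ind) for ind in pop),
                        key=lambda t: t[0])

        def tournament():
            # Tournament selection: best of three random individuals.
            return min(rng.sample(scored, 3), key=lambda t: t[0])[1]

        next_pop = [scored[0][1], scored[1][1]]  # elitism: keep the two best
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover: each pair inherits its class from either parent.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Mutation: occasionally move a pair to a random class.
            child = [rng.randrange(k) if rng.random() < p_mut else c for c in child]
            next_pop.append(child)
        pop = next_pop
    best_score, best = min(((score(ind, rates, k), ind) for ind in pop),
                           key=lambda t: t[0])
    return best, best_score


if __name__ == "__main__":
    rng = random.Random(42)
    true_rates = [0.2, 1.0, 3.0]  # hypothetical class rates
    hidden = [rng.randrange(3) for _ in range(30)]  # 30 hypothetical pairs
    observed = [true_rates[c] * rng.uniform(0.9, 1.1) for c in hidden]
    print("partitions into 3 classes:", stirling2(len(observed), 3))
    results = {k: run_ga(observed, k) for k in range(1, 5)}
    best_k = min(results, key=lambda k: results[k][1])
    print("k with lowest score:", best_k)
```

Running the GA once per `k` and keeping the lowest score is one simple way to let the data choose the number of classes. With this toy score the selected `k` can exceed the number of classes used to simulate the data, because optimally partitioning noisy values always lowers the residual; a real analysis would rely on the likelihood of the full codon model.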

Highlights

  • Modern molecular evolution has benefited greatly from the development of a sound probabilistic framework for modeling the evolution of homologous gene sequences [1]

  • Codon substitution models [2,3] have facilitated the estimation of the ratio of non-synonymous to synonymous substitution rates, which can be interpreted as an indicator of the strength and type of natural selection

  • A serious limitation of most codon models is the unrealistic assumption that all non-synonymous substitutions occur at the same rate
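Relaxing the single-rate assumption amounts to giving each class of amino acid pairs its own non-synonymous rate. The sketch below builds such a multi-rate codon generator matrix. It assumes the standard genetic code, excludes stop codons, and uses a simplified parameterization in which each allowed rate is scaled by the target codon's frequency (no transition/transversion bias). The mapping `class_of_pair` from unordered amino acid pairs to class indices is hypothetical and is exactly the kind of assignment a model search would have to choose; with an empty mapping and a single class rate, the matrix reduces to the usual one-rate non-synonymous model.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_TABLE)}
SENSE = [c for c, aa in CODE.items() if aa != "*"]  # 61 sense codons


def rate_matrix(pi, alpha, class_of_pair, class_rates):
    """Multi-rate codon generator matrix (sketch).

    pi: dict codon -> equilibrium frequency.
    alpha: synonymous rate.
    class_of_pair: hypothetical dict frozenset({aa1, aa2}) -> class index;
        pairs not listed fall into class 0.
    class_rates: non-synonymous rate for each class.
    """
    n = len(SENSE)
    q = [[0.0] * n for _ in range(n)]
    for i, ci in enumerate(SENSE):
        for j, cj in enumerate(SENSE):
            # Only single-nucleotide changes are allowed instantaneously.
            if i == j or sum(a != b for a, b in zip(ci, cj)) != 1:
                continue
            ai, aj = CODE[ci], CODE[cj]
            if ai == aj:
                rate = alpha
            else:
                rate = class_rates[class_of_pair.get(frozenset((ai, aj)), 0)]
            q[i][j] = rate * pi[cj]
        q[i][i] = -sum(q[i])  # rows of a generator matrix sum to zero
    return q


if __name__ == "__main__":
    pi = {c: 1 / len(SENSE) for c in SENSE}
    # Hypothetical: Phe<->Leu exchanges in a fast class, everything else slow.
    q = rate_matrix(pi, alpha=1.0, class_of_pair={frozenset("FL"): 1},
                    class_rates=[0.5, 5.0])
    idx = {c: k for k, c in enumerate(SENSE)}
    print(q[idx["TTT"]][idx["TTA"]], q[idx["TTT"]][idx["TCT"]])
```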

Summary

Introduction

Modern molecular evolution has benefited greatly from the development of a sound probabilistic framework for modeling the evolution of homologous gene sequences [1]. Codon substitution models [2,3] have facilitated the estimation of the ratio of non-synonymous to synonymous substitution rates (referred to as dN/dS, Ka/Ks or ω), which can be interpreted as an indicator of the strength and type of natural selection (see [4] or [5] for recent reviews). In most subsequent applications of codon models, all one-nucleotide substitutions were stratified into synonymous (rate α, using the notation of [2]) and non-synonymous (rate β) classes. By contrast, most protein substitution models are derived by estimating the relative rates of amino acid substitutions in large protein databases [6,7,8], and these estimates consistently show dramatic differences in the relative replacement rates of different residues.
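The stratification of one-nucleotide substitutions into synonymous and non-synonymous classes can be made concrete by enumerating the single-nucleotide neighbours of every sense codon. This is a minimal sketch assuming the standard genetic code and excluding stop codons from the state space, as codon models typically do. It also collects the set of amino acid pairs that a single nucleotide change can exchange, which is the set a multi-rate model would partition into classes. In this notation, ω corresponds to β/α.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_TABLE)}


def classify_substitutions():
    """Count ordered single-nucleotide changes between sense codons."""
    syn = nonsyn = 0
    pairs = set()  # unordered amino acid pairs exchangeable by one change
    for codon, aa in CODE.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                aa2 = CODE[mutant]
                if aa2 == "*":
                    continue  # changes into stop codons are excluded
                if aa2 == aa:
                    syn += 1     # synonymous: rate alpha
                else:
                    nonsyn += 1  # non-synonymous: rate beta
                    pairs.add(frozenset((aa, aa2)))
    return syn, nonsyn, pairs


if __name__ == "__main__":
    syn, nonsyn, pairs = classify_substitutions()
    print(f"synonymous: {syn}, non-synonymous: {nonsyn}, aa pairs: {len(pairs)}")
```

Every non-synonymous change is assigned the same rate β in a single-rate model, regardless of which pair in `pairs` it exchanges.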