Abstract
Substitutions between chemically distant amino acids are known to occur less frequently than those between more similar amino acids. This knowledge, however, is not reflected in most codon substitution models, which treat all nonsynonymous changes as if they were equivalent in terms of impact on the protein. A variety of methods for integrating chemical distances into models have been proposed, with a common approach being to divide substitutions into radical or conservative categories. Nevertheless, it remains unclear whether the resulting models describe sequence evolution better than their simpler counterparts. We propose a parametric codon model that distinguishes between radical and conservative substitutions, allowing us to assess if radical substitutions are preferentially removed by selection. Applying our new model to a range of phylogenomic data, we find differentiating between radical and conservative substitutions provides significantly better fit for large populations, but see no equivalent improvement for smaller populations. Comparing codon and amino acid models using these same data shows that alignments from large populations tend to select phylogenetic models containing information about amino acid exchangeabilities, whereas the structure of the genetic code is more important for smaller populations. Our results suggest selection against radical substitutions is, on average, more pronounced in large populations than smaller ones. The reduced observable effect of selection in smaller populations may be due to stronger genetic drift making it more challenging to detect preferences. Our results imply an important connection between the life history of a phylogenetic group and the model that best describes its evolution.
Highlights
Quantifying the impact of natural selection on proteins is of broad interest in evolutionary biology, providing insight into the structural and functional constraints acting on proteins and how they adapt to an organism’s environment
To examine the relative selective pressures acting on different types of amino acid substitutions, we propose a codon model that separates nonsynonymous substitutions into conservative or radical (CoRa) categories
The base model for describing codon substitutions is the M0 model (Goldman and Yang 1994), which captures the relative selective pressures acting on nonsynonymous substitutions through the ω = dN/dS parameter
Summary
Quantifying the impact of natural selection on proteins is of broad interest in evolutionary biology, providing insight into the structural and functional constraints acting on proteins and how they adapt to an organism’s environment. The most widely used method for studying selection using multiple sequence alignments of protein-coding sequences is to consider the ratio of the nonsynonymous substitution rate (dN) to the synonymous substitution rate (dS), often referred to as ω = dN/dS. These codon-based models assume that dS reflects the neutral rate of evolution and dN represents the rate after selection has acted. The original codon models used a dN/dS measure that incorporated chemical distances between amino acids (Goldman and Yang 1994), based on the rationale that selection against substitutions between more similar amino acids ought to be weaker than against those between more distant ones. Subsequent research found that this model frequently provided a poorer fit than the simpler M0 model, which estimates dN/dS but does not capture differences in the selective pressures acting on different amino acid substitutions (Yang et al. 1998).
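To make the role of ω concrete, the following is a minimal sketch of how an M0-style relative rate between two codons is commonly written: zero for changes at more than one nucleotide position or involving stop codons, multiplied by a transition/transversion ratio κ for transitions, and multiplied by ω for nonsynonymous changes. The function names `m0_rate` and `split_rate`, the default `pi_j`, and the user-supplied `is_radical` classifier are hypothetical names introduced here for illustration. In particular, `split_rate` only illustrates the general idea of giving conservative and radical changes separate ω values; it is an assumption, not the parameterization used by the CoRa model.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}


def m0_rate(codon_i, codon_j, kappa, omega, pi_j=1.0):
    """Relative M0-style rate from codon_i to codon_j (illustrative sketch).

    kappa: transition/transversion rate ratio
    omega: dN/dS, applied to every nonsynonymous change
    pi_j:  equilibrium frequency of the target codon (assumed 1.0 by default)
    """
    diffs = [(a, b) for a, b in zip(codon_i, codon_j) if a != b]
    if len(diffs) != 1:
        return 0.0  # identical codons or multi-nucleotide changes
    aa_i, aa_j = GENETIC_CODE[codon_i], GENETIC_CODE[codon_j]
    if "*" in (aa_i, aa_j):
        return 0.0  # changes to or from stop codons are disallowed
    a, b = diffs[0]
    rate = pi_j
    if (a in PURINES) == (b in PURINES):
        rate *= kappa  # transition: purine<->purine or pyrimidine<->pyrimidine
    if aa_i != aa_j:
        rate *= omega  # nonsynonymous change
    return rate


def split_rate(codon_i, codon_j, kappa, omega_cons, omega_rad, is_radical, pi_j=1.0):
    """Hypothetical two-omega variant: the omega used depends on whether the
    amino acid change is classified as radical by the caller's is_radical()."""
    aa_i, aa_j = GENETIC_CODE[codon_i], GENETIC_CODE[codon_j]
    if aa_i == aa_j or "*" in (aa_i, aa_j):
        omega = 1.0  # never applied: synonymous or stop-codon change
    else:
        omega = omega_rad if is_radical(aa_i, aa_j) else omega_cons
    return m0_rate(codon_i, codon_j, kappa, omega, pi_j)
```

With `omega_cons == omega_rad`, `split_rate` reduces to `m0_rate`, which is the sense in which a single-ω model is nested inside a two-category one. The choice of amino acid classification is left entirely to the caller here.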