Abstract

Despite many attempts to introduce evolutionary models that permit substitutions to instantly alter more than one nucleotide in a codon, the prevailing wisdom remains that such changes are rare and generally negligible, or that they reflect non-biological artifacts such as alignment errors. Codon models continue to posit that only single-nucleotide changes have non-zero rates. Here, we develop and test a simple hierarchy of codon-substitution models with non-zero evolutionary rates for only one-nucleotide (1H), one- and two-nucleotide (2H), or any (3H) codon substitutions. Using over 42,000 empirical alignments, we find widespread statistical support for multiple hits: 61% of alignments prefer models with 2H allowed, and 23% prefer models with 3H allowed. Analyses of simulated data suggest that these results are unlikely to be due to simple artifacts such as model misspecification or alignment errors. Further modeling reveals that synonymous codon island jumping among codons encoding serine, especially along short branches, contributes significantly to this 3H signal. While serine codons were prominently involved in multiple-hit substitutions, other common exchanges also contributed to better model fit. It appears that a small subset of sites in most alignments have unusual evolutionary dynamics not well explained by existing model formalisms, and that commonly estimated quantities, such as dN/dS ratios, may be biased by model misspecification. Our findings highlight the need for continued evaluation of the assumptions underlying workhorse evolutionary models and the inference techniques built on them. We provide a software implementation in the HyPhy package and on the Datamonkey.org server, which evolutionary biologists can use to assess the potential impact of extra base hits in their data.
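
To make the 1H/2H/3H hierarchy concrete, the following minimal sketch enumerates which pairs of sense codons would receive a non-zero instantaneous rate under each model, assuming the standard genetic code (NCBI translation table 1). It only counts exchangeable pairs; how HyPhy parameterizes the rates of the extra pairs is not described here, and the names `n_diff` and `allowed_pairs` are hypothetical.

```python
from itertools import combinations, product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# first position varying slowest. "*" marks stop codons.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}
SENSE = [c for c, aa in CODE.items() if aa != "*"]


def n_diff(a: str, b: str) -> int:
    """Number of nucleotide positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def allowed_pairs(max_hits: int) -> list[tuple[str, str]]:
    """Unordered sense-codon pairs with a non-zero rate in a model that
    permits up to `max_hits` simultaneous changes (1 -> 1H, 2 -> 2H, 3 -> 3H)."""
    return [(a, b) for a, b in combinations(SENSE, 2) if n_diff(a, b) <= max_hits]


for k in (1, 2, 3):
    print(f"{k}H: {len(allowed_pairs(k))} exchangeable codon pairs")
```

Under these assumptions the 1H model permits 263 of the 1,830 possible sense-codon pairs, 2H permits 1,047, and 3H permits all of them, which shows how much of the codon rate matrix the single-hit restriction sets to zero.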

Highlights

  • Most modern codon models in widespread use assume that any changes within a codon happen as a sequence of single instantaneous nucleotide changes, enforced by setting to zero the instantaneous rates between codons that differ at more than one nucleotide.

  • The primary goals of our data analyses are to establish how often evidence for multiple hits can be detected in large-scale empirical databases, identify the codons that are frequently involved in such events, and explore plausible biological explanations for why these rates are non-zero for a majority of alignments.

  • Substitutions involving serine codons, which are unique among the amino acids in that they form two islands separated by two or three nucleotide changes, are prominent in driving the statistical signal for these preferences, especially when they occur along short branches.
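
The serine "island" structure can be checked directly. The sketch below, again assuming the standard genetic code, groups each amino acid's codons into islands connected by single-nucleotide changes that never leave that amino acid, then measures how far apart the two serine islands are. The helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons in TCAG order; "*" = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def n_diff(a: str, b: str) -> int:
    """Number of nucleotide positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def islands(aa: str) -> list[set[str]]:
    """Connected components of an amino acid's codons, where two codons are
    linked if they differ at exactly one nucleotide."""
    remaining = {c for c, a in CODE.items() if a == aa}
    groups = []
    while remaining:
        stack = [remaining.pop()]
        group = set(stack)
        while stack:  # depth-first search within the synonymous codons
            cur = stack.pop()
            nbrs = {c for c in remaining if n_diff(cur, c) == 1}
            remaining -= nbrs
            group |= nbrs
            stack.extend(nbrs)
        groups.append(group)
    return groups


ser = islands("S")
gaps = [n_diff(a, b) for a in ser[0] for b in ser[1]]
multi_island = sorted(aa for aa in set(AA) - {"*"} if len(islands(aa)) > 1)

print(sorted(map(sorted, ser)))
print("min/max changes between serine islands:", min(gaps), max(gaps))  # 2 3
print("amino acids with more than one island:", multi_island)
```

Under the standard code the two serine islands are TCN and AGY, and any synonymous move between them requires at least two simultaneous changes, which a 1H model cannot represent as a single event.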

Introduction

Most modern codon models in widespread use assume that any changes within a codon happen as a sequence of single instantaneous nucleotide changes, an assumption enforced by setting to zero the instantaneous rates between codons that differ at more than one nucleotide. When Halpern and Bruno [3] introduced their mutation-selection models, they considered the general multi-hit (MH) case first, but largely abandoned it, noting that the single-hit reduction “..has very little effect on our results under the conditions we have investigated.” This assumption is both computationally convenient and biologically sound in the majority of cases, since the chance of independent random mutations hitting the same codon at the same time is negligible. The primary goals of our data analyses are to establish how often evidence for multiple hits can be detected in large-scale empirical databases (something no previous study of evolutionary models has done), identify the codons that are frequently involved in such events, and explore plausible biological explanations for why these rates are non-zero for a majority of alignments.
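
The "negligibly rare" argument can be made explicit with a back-of-the-envelope calculation. Assume, for illustration only, that each of the three positions of a codon mutates independently with probability \mu per generation. Then the probability that exactly one position mutates, P_1, and the probability that two or more positions mutate, P_{\ge 2}, are:

```latex
\begin{align}
P_1        &= 3\mu(1-\mu)^2 \approx 3\mu, \\
P_{\ge 2}  &= 1-(1-\mu)^3 - 3\mu(1-\mu)^2 = 3\mu^2 - 2\mu^3 \approx 3\mu^2, \\
\frac{P_{\ge 2}}{P_1} &= \frac{\mu(3-2\mu)}{3(1-\mu)^2} \approx \mu .
\end{align}
```

With \mu on the order of 10^{-8} (an illustrative per-site, per-generation value), a simultaneous multiple hit would be roughly 10^{-8} times as likely as a single hit in a given generation. The estimate rests entirely on the independence assumption.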
