Abstract

Site-specific evolutionary rates can be estimated from codon sequences or from amino-acid sequences. For codon sequences, the most popular methods use some variation of the dN∕dS ratio. For amino-acid sequences, one widely used method is Rate4Site, which assigns a relative conservation score to each site in an alignment. How site-wise dN∕dS values relate to Rate4Site scores is not known. Here we elucidate the relationship between these two rate measurements. We simulate sequences with known dN∕dS, using either dN∕dS models or mutation–selection models for simulation. We then infer Rate4Site scores on the simulated alignments, and we compare those scores to either true or inferred dN∕dS values on the same alignments. We find that Rate4Site scores generally correlate well with true dN∕dS, and that the correlation strengths increase in alignments with greater sequence divergence and more taxa. Moreover, Rate4Site scores correlate very well with inferred (as opposed to true) dN∕dS values, even for small alignments with little divergence. Finally, we verify this relationship between Rate4Site and dN∕dS in a variety of empirical datasets. We conclude that codon-level and amino-acid-level analysis frameworks are directly comparable and yield very similar inferences.
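The comparison described above comes down to correlating two per-site rate vectors computed on the same alignment. The following is a minimal sketch of that step only, assuming a Pearson correlation (the abstract does not say which correlation statistic was used). The function names `pearson` and `mean_normalize` and all numeric values are hypothetical, chosen for illustration, and are not taken from the paper.

```python
from math import sqrt


def mean_normalize(rates):
    """Rescale per-site rates so that their mean is 1 (relative rates)."""
    mean = sum(rates) / len(rates)
    if mean == 0:
        raise ValueError("cannot normalize rates with zero mean")
    return [r / mean for r in rates]


def pearson(x, y):
    """Pearson correlation between two equally long per-site vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy values for five alignment sites (not real data).
site_dnds = [0.10, 0.50, 0.20, 0.90, 0.05]     # assumed per-site dN/dS
site_scores = [0.30, 1.40, 0.60, 2.50, 0.20]   # assumed amino-acid rate scores

r = pearson(mean_normalize(site_dnds), site_scores)
print(f"correlation between the two rate measures: {r:.3f}")
```

Because Pearson correlation is unchanged by positive rescaling of either vector, normalizing by the mean does not affect the result; it is included only to show that relative and absolute rates can be compared directly.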

Highlights

  • Different sites in a protein evolve at different rates (Kimura & Ohta, 1974; Perutz, Kendrew & Watson, 1965), and these rate differences are shaped by the interplay of functional and structural constraints each site experiences (Echave, Spielman & Wilke, 2016).

  • We find that Rate4Site scores generally correlate well with dN∕dS, in particular if both quantities are inferred from sequence data.

  • For all empirical datasets, we inferred Rate4Site scores and per-site dN∕dS values as described in the subsection "Rate inference". Both Rate4Site scores and per-site dN∕dS values are measures of the extent to which selection acts on individual protein sites.


Summary

Introduction

Different sites in a protein evolve at different rates (Kimura & Ohta, 1974; Perutz, Kendrew & Watson, 1965), and these rate differences are shaped by the interplay of functional and structural constraints each site experiences (Echave, Spielman & Wilke, 2016). Analyses of sequence variation in a structural context frequently make use of site-specific evolutionary rate estimates, and a wide variety of methods exist to infer such rates from either codon or amino-acid sequences (Nielsen & Yang, 1998; Yang & Nielsen, 2002; Kosakovsky Pond, Frost & Muse, 2005; Kosakovsky Pond & Muse, 2005; Yang et al., 2000; Murrell et al., 2012; Lemey et al., 2012; Pupko et al., 2002; Fernandes & Atchley, 2008; Huang & Golding, 2014; Huang & Golding, 2015; Mayrose et al., 2004). How dN∕dS inference methods relate to Rate4Site scores is not known.

