Abstract

Molecular evolutionary rate estimates have been shown to depend on the time period over which they are estimated. Factors such as demographic processes, calibration errors, purifying selection, and the heterogeneity of substitution rates among sites (RHAS) are known to affect the accuracy with which rates of evolution are estimated. We use mathematical modeling and Bayesian analyses of simulated sequence alignments to explore how mutational hotspots can lead to time-dependent rate estimates. Mathematical modeling shows that, when RHAS is ignored, molecular rates are inevitably underestimated, and increasingly so over longer time scales. Although a gamma distribution is commonly used to model RHAS, we show that when the actual RHAS deviates from a gamma-like distribution, rates can be either under- or overestimated in a time-dependent manner. Simulations performed under different scenarios of RHAS confirm the mathematical modeling and demonstrate the impact of time-dependent rates on estimates of divergence times. Most notably, erroneous rate estimates can have narrow credible intervals, leading to false confidence in biased estimates of rates and node ages. Surprisingly, large errors in estimates of the overall molecular rate do not necessarily generate large errors in divergence time estimates. Finally, we illustrate the correlation between time-dependent rate patterns and differential saturation between quickly and slowly evolving sites. Our results suggest that data partitioning or simple nonparametric mixture models of RHAS substantially improve the accuracy with which node ages and substitution rates can be estimated.
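The claim that ignoring RHAS makes rates look slower over longer time scales can be checked with a small calculation. The sketch below is not the authors' simulation or Bayesian pipeline. It assumes the Jukes-Cantor (JC69) model and a hypothetical two-class "hotspot" mixture (5% of sites evolving 10 times faster than the rest), and it uses a hypothetical true rate `MU` in arbitrary time units. It computes the expected proportion of differing sites between two lineages that split `T` units ago, then converts it back to a rate with a JC69 distance that assumes a single rate for all sites.

```python
import math

def expected_p(d, rates, weights):
    """Expected proportion of differing sites under JC69 when a fraction
    weights[i] of sites evolves at relative rate rates[i]; d is the mean
    number of substitutions per site separating the two sequences."""
    return sum(w * 0.75 * (1.0 - math.exp(-4.0 / 3.0 * d * r))
               for r, w in zip(rates, weights))

def jc_distance(p):
    """JC69 distance that (wrongly) assumes all sites share one rate."""
    return -0.75 * math.log(1.0 - 4.0 / 3.0 * p)

# Hypothetical hotspot model: 5% of sites evolve 10x faster than the rest.
# Relative rates are scaled so that the mean rate across sites is exactly 1.
F_HOT, RATIO = 0.05, 10.0
SLOW = 1.0 / (1.0 - F_HOT + F_HOT * RATIO)
RATES = [SLOW, SLOW * RATIO]
WEIGHTS = [1.0 - F_HOT, F_HOT]

MU = 0.01  # hypothetical true rate: substitutions/site per time unit per lineage

def apparent_rate(T):
    """Rate recovered from two lineages that diverged T time units ago when
    RHAS is ignored: estimated distance divided by total branch time (2T)."""
    p = expected_p(2 * MU * T, RATES, WEIGHTS)
    return jc_distance(p) / (2 * T)

for T in (1, 10, 50, 100):
    print(f"T = {T:>3}: apparent rate = {apparent_rate(T):.5f} (true {MU})")
```

The fast class saturates first, so the single-rate distance increasingly misses substitutions as `T` grows and the apparent rate drifts below `MU`. This does not depend on the particular mixture chosen: for any distribution of site rates, the distance computed this way is a concave function of the true distance, so the apparent rate can only decline with time depth.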

Highlights

  • Large genetic datasets are becoming available to estimate the pattern and timing of evolutionary divergences among organisms

  • Since the molecular clock hypothesis was first proposed in the 1960s (Zuckerkandl and Pauling 1962), numerous concerns have been raised about the estimation of molecular rates (Kumar 2005; Pulquerio and Nichols 2007)

  • Our study confirms that rate heterogeneity among sites (RHAS) can lead to a time-dependent pattern of rate estimates

Introduction

Large genetic datasets are becoming available to estimate the pattern and timing of evolutionary divergences among organisms. For a given locus or taxon, rate estimates for short time periods (e.g., within species) are typically far higher than rate estimates for long time periods (e.g., between species and higher taxa) (Ho and Larson 2006). Using the human–chimpanzee divergence date to calibrate the time scale of human dispersals has been shown to underestimate the molecular rate, and consequently to overestimate the ages of migration events (Ho and Endicott 2008; Henn et al. 2009; Soares et al. 2009). It has therefore been recommended that calibration points be selected according to the time scale of interest (Ho et al. 2008), or that a correction be applied to estimates of mutation rates (Soares et al. 2009; Gignoux et al. 2011).
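The calibration problem described above can be reproduced with a toy calculation: calibrate a rate on a deep split of known age, then use it to date a shallow split. This is an illustrative sketch, not the method of any of the cited studies. It assumes JC69, gamma-distributed site rates with a hypothetical shape `ALPHA = 0.5`, a hypothetical true rate `MU`, and hypothetical node ages in arbitrary time units (they are not real human or chimpanzee dates). Averaging the JC69 site-difference probability over a Gamma(α, α) rate distribution gives a closed form, so no simulation is needed.

```python
import math

ALPHA = 0.5  # hypothetical gamma shape for RHAS (mean relative rate 1)
MU = 0.01    # hypothetical true rate: substitutions/site per time unit per lineage

def observed_jc_distance(T):
    """JC69 distance, computed while ignoring RHAS, between two lineages that
    split T time units ago when site rates really follow Gamma(ALPHA, ALPHA).
    Averaging exp(-4/3 * d * r) over that gamma gives (1 + 4d/(3*ALPHA))**-ALPHA,
    so the expected p-distance is 3/4 * (1 - that quantity)."""
    d = 2 * MU * T  # true expected substitutions per site between the two tips
    p = 0.75 * (1.0 - (1.0 + 4.0 * d / (3.0 * ALPHA)) ** (-ALPHA))
    return -0.75 * math.log(1.0 - 4.0 / 3.0 * p)

# Step 1: calibrate the rate on a deep split of known (hypothetical) age.
T_CALIB = 60.0
mu_hat = observed_jc_distance(T_CALIB) / (2 * T_CALIB)

# Step 2: use that rate to date a shallow split whose true age is T_TRUE.
T_TRUE = 2.0
t_est = observed_jc_distance(T_TRUE) / (2 * mu_hat)

print(f"calibrated rate: {mu_hat:.4f} (true {MU})")
print(f"shallow split:   {t_est:.2f} time units (true {T_TRUE})")
```

The deep calibration suffers more saturation than the shallow node, so the calibrated rate comes out too low and the shallow node is pushed too far into the past, the same direction of error reported for human dispersals. In this toy setting, a gamma-corrected distance with the correct α would remove the bias exactly; the abstract's point is that such a correction is no longer guaranteed to work when the true RHAS is not gamma-like.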
