Abstract

Background

The observation of variation in substitution rates among lineages has led to (1) a general rejection of the molecular clock model, and (2) the suggestion that a number of biological characteristics of organisms can cause rate variation. Accurate estimates of rate variation, and thus accurate inferences regarding the causes of rate variation, depend on accurate estimates of substitution rates. However, theory suggests that even when the substitution process is clock-like, variable numbers of substitutions can occur among lineages because the substitution process is stochastic. Furthermore, substitution rates along lineages can be misestimated, particularly when multiple substitutions occur at some sites. Although these potential causes of error in rate estimation are well understood in theory, such error has not been examined in detail; consequently, empirical studies that estimate rate variation among lineages have been unable to determine whether their results could be impacted by estimation error.

Methodology/Principal Findings

To evaluate the extent to which error in rate estimation could erroneously suggest rate variation among lineages, we examined rate variation estimated for datasets simulated under a molecular clock on trees with equal and variable branch lengths. Thus, any apparent rate variation in these datasets reflects error in rate estimation rather than true differences in the underlying substitution process. We observed substantial rate variation among lineages in our simulations; however, we did not observe rate variation when average substitution rates were compared between different clades.

Conclusions/Significance

Our results confirm previous theoretical work suggesting that observations of among lineage rate variation in empirical data may be due to the stochastic substitution process and error in the estimation of substitution rates, rather than true differences in the underlying substitution process among lineages. However, conclusions regarding rate variation drawn from rates averaged across multiple branches are likely due to real, systematic variation in rates between groups.

Highlights

  • There is significant interest in estimating rates of gene evolution [e.g. 1,2,3,4,5] and differences in such rates among species, clades, and over time [e.g. 6,7,8]

  • We first evaluated rate variation estimates for datasets simulated on 8-taxon trees with equal-length branches ranging from 0.01 to 1.4 substitutions/site and rates estimated in a Bayesian framework

  • Rate variation was measured as maximum/minimum estimated rate; to evaluate the causes of observed rate variation, we compared this result to the variation expected due to the stochastic substitution process
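The stochastic baseline mentioned in the last highlight can be illustrated with a minimal sketch. It is not the authors' simulation pipeline: it assumes that the number of substitutions on each branch is Poisson-distributed with mean equal to branch length times sequence length, ignores tree topology and rate estimation entirely, and uses hypothetical names and defaults (`max_min_rate_ratio`, `n_sites=1000`, `n_branches=8`, `n_reps=2000`) that do not come from the source.

```python
import numpy as np


def max_min_rate_ratio(branch_length, n_sites=1000, n_branches=8,
                       n_reps=2000, seed=0):
    """Return max/min ratios of realized substitution counts per replicate.

    All branches share the same true length (a strict clock), so any ratio
    above 1 reflects only the stochastic substitution process.
    n_sites, n_branches and n_reps are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Expected number of substitutions per branch across the whole alignment.
    expected = branch_length * n_sites
    # One row per replicate dataset, one column per branch.
    counts = rng.poisson(expected, size=(n_reps, n_branches))
    mins = counts.min(axis=1)
    maxs = counts.max(axis=1)
    # The ratio is undefined when a branch has zero substitutions; skip those.
    valid = mins > 0
    return maxs[valid] / mins[valid]


# Branch lengths span the range given in the highlights (substitutions/site).
for bl in (0.01, 0.1, 1.4):
    ratios = max_min_rate_ratio(bl)
    print(f"branch length {bl}: median max/min ratio = {np.median(ratios):.2f}")
```

Under these assumptions the ratio shrinks toward 1 as branches lengthen, because the Poisson standard deviation grows more slowly than the mean. This sketch captures only the stochastic component; error introduced when rates are estimated from sequence data is a separate effect.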


Summary

Introduction

There is significant interest in estimating rates of gene evolution [e.g. 1,2,3,4,5] and differences in such rates among species, clades, and over time [e.g. 6,7,8]. It has been suggested that genes rarely evolve according to a molecular clock model, with significant variation in substitution rates even among closely related species [e.g. 6,11,12]. The observation of variation in substitution rates among lineages has led to (1) a general rejection of the molecular clock model, and (2) the suggestion that a number of biological characteristics of organisms can cause rate variation. However, substitution rates along lineages can be misestimated, particularly when multiple substitutions occur at some sites. Although these potential causes of error in rate estimation are well understood in theory, such error has not been examined in detail; consequently, empirical studies that estimate rate variation among lineages have been unable to determine whether their results could be impacted by estimation error.
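To make the multiple-substitution problem concrete, the following sketch compares the true distance along a branch with the proportion of sites expected to differ. The source does not state which substitution model was used; the Jukes-Cantor (JC69) model is assumed here only because it is the simplest case, and the function names are hypothetical.

```python
import math


def expected_p_distance(d):
    """Expected proportion of differing sites under JC69.

    d is the true distance in substitutions per site. Because later
    substitutions can overwrite earlier ones, the result saturates at 0.75.
    """
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p):
    """JC69-corrected distance from an observed proportion of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Uncorrected differences track short branches well but lag on long ones.
for d in (0.01, 0.5, 1.4):
    p = expected_p_distance(d)
    print(f"true d = {d}: expected p = {p:.4f}, corrected d = {jc69_distance(p):.4f}")
```

The correction recovers the true distance only when the observed proportion equals its expectation. With finite sequences the observed proportion fluctuates, and because the correction is steep near saturation, small fluctuations on long branches can translate into large errors in the estimated rate.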

