Abstract

Phylogenetic analyses of DNA sequence data can provide estimates of evolutionary rates and timescales. Nearly all phylogenetic methods rely on accurate models of nucleotide substitution. A key feature of molecular evolution is the heterogeneity of substitution rates among sites, which is often modelled using a discrete gamma distribution. A widely used derivative of this is the gamma-invariable mixture model, which assumes that a proportion of sites in the sequence are completely resistant to change, while substitution rates at the remaining sites are gamma-distributed. For data sampled at the intraspecific level, however, biological assumptions involved in the invariable-sites model are commonly violated. We examined the use of these models in analyses of five intraspecific data sets. We show that using 6–10 rate categories for the discrete gamma distribution of rates among sites is sufficient to provide a good approximation of the marginal likelihood. Increasing the number of gamma rate categories did not have a substantial effect on estimates of the substitution rate or coalescence time, unless rates varied strongly among sites in a non-gamma-distributed manner. The assumption of a proportion of invariable sites provided a better approximation of the asymptotic marginal likelihood when the number of gamma categories was small, but had minimal impact on estimates of rates and coalescence times. However, the estimated proportion of invariable sites was highly susceptible to changes in the number of gamma rate categories. The concurrent use of gamma and invariable-site models for intraspecific data is not biologically meaningful and has been challenged on statistical grounds; here we have found that the assumption of a proportion of invariable sites has no obvious impact on Bayesian estimates of rates and timescales from intraspecific data.

Highlights

  • In phylogenetic analyses of DNA sequence data, the evolutionary process is usually described using models of nucleotide substitution

  • Neither model provided a good approximation of the pattern of rate heterogeneity among sites (RHAS) in the two human mitogenome data sets (i and ii), which were the largest and least variable

  • For population-level analyses that aim to estimate substitution rates or coalescence times, our results suggest that greatly increasing the number of gamma categories typically does not lead to substantial changes in parameter estimates

Read more

Summary

Introduction

In phylogenetic analyses of DNA sequence data, the evolutionary process is usually described using models of nucleotide substitution. These models commonly assume that substitutions occurring at each site are described by a Markov chain and that different sites evolve in a mutually independent manner. By constraining one or more of these parameters, a family of time-reversible models can be generated. This gives rise to special cases of the GTR model, such as the Jukes–Cantor [2], Kimura [3], and Hasegawa– Kishino–Yano [4] models

Methods
Results
Discussion
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.