Abstract

Phylogenetic dating is one of the most powerful and commonly used methods for drawing epidemiological interpretations from pathogen genomic data. Building dated trees requires a molecular clock model, which represents the rate at which substitutions accumulate on genomes. When the clock rate is constant throughout the tree, the clock is said to be strict, but this is often not an acceptable assumption. Alternatively, relaxed clock models allow the clock rate to vary, often based on a distribution of rates across branches. However, we show here that the distributions of rates across branches in commonly used relaxed clock models are incompatible with the biological expectation that the sum of the numbers of substitutions on two neighboring branches should be distributed as the number of substitutions on a single branch of equivalent length. We call this expectation the additivity property. We further show how the assumptions of commonly used relaxed clock models can lead to estimates of evolutionary rates and dates with low precision and biased confidence intervals. We therefore propose a new additive relaxed clock model in which the additivity property is satisfied. We illustrate the use of this model on a range of simulated and real data sets, and we show that it leads to more accurate estimates of mean evolutionary rates and ancestral dates.

Highlights

  • Epidemiological analysis of pathogen genomic data often relies on the construction and interpretation of dated phylogenies

  • We found that the additive relaxed clock (ARC) model had a significantly better fit for all simulations with x > 1, as expected because the data were simulated from the ARC model

  • We showed that the existing strict clock (SC) models for both the discrete (SC, eq 1) and continuous cases satisfy this additivity property, whereas commonly used uncorrelated relaxed clock (RC) models (RC, eq 3 and cRC, eq 14) do not
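
To make the additivity property concrete, the minimal sketch below compares the mean and variance of the substitution count on one branch with those of the summed counts on two neighboring branches of the same total length. The model forms are illustrative assumptions and are not taken from the paper's equations: the strict clock is treated as Poisson with mean equal to rate times branch length, and the uncorrelated relaxed clock draws one gamma-distributed rate (mean `mu`, standard deviation `sigma`) independently for each branch. All function and parameter names are hypothetical. Matching moments is only a necessary condition for additivity, but a mismatch is enough to show that additivity fails.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Moments:
    """Mean and variance of the number of substitutions on a branch."""
    mean: float
    var: float


def strict_clock(mu: float, length: float) -> Moments:
    # Assumed strict clock: count ~ Poisson(mu * length),
    # so the mean and the variance are both mu * length.
    m = mu * length
    return Moments(m, m)


def gamma_rate_clock(mu: float, sigma: float, length: float) -> Moments:
    # Assumed uncorrelated relaxed clock: one rate per branch,
    # rate ~ Gamma(mean=mu, sd=sigma), count | rate ~ Poisson(rate * length).
    # Law of total variance: Var = E[rate * length] + Var(rate * length).
    return Moments(mu * length, mu * length + (sigma * length) ** 2)


def sum_of_two(a: Moments, b: Moments) -> Moments:
    # Counts on the two branches are independent, so means and variances add.
    return Moments(a.mean + b.mean, a.var + b.var)


# Illustrative values only: a branch of length 1.0 split into 0.25 + 0.75.
mu, sigma = 4.0, 2.0
l1, l2 = 0.25, 0.75

strict_whole = strict_clock(mu, l1 + l2)
strict_split = sum_of_two(strict_clock(mu, l1), strict_clock(mu, l2))

relaxed_whole = gamma_rate_clock(mu, sigma, l1 + l2)
relaxed_split = sum_of_two(gamma_rate_clock(mu, sigma, l1),
                           gamma_rate_clock(mu, sigma, l2))

print("strict  whole vs split variance:", strict_whole.var, strict_split.var)
print("relaxed whole vs split variance:", relaxed_whole.var, relaxed_split.var)
```

Under these assumptions the strict clock gives identical moments for the whole and the split branch, whereas the per-branch gamma rate gives a smaller variance once the branch is split, because the rate variance term scales with the square of each branch length. Inserting a node on a branch therefore changes the implied distribution of substitutions, which is the kind of incompatibility the additivity property rules out.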

Summary

Introduction

Epidemiological analysis of pathogen genomic data often relies on the construction and interpretation of dated phylogenies. Time-scaled phylogenetic analysis is a widely used tool for genomic epidemiology, allowing researchers to study population size dynamics (Ho and Shapiro 2011), transmission (Didelot et al 2017), pathogen population structure (Volz et al 2020), or host population structure (Volz et al 2013). In the two-step approach, the second step (phylogeny dating) can be performed, for example, using LSD (To et al 2016), node.dating (Jones and Poon 2017), treedater (Volz and Frost 2017), TreeTime (Sagulenko et al 2018), or BactDating (Didelot et al 2018). For ease of presentation, we initially focus on the two-step phylogeny dating approach, and later show how our findings also apply to the integrated approach.

