Abstract

Tree shape statistics provide valuable quantitative insights into evolutionary mechanisms underpinning phylogenetic trees, a commonly used graph representation of evolutionary relationships among taxonomic units ranging from viruses to species. We study two subtree counting statistics, the number of cherries and the number of pitchforks, for random phylogenetic trees generated by two widely used null tree models: the proportional to distinguishable arrangements (PDA) and the Yule-Harding-Kingman (YHK) models. By developing limit theorems for a version of extended Pólya urn models in which negative entries are permitted for their replacement matrices, we deduce the strong laws of large numbers and the central limit theorems for the joint distributions of these two counting statistics for the PDA and the YHK models. Our results indicate that the limiting behaviour of these two statistics, when appropriately scaled using the number of leaves in the underlying trees, is independent of the initial tree used in the tree generating process.

Highlights

  • As a common mathematical representation of evolutionary relationships among biological systems ranging from viruses to species, phylogenetic trees retain important signatures of the underlying evolutionary events and mechanisms which are often not directly observable, such as rates of speciation and expansion (Mooers et al 2007; Heath et al 2008)

  • This paper focuses on two subtree counting statistics: the number of cherries and that of pitchforks in a tree

  • The theorem below describes the asymptotic behaviour of β(Tn), which enables us to deduce the asymptotic properties of the joint distribution of the number of pitchforks and the number of cherries for the proportional to distinguishable arrangements (PDA) model in Corollary 2


Summary

Introduction

As a common mathematical representation of evolutionary relationships among biological systems ranging from viruses to species, phylogenetic trees retain important signatures of the underlying evolutionary events and mechanisms which are often not directly observable, such as rates of speciation and expansion (Mooers et al 2007; Heath et al 2008). The asymptotic frequency of cherries in pathogen trees generated by some models can be used to estimate the basic reproduction number (Plazzotta and Colijn 2016) and to study the impact of the underlying contact network over which a pathogen spreads (Metzig et al 2019). Various properties of these statistics have been established over the past decades for the following two fundamental random phylogenetic tree models: the Yule-Harding-Kingman (YHK) model (Rosenberg 2006; Disanto and Wiehe 2013; Holmgren and Janson 2015) and the proportional to distinguishable arrangements (PDA) model (McKenzie and Steel 2000; Chang and Fuchs 2010; Wu and Choi 2016; Choi et al 2020). We conclude this paper in the last section with a discussion of our results and some open problems.
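To make the link between cherry counts and urn models concrete, here is a minimal simulation sketch. It is not the paper's method. It assumes the standard description of the YHK process, in which each step replaces a uniformly chosen leaf with a cherry, and it starts from the two-leaf tree. The function name `simulate_yhk_cherries` is hypothetical. Leaves are treated as balls of two colours: picking a leaf that already belongs to a cherry leaves the cherry count unchanged, while picking any other leaf creates a new cherry. In the corresponding urn, the second case removes one non-cherry leaf, which is the kind of negative replacement entry the abstract refers to.

```python
import random


def simulate_yhk_cherries(n_leaves, rng):
    """Return the cherry count of one simulated YHK tree with n_leaves leaves.

    Assumption: the process starts from the two-leaf tree (a single cherry)
    and each step splits a uniformly random leaf into a cherry.
    """
    if n_leaves < 2:
        raise ValueError("n_leaves must be at least 2")
    cherries = 1  # the two-leaf starting tree is itself a cherry
    leaves = 2
    while leaves < n_leaves:
        # A uniformly chosen leaf lies in a cherry with probability
        # 2 * cherries / leaves, since each cherry contains two leaves.
        in_cherry = rng.random() < 2 * cherries / leaves
        if not in_cherry:
            # Splitting a leaf outside every cherry creates a new cherry.
            cherries += 1
        # Splitting a cherry leaf destroys one cherry and creates another,
        # so the count is unchanged in that case.
        leaves += 1
    return cherries


if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed so the run is reproducible
    n = 3000
    counts = [simulate_yhk_cherries(n, rng) for _ in range(200)]
    mean_fraction = sum(counts) / (len(counts) * n)
    print(f"mean cherries / leaves over 200 trees: {mean_fraction:.4f}")
```

Under these assumptions the printed ratio should sit close to 1/3, the known asymptotic cherry frequency under the YHK model. Changing the starting tree in the sketch would be one informal way to explore the abstract's claim that the scaled limit does not depend on the initial tree.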

Phylogenetic trees
The YHK and the PDA processes
Modes of convergence
Miscellaneous
Urn models
Limiting distributions under the YHK model
Limiting distributions under the PDA model
Unrooted trees
Proofs of Theorems 1 and 2
Proof of Theorem 1
Proof of Theorem 2
Discussion