Phylogenetic tree statistics: A systematic overview using the new R package ‘treestats’

Thijs Janzen, Rampal S Etienne

doi:10.1016/j.ympev.2024.108168

Abstract

Phylogenetic trees are believed to contain a wealth of information on diversification processes. However, comparing phylogenetic trees is not straightforward due to their high dimensionality. Researchers have therefore defined a wide range of low-dimensional summary statistics. Currently, it remains unexplored to what extent these summary statistics cover the same underlying information and which summary statistics best explain observed variation across phylogenies. Furthermore, a large subset of available summary statistics focuses on measuring the topological features of a phylogenetic tree, but these are often only explored at the extreme edge cases of the fully balanced or fully imbalanced tree and not for trees of intermediate balance.

Here, we introduce a new R package called ‘treestats’, which provides speed-optimized code to compute 70 summary statistics. We study correlations between summary statistics on empirical trees and on trees simulated using several diversification models. Furthermore, we introduce an algorithm to create intermediately balanced trees in a well-defined manner, in order to explore variation in summary statistics across a balance gradient.

We find that almost all summary statistics are correlated with tree size, and that it is difficult, if not impossible, to correct for tree size unless the tree-generating model is known. Furthermore, we find that across empirical and simulated trees, at least three large clusters of correlated summary statistics can be found, where statistics group together based on the information used (topology or branching times). However, the finer-grained correlation structure appears to depend strongly on either the taxonomic group studied (in empirical studies) or the tree-generating model (in simulation studies).

Amongst statistics describing the (im)balance of a tree, we find that almost all statistics vary non-linearly, and sometimes even non-monotonically, with our generated balance gradient. This indicates that balance is perhaps a more complex property of a tree than previously thought. Furthermore, using our new imbalancing algorithm, we devise a numerical test to identify balance statistics, and identify several statistics as balance statistics that were not previously considered as such. Lastly, our results lead to several recommendations on which statistics to select when analyzing and comparing phylogenetic trees.
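To make the notion of a topological balance statistic concrete, the following minimal sketch computes two classic ones, the Colless index and the Sackin index, at the two extreme cases mentioned above: a fully balanced tree and a fully imbalanced (caterpillar) tree. It is written in plain Python and does not use or reproduce the ‘treestats’ API; the nested-tuple tree representation and all function names are assumptions made purely for illustration.

```python
# Illustrative sketch only: trees are nested 2-tuples, tips are the string "tip".
# None of these names come from the 'treestats' package.

def leaf_count(tree):
    """Number of tips below (and including) this node."""
    if not isinstance(tree, tuple):
        return 1
    return sum(leaf_count(child) for child in tree)

def colless(tree):
    """Colless index: sum over internal nodes of |tips left - tips right|."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    imbalance_here = abs(leaf_count(left) - leaf_count(right))
    return imbalance_here + colless(left) + colless(right)

def sackin(tree, depth=0):
    """Sackin index: sum of the depths of all tips."""
    if not isinstance(tree, tuple):
        return depth
    return sum(sackin(child, depth + 1) for child in tree)

def balanced(n_tips):
    """Fully balanced tree; n_tips is assumed to be a power of two."""
    if n_tips == 1:
        return "tip"
    half = n_tips // 2
    return (balanced(half), balanced(half))

def caterpillar(n_tips):
    """Fully imbalanced tree: each split separates a single tip."""
    tree = "tip"
    for _ in range(n_tips - 1):
        tree = (tree, "tip")
    return tree

for name, tree in [("balanced", balanced(8)), ("caterpillar", caterpillar(8))]:
    print(name, "Colless:", colless(tree), "Sackin:", sackin(tree))
```

For eight tips, both indices take their minimum on the balanced tree and their maximum on the caterpillar tree. How such statistics behave between these two extremes is the question the balance gradient is designed to address.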
