Network science inspires novel tree shape statistics.

Leonid Chindelevitch, Caroline Colijn, Art F Y Poon, Maryam Hayati

doi:10.1371/journal.pone.0259877

Open Access

https://doi.org/10.1371/journal.pone.0259877

Abstract

The shape of phylogenetic trees can be used to gain evolutionary insights. A tree’s shape specifies the connectivity of a tree, while its branch lengths reflect either the time or genetic distance between branching events; well-known measures of tree shape include the Colless and Sackin imbalance, which describe the asymmetry of a tree. In other contexts, network science has become an important paradigm for describing structural features of networks and using them to understand complex systems, ranging from protein interactions to social systems. Network science is thus a potential source of many novel ways to characterize tree shape, as trees are also networks. Here, we tailor tools from network science, including diameter, average path length, and betweenness, closeness, and eigenvector centrality, to summarize phylogenetic tree shapes. We thereby propose tree shape summaries that are complementary to both asymmetry and the frequencies of small configurations. These new statistics can be computed in linear time and scale well to describe the shapes of large trees. We apply these statistics, alongside some conventional tree statistics, to phylogenetic trees from three very different viruses (HIV, dengue fever and measles), from the same virus in different epidemiological scenarios (influenza A and HIV) and from simulation models known to produce trees with different shapes. Using mutual information and supervised learning algorithms, we find that the statistics adapted from network science perform as well as or better than conventional statistics. We describe their distributions and prove some basic results about their extreme values in a tree. We conclude that network science-based tree shape summaries are a promising addition to the toolkit of tree shape features. All our shape summaries, as well as functions to select the most discriminating ones for two sets of trees, are freely available as an R package at http://github.com/Leonardini/treeCentrality.
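To make the network view of a tree concrete, the following minimal sketch treats a rooted tree as an unweighted, undirected graph and computes three of the summaries named in the abstract: diameter, average path length, and the maximum closeness centrality. It is an illustration under stated assumptions, not the authors' implementation and not the treeCentrality API: the function names and the edge-list input are hypothetical, branch lengths are ignored, closeness is taken as (n - 1) divided by the sum of distances, and the all-pairs breadth-first search is quadratic rather than the linear-time approach the paper describes.

```python
from collections import deque


def build_adjacency(edges):
    """Turn a list of (parent, child) pairs into an undirected adjacency map."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def bfs_distances(adj, source):
    """Hop distances from `source` to every node (branch lengths ignored)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def network_summaries(edges):
    """Naive O(n^2) computation; fine for illustration, not for large trees."""
    adj = build_adjacency(edges)
    n = len(adj)
    all_dist = {u: bfs_distances(adj, u) for u in adj}
    diameter = max(max(d.values()) for d in all_dist.values())
    # Summing over every source counts each unordered pair twice,
    # so divide by the number of ordered pairs n * (n - 1).
    total = sum(sum(d.values()) for d in all_dist.values())
    closeness = {u: (n - 1) / sum(d.values()) for u, d in all_dist.items()}
    return {
        "diameter": diameter,
        "average_path_length": total / (n * (n - 1)),
        "max_closeness": max(closeness.values()),
    }


# Two hypothetical four-tip trees: fully balanced and fully imbalanced.
balanced = [("r", "a"), ("r", "b"), ("a", "t1"), ("a", "t2"),
            ("b", "t3"), ("b", "t4")]
caterpillar = [("r", "t1"), ("r", "a"), ("a", "t2"), ("a", "b"),
               ("b", "t3"), ("b", "t4")]

balanced_stats = network_summaries(balanced)
caterpillar_stats = network_summaries(caterpillar)
print(balanced_stats)
print(caterpillar_stats)
```

On these two small trees the diameter is the same, while the average path length and maximum closeness differ, which is the kind of complementary signal the abstract refers to.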

Highlights

Molecular data describing the evolution, variation and diversity of organisms over time is more widely available than ever before due to rapid improvements in sequencing technology.
For the node properties derived from network science, we focus our discussion on the maximum value of each type of centrality a node can have within a tree, although other derived statistics could also have been used.
Distinguishing the topologies in these groups of trees requires tools that go beyond the traditional symmetry or imbalance metrics. In this case, the only statistics that produce statistically significant differences between all three pairs of viruses are the number of cherries, maximum height, maximum width, and the proportion of imbalanced subtrees; all of these capture differences that are not apparent in the imbalance.
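The conventional statistics mentioned here can be sketched in a few lines. The example below assumes a rooted binary tree written as nested Python tuples, with strings as tips; this representation and the function names are hypothetical and chosen only for illustration. It uses the standard definitions of the Sackin index (sum of tip depths) and the Colless index (sum over internal nodes of the absolute difference in tip counts of the two subtrees), counts cherries as internal nodes whose two children are both tips, and takes maximum width to be the largest number of nodes at any single depth, which is an assumption about the definition used in the paper.

```python
def is_tip(node):
    """Tips are plain labels; internal nodes are (left, right) tuples."""
    return not isinstance(node, tuple)


def shape_stats(tree):
    """Single traversal collecting several conventional shape statistics.

    Uses recursion, so very deep trees would need an iterative version.
    """
    stats = {"cherries": 0, "colless": 0, "sackin": 0, "max_height": 0}
    width = {}  # depth -> number of nodes (tips and internal) at that depth

    def visit(node, depth):
        width[depth] = width.get(depth, 0) + 1
        if is_tip(node):
            stats["sackin"] += depth
            stats["max_height"] = max(stats["max_height"], depth)
            return 1  # one tip below (and including) this node
        left, right = node
        n_left = visit(left, depth + 1)
        n_right = visit(right, depth + 1)
        if is_tip(left) and is_tip(right):
            stats["cherries"] += 1
        stats["colless"] += abs(n_left - n_right)
        return n_left + n_right

    visit(tree, 0)
    stats["max_width"] = max(width.values())
    return stats


# Hypothetical four-tip examples: balanced versus caterpillar.
balanced_tree = (("t1", "t2"), ("t3", "t4"))
caterpillar_tree = ("t1", ("t2", ("t3", "t4")))

balanced_shape = shape_stats(balanced_tree)
caterpillar_shape = shape_stats(caterpillar_tree)
print(balanced_shape)
print(caterpillar_shape)
```

The balanced tree has two cherries and a Colless index of 0, whereas the caterpillar has one cherry and a Colless index of 3; height and width separate the two trees as well.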

Summary

Introduction

Molecular data describing the evolution, variation and diversity of organisms over time is more widely available than ever before due to rapid improvements in sequencing technology. This research was undertaken, in part, thanks to funding (CC) from the Canada 150 Research Chairs Program. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Journal: PLOS ONE	Publication Date: Dec 23, 2021
Citations: 9	License type: CC BY 4.0

Similar Papers

Species Selection Regime and Phylogenetic Tree Shape.
G Anthony Verboom ... William A Freyman
Systematic Biology | VOL. 69
15 Jan 2020

Phylogenetic tree shape and the structure of mutualistic networks
Scott Chamberlain ... Jana C Vamosi
Journal of Ecology | VOL. 102
14 Jul 2014

Metrics on Multilabeled Trees: Interrelationships and Diameter Bounds
Katharina T Huber ... Vincent Moulton
IEEE/ACM Transactions on Computational Biology and Bioinformatics | VOL. 8
01 Jul 2011

Phylogenetic Stability, Tree Shape, and Character Compatibility: A Case Study Using Early Tetrapods
Massimo Bernardi ... Kenneth D Angielczyk
Systematic Biology | VOL. 65
10 Jun 2016
