Abstract

Phylogenetic analyses often produce large numbers of trees. Mapping trees’ distribution in “tree space” can illuminate the behavior and performance of search strategies, reveal distinct clusters of optimal trees, and expose differences between different data sources or phylogenetic methods—but the high-dimensional spaces defined by metric distances are necessarily distorted when represented in fewer dimensions. Here, I explore the consequences of this transformation in phylogenetic search results from 128 morphological data sets, using stratigraphic congruence—a complementary aspect of tree similarity—to evaluate the utility of low-dimensional mappings. I find that phylogenetic similarities between cladograms are most accurately depicted in tree spaces derived from information-theoretic tree distances or the quartet distance. Robinson–Foulds tree spaces exhibit prominent distortions and often fail to group trees according to phylogenetic similarity, whereas the strong influence of tree shape on the Kendall–Colijn distance makes its tree space unsuitable for many purposes. Distances mapped into two or even three dimensions often display little correspondence with true distances, which can lead to profound misrepresentation of clustering structure. Without explicit testing, one cannot be confident that a tree space mapping faithfully represents the true distribution of trees, nor that visually evident structure is valid. My recommendations for tree space validation and visualization are implemented in a new graphical user interface in the “TreeDist” R package. [Multidimensional scaling; phylogenetic software; tree distance metrics; treespace projections.]

Highlights

  • This study considers distances that purport to quantify the similarity of relationships between cladograms: the Robinson–Foulds (RF), matching split information (MS), phylogenetic information (PI), clustering information (CI), path (Pt), Kendall–Colijn (KC), and quartet (Q) distances.

  • Tree spaces are defined with reference to an underpinning distance metric.

  • A distance metric should assign smaller distances to trees that are more similar with respect to the properties under consideration. Different metrics can impose profoundly different tree spaces (Fig. 1), so a tree space will only be illuminating if its underlying metric is relevant to its application.
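
To make the idea of a metric-defined tree space concrete, the sketch below computes the Robinson–Foulds distance, the simplest of the distances listed above, for three toy six-leaf trees. It is an illustrative pure-Python sketch, not the TreeDist implementation: the nested-tuple tree encoding, the function names (`leaves`, `splits`, `robinson_foulds`), and the example trees are all assumptions made for this illustration.

```python
from itertools import combinations

def leaves(tree):
    """Return the set of leaf labels in a nested-tuple tree (hypothetical encoding)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def splits(tree):
    """Collect the non-trivial bipartitions of the leaf set implied by internal edges."""
    all_leaves = leaves(tree)
    anchor = min(all_leaves)  # reference leaf used to give each split one canonical side
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        # Represent each split by the side that does not contain the anchor leaf
        side = all_leaves - clade if anchor in clade else clade
        # Skip trivial splits that separate zero or one leaf from the rest
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found

def robinson_foulds(t1, t2):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))

# Toy trees (assumed for illustration only)
trees = {
    "ref":  ((("a", "b"), "c"), (("d", "e"), "f")),
    "near": ((("a", "b"), "c"), (("d", "f"), "e")),  # one local rearrangement
    "far":  ((("a", "d"), "b"), (("c", "e"), "f")),  # shares no split with ref
}

# Pairwise distances: the raw material from which a tree space is built
distances = {
    (i, j): robinson_foulds(trees[i], trees[j])
    for i, j in combinations(trees, 2)
}
print(distances)
```

A matrix of such pairwise distances is what a mapping method then compresses into two or three dimensions. Swapping `robinson_foulds` for another distance would change the matrix, and hence the resulting tree space, which is the point made in the final highlight.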


