Abstract

Evolutionary relationships are represented by phylogenetic trees, and a phylogenetic analysis of gene sequences typically produces a collection of these trees, one for each gene in the analysis. Analysis of samples of trees is difficult due to the multi-dimensionality of the space of possible trees. In Euclidean spaces, principal component analysis is a popular method of reducing high-dimensional data to a low-dimensional representation that preserves much of the sample’s structure. However, the space of all phylogenetic trees on a fixed set of species does not form a Euclidean vector space, and methods adapted to tree space are needed. Previous work introduced the notion of a principal geodesic in this space, analogous to the first principal component. Here we propose a geometric object for tree space similar to the kth principal component in Euclidean space: the locus of the weighted Fréchet mean of k+1 vertex trees when the weights vary over the k-simplex. We establish some basic properties of these objects, in particular showing that they have dimension k, and propose algorithms for projection onto these surfaces and for finding the principal locus associated with a sample of trees. Simulation studies demonstrate that these algorithms perform well, and analyses of two datasets, containing Apicomplexa and African coelacanth genomes respectively, reveal important structure from the second principal components.

Highlights

  • A great opportunity offered by modern genomics is that phylogenetics applied on a genomic scale, or phylogenomics, should be especially powerful for elucidating gene and genome evolution, relationships among species and populations, and processes of speciation and molecular evolution

  • In this paper we address two fundamental questions: (i) which geometric object most naturally plays the role of a kth principal component in tree space; and (ii) given such an object, how can we efficiently project data points onto the object? Our proposed solution is to replace the definition of (V) ⊂ R^m given in (1) with the locus of the weighted Fréchet mean of points v0, ..., vk in tree space

  • The locus of the Fréchet mean was first proposed as a geometric object for principal component analysis in tree space in a 2015 University of Kentucky PhD thesis by G

Summary

INTRODUCTION

A great opportunity offered by modern genomics is that phylogenetics applied on a genomic scale, or phylogenomics, should be especially powerful for elucidating gene and genome evolution, relationships among species and populations, and processes of speciation and molecular evolution. In this paper we address two fundamental questions: (i) which geometric object most naturally plays the role of a kth principal component in tree space; and (ii) given such an object, how can we efficiently project data points onto the object? In Euclidean space the locus of the Fréchet mean of some collection of points is an affine subspace; in tree space, the locus can be curved. Surfaces of this kind have recently been studied in the context of Riemannian manifolds and other geodesic metric spaces (Pennec, 2015). Using the implicit equations we show that the locus of the Fréchet mean (V) in T_N is locally k-dimensional for generic nondegenerate choices of V, and forms a suitable candidate for a kth principal component. We demonstrate the accuracy of the projection algorithm via a simulation study.
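The Euclidean statement above can be illustrated with a minimal sketch. It is not the authors' tree-space algorithm: it covers only the Euclidean case, where the weighted Fréchet mean (the minimiser of the weighted sum of squared distances to the vertex points) has the closed form of a weighted average. The function names and the three vertex points are assumptions chosen for illustration.

```python
import random

def frechet_objective(x, vertices, weights):
    """Weighted sum of squared Euclidean distances from x to each vertex."""
    return sum(
        w * sum((xi - vi) ** 2 for xi, vi in zip(x, v))
        for w, v in zip(weights, vertices)
    )

def weighted_frechet_mean(vertices, weights):
    """Closed-form minimiser in Euclidean space: the weighted average.

    Assumes the weights are non-negative and sum to 1.
    """
    dim = len(vertices[0])
    return [sum(w * v[j] for w, v in zip(weights, vertices)) for j in range(dim)]

def sample_simplex(k, rng):
    """Draw k + 1 weights uniformly from the k-simplex."""
    draws = [rng.expovariate(1.0) for _ in range(k + 1)]
    total = sum(draws)
    return [d / total for d in draws]

# Hypothetical example: k = 2, three vertex points in R^3 lying in the plane z = 1.
vertices = [[0.0, 0.0, 1.0], [2.0, 0.0, 1.0], [0.0, 3.0, 1.0]]
rng = random.Random(0)

# Sample the locus: one Frechet mean per weight vector drawn from the simplex.
locus = [weighted_frechet_mean(vertices, sample_simplex(2, rng)) for _ in range(5)]
for point in locus:
    print(point)
```

With weights restricted to the simplex, every sampled mean falls in the triangle spanned by the three vertices, which lies in the two-dimensional affine subspace z = 1. In tree space no such closed form is available, and the corresponding locus can be curved.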

THE GEOMETRY OF TREE SPACE
THE LOCUS OF THE FRÉCHET MEAN
Findings
DISCUSSION