Principal component analysis and the locus of the Fréchet mean in the space of phylogenetic trees.

Tom M W Nye, Grady Weyenberg, Xiaoxian Tang, Ruriko Yoshida

doi:10.1093/biomet/asx047

Abstract

Evolutionary relationships are represented by phylogenetic trees, and a phylogenetic analysis of gene sequences typically produces a collection of these trees, one for each gene in the analysis. Analysis of samples of trees is difficult due to the multi-dimensionality of the space of possible trees. In Euclidean spaces, principal component analysis is a popular method of reducing high-dimensional data to a low-dimensional representation that preserves much of the sample’s structure. However, the space of all phylogenetic trees on a fixed set of species does not form a Euclidean vector space, and methods adapted to tree space are needed. Previous work introduced the notion of a principal geodesic in this space, analogous to the first principal component. Here we propose a geometric object for tree space similar to the kth principal component in Euclidean space: the locus of the weighted Fréchet mean of k + 1 vertex trees when the weights vary over the k-simplex. We establish some basic properties of these objects, in particular showing that they have dimension k, and propose algorithms for projection onto these surfaces and for finding the principal locus associated with a sample of trees. Simulation studies demonstrate that these algorithms perform well, and analyses of two datasets, containing Apicomplexa and African coelacanth genomes respectively, reveal important structure from the second principal components.

Highlights

A great opportunity offered by modern genomics is that phylogenetics applied on a genomic scale, or phylogenomics, should be especially powerful for elucidating gene and genome evolution, relationships among species and populations, and processes of speciation and molecular evolution.
In this paper we address two fundamental questions: (i) which geometric object most naturally plays the role of a kth principal component in tree space; and (ii) given such an object, how can we efficiently project data points onto the object? Our proposed solution is to replace the definition of (V) ⊂ R^m given in (1) with the locus of the weighted Fréchet mean of points v_0, ..., v_k in tree space.
The locus of the Fréchet mean was first proposed as a geometric object for principal component analysis in tree space in a 2015 University of Kentucky PhD thesis by G

Summary

INTRODUCTION

A great opportunity offered by modern genomics is that phylogenetics applied on a genomic scale, or phylogenomics, should be especially powerful for elucidating gene and genome evolution, relationships among species and populations, and processes of speciation and molecular evolution. In this paper we address two fundamental questions: (i) which geometric object most naturally plays the role of a kth principal component in tree space; and (ii) given such an object, how can we efficiently project data points onto the object? In Euclidean space the locus of the Fréchet mean of some collection of points is an affine subspace; in tree space, the locus can be curved. Surfaces of this kind have recently been studied in the context of Riemannian manifolds and other geodesic metric spaces (Pennec, 2015). Using the implicit equations we show that the locus of the Fréchet mean (V) in T_N is locally k-dimensional for generic nondegenerate choices of V, and forms a suitable candidate for a kth principal component. We demonstrate the accuracy of the projection algorithm via a simulation study.
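The Euclidean statement above can be checked numerically. The following is a minimal sketch, not the authors' algorithm: it assumes ordinary Euclidean distance in R^3, where the weighted Fréchet mean (the minimiser of the weighted sum of squared distances to the vertices) reduces to the weighted average. All function names are hypothetical and chosen for illustration. The sketch minimises the Fréchet objective by gradient descent, compares the result with the closed form, and confirms that the means obtained as the weights range over a grid on the 2-simplex all lie in the plane through the three vertices. In tree space the Euclidean distance would have to be replaced by the tree-space geodesic distance, which this sketch does not implement.

```python
def weighted_average(vertices, weights):
    """Closed-form Euclidean weighted Frechet mean: sum_i w_i * v_i."""
    dim = len(vertices[0])
    return tuple(sum(w * v[j] for v, w in zip(vertices, weights))
                 for j in range(dim))


def frechet_mean_descent(vertices, weights, steps=500, lr=0.1):
    """Minimise sum_i w_i * ||x - v_i||^2 by plain gradient descent.

    Euclidean case only; weights are assumed to sum to one.
    """
    dim = len(vertices[0])
    x = list(vertices[0])
    for _ in range(steps):
        grad = [sum(2.0 * w * (x[j] - v[j]) for v, w in zip(vertices, weights))
                for j in range(dim)]
        x = [x[j] - lr * grad[j] for j in range(dim)]
    return tuple(x)


def simplex_grid(k, n):
    """Yield weight vectors (w_0, ..., w_k) on a regular grid of the k-simplex."""
    def compositions(parts, total):
        if parts == 1:
            yield (total,)
            return
        for first in range(total + 1):
            for rest in compositions(parts - 1, total - first):
                yield (first,) + rest
    for c in compositions(k + 1, n):
        yield tuple(ci / n for ci in c)


def plane_residual(vertices, x):
    """Distance-like residual of x from the plane through three points in R^3."""
    v0, v1, v2 = vertices
    a = [v1[j] - v0[j] for j in range(3)]
    b = [v2[j] - v0[j] for j in range(3)]
    normal = (a[1] * b[2] - a[2] * b[1],
              a[2] * b[0] - a[0] * b[2],
              a[0] * b[1] - a[1] * b[0])
    return abs(sum((x[j] - v0[j]) * normal[j] for j in range(3)))


if __name__ == "__main__":
    # Three illustrative vertices in R^3 (k = 2); not data from the paper.
    verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
    worst = max(plane_residual(verts, weighted_average(verts, w))
                for w in simplex_grid(2, 10))
    print("largest deviation from the vertex plane:", worst)
```

Gradient descent is used here only to show that the closed form agrees with direct minimisation of the Fréchet objective; in a space where no closed form exists, an iterative scheme of some kind is the only option.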

THE GEOMETRY OF TREE SPACE

THE LOCUS OF THE FRÉCHET MEAN

Findings

DISCUSSION

Journal: Biometrika	Publication Date: Sep 27, 2017
Citations: 38	License type: CC BY 4.0
