Abstract

Many biological characteristics of evolutionary interest are not scalar variables but continuous functions. Given a dataset of function-valued traits generated by evolution, we develop a practical, statistical approach to infer ancestral function-valued traits and to estimate the generative evolutionary process. We do this by combining dimension reduction and phylogenetic Gaussian process regression, a non-parametric procedure that explicitly accounts for known phylogenetic relationships. We test the performance of our methods on simulated, function-valued data generated from a stochastic evolutionary model. The methods are applied assuming that only the phylogeny and the function-valued traits of the taxa at its tips are known. Our method is robust and applicable to a wide range of function-valued data, and also offers a phylogenetically aware method for estimating the autocorrelation of function-valued traits.

Highlights

  • The number, reliability and coverage of evolutionary trees are growing rapidly [1,2]

  • Because the posterior distributions returned by phylogenetic Gaussian process regression (PGPR) depend on the hyperparameter vector g, we must estimate g in order to reconstruct ancestral function-valued traits, and the estimation procedure should correct for the dependence owing to phylogeny

  • Because estimating σ_f^i and ℓ^i alone is challenging [16], and we have further increased the challenge by introducing non-phylogenetic variation, we propose an improved estimation procedure using the machine learning technique bagging [17], which is a member of the boosting framework [22]


Summary

Introduction

The number, reliability and coverage of evolutionary trees are growing rapidly [1,2]. This corresponds to assuming independence between the rows (i.e. that the coefficients of the different basis functions evolve independently). It is commonly argued in the quantitative genetics literature [15] that evolutionary processes can be modelled as Ornstein–Uhlenbeck (OU) processes. The estimated basis functions may be combined statistically, using the posterior distributions of their respective mixing coefficients, to provide a posterior distribution for the function-valued trait. An outline of the framework presented in this study can be found in figure 1.
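To make the reconstruction step concrete, here is a minimal sketch of Gaussian process regression for one mixing coefficient at a single ancestor, followed by the combination of basis functions into an ancestral curve. It is an illustration under stated assumptions, not the authors' implementation: the exponential form of the OU covariance in patristic distance, the zero prior mean, the toy two-tip tree, the basis functions, and all names (`ou_kernel`, `solve`, `ancestor_posterior`) and numerical values are hypothetical. The hyperparameters `sigma_f`, `length_scale` and `sigma_n` (non-phylogenetic variation) are taken as known here, whereas in practice they would have to be estimated.

```python
import math

def ou_kernel(d, sigma_f, length_scale):
    """Assumed OU covariance: decays exponentially with patristic distance d."""
    return sigma_f ** 2 * math.exp(-d / length_scale)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ancestor_posterior(d_tips, d_anc, y, sigma_f, length_scale, sigma_n):
    """Posterior mean and variance of one coefficient at an ancestor (zero prior mean)."""
    n = len(y)
    # Tip covariance: phylogenetic part plus non-phylogenetic noise on the diagonal.
    K = [[ou_kernel(d_tips[i][j], sigma_f, length_scale)
          + (sigma_n ** 2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    k_star = [ou_kernel(d, sigma_f, length_scale) for d in d_anc]
    alpha = solve(K, y)
    beta = solve(K, k_star)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = sigma_f ** 2 - sum(k * b for k, b in zip(k_star, beta))
    return mean, var

# Hypothetical toy tree: two tips, each at distance 1 from their common ancestor.
d_tips = [[0.0, 2.0], [2.0, 0.0]]
d_anc = [1.0, 1.0]
tip_coeffs = [[1.0, 3.0], [-0.5, 0.5]]        # one row per basis function (made-up values)
basis = [[1.0, 1.0, 1.0], [-1.0, 0.0, 1.0]]   # made-up basis functions on a 3-point grid

# Rows are treated independently, i.e. each coefficient gets its own regression.
posteriors = [ancestor_posterior(d_tips, d_anc, y, 1.0, 2.0, 0.1) for y in tip_coeffs]

# Posterior mean of the ancestral function: basis functions weighted by coefficient means.
ancestral_curve = [sum(m * phi[t] for (m, _), phi in zip(posteriors, basis))
                   for t in range(3)]
```

Treating each row of `tip_coeffs` separately mirrors the independence assumption between the coefficients of different basis functions; under that assumption the posterior variance of the curve at each grid point would similarly be a sum of the coefficient variances weighted by the squared basis values.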

Artificial evolution of function-valued traits
Dimensionality reduction and source separation for function-valued traits
Phylogenetic Gaussian process regression
Hyperparameter estimation
Ancestor reconstruction
Discussion