Cross-validation to select Bayesian hierarchical models in phylogenetics

Sebastián Duchêne, David A Duchêne, Kathryn E Holt, Edward C Holmes, Francesca Di Giallonardo, Simon Y W Ho, Jemma L Geoghegan, John-Sebastian Eden

doi:10.1186/s12862-016-0688-y

Open Access

https://doi.org/10.1186/s12862-016-0688-y

Abstract

Background: Recent developments in Bayesian phylogenetic models have increased the range of inferences that can be drawn from molecular sequence data. Accordingly, model selection has become an important component of phylogenetic analysis. Methods of model selection generally consider the likelihood of the data under the model in question. In the context of Bayesian phylogenetics, the most common approach involves estimating the marginal likelihood, which is typically done by integrating the likelihood across model parameters, weighted by the prior. Although this method is accurate, it is sensitive to the presence of improper priors. We explored an alternative approach based on cross-validation that is widely used in evolutionary analysis. This involves comparing models according to their predictive performance.

Results: We analysed simulated data and a range of viral and bacterial data sets using a cross-validation approach to compare a variety of molecular clock and demographic models. Our results show that cross-validation can be effective in distinguishing between strict- and relaxed-clock models and in identifying demographic models that allow growth in population size over time. In most of our empirical data analyses, the model selected using cross-validation was able to match that selected using marginal-likelihood estimation. The accuracy of cross-validation appears to improve with longer sequence data, particularly when distinguishing between relaxed-clock models.

Conclusions: Cross-validation is a useful method for Bayesian phylogenetic model selection. This method can be readily implemented even when considering complex models where selecting an appropriate prior for all parameters may be difficult.

Electronic supplementary material: The online version of this article (doi:10.1186/s12862-016-0688-y) contains supplementary material, which is available to authorized users.
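
For reference, the marginal likelihood described in the Background can be written in standard notation as follows. The symbols are assumptions introduced here, not taken from the article: D is the sequence data, M the model, and \theta its parameters.

```latex
p(D \mid M) = \int p(D \mid \theta, M)\, p(\theta \mid M)\, d\theta
```

If the prior p(\theta \mid M) is improper, it has no finite normalising constant, so this integral is defined only up to an arbitrary constant. This is one way to see why marginal-likelihood comparisons are sensitive to improper priors.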

Highlights

Recent developments in Bayesian phylogenetic models have increased the range of inferences that can be drawn from molecular sequence data
We extend the cross-validation method proposed by Lartillot et al. [19] for substitution models to other components of the Bayesian hierarchical model: the molecular clock model and the demographic model
The uncorrelated lognormal (UCLN) model had the strongest support for most of the data sets, even for those generated under the uncorrelated exponential (UCED) model
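
The general idea of comparing models by predictive performance can be illustrated with a minimal toy sketch. This is not the authors' phylogenetic implementation: it assumes simple count data in place of sequence alignments, two conjugate toy models (Poisson and geometric) in place of clock or demographic models, and a score defined as the mean test-set log-likelihood over posterior draws. All function names are hypothetical.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def poisson_loglik(xs, lam):
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)


def geometric_loglik(xs, p):
    # Geometric on 0, 1, 2, ... (failures before the first success).
    return sum(math.log(p) + x * math.log1p(-p) for x in xs)


def poisson_posterior(train, rng, n_draws):
    # Assumed Gamma(1, 1) prior on the rate; conjugate Gamma posterior.
    shape, rate = 1 + sum(train), 1 + len(train)
    return [rng.gammavariate(shape, 1 / rate) for _ in range(n_draws)]


def geometric_posterior(train, rng, n_draws):
    # Assumed Beta(1, 1) prior on p; conjugate Beta posterior.
    return [rng.betavariate(1 + len(train), 1 + sum(train)) for _ in range(n_draws)]


def cv_score(train, test, sample_posterior, loglik, rng, n_draws=500):
    """Mean test-set log-likelihood over posterior draws fitted to the training set."""
    draws = sample_posterior(train, rng, n_draws)
    return sum(loglik(test, theta) for theta in draws) / n_draws


rng = random.Random(1)
data = [poisson_draw(rng, 4.0) for _ in range(200)]  # simulated under the Poisson model
rng.shuffle(data)
train, test = data[:100], data[100:]  # half for training, half for testing

scores = {
    "poisson": cv_score(train, test, poisson_posterior, poisson_loglik, rng),
    "geometric": cv_score(train, test, geometric_posterior, geometric_loglik, rng),
}
best = max(scores, key=scores.get)
print(scores, "selected:", best)
```

The model with the higher score predicts the held-out data better. In a phylogenetic setting the posterior draws would instead come from an MCMC analysis of the training alignment, which this sketch does not attempt to reproduce.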

Summary

Introduction

Recent developments in Bayesian phylogenetic models have increased the range of inferences that can be drawn from molecular sequence data. In the context of Bayesian phylogenetics, the most common approach involves estimating the marginal likelihood, which is typically done by integrating the likelihood across model parameters, weighted by the prior. Although this method is accurate, it is sensitive to the presence of improper priors. Evolutionary analyses of gene sequence data are increasingly reliant on model-based phylogenetic approaches. In recent years, this has been given substantial impetus by the surge in genome-scale data, improvements in computational power, and the application of Bayesian statistical methods to phylogenetics [1]. Model misspecification can result in errors in the estimates of other parameters, including

Journal: BMC Evolutionary Biology
Publication Date: May 26, 2016
Citations: 51
License type: cc-by

Similar Papers

Classification of molecular sequence data using Bayesian phylogenetic mixture models
E Loza-Reyes ... A Robinson
Computational Statistics & Data Analysis | VOL. 75
29 Jan 2014

Accurate Model Selection of Relaxed Molecular Clocks in Bayesian Phylogenetics
Guy Baele ... Wai Lok Sibon Li
Molecular Biology and Evolution | VOL. 30
01 Feb 2012

Genealogical Working Distributions for Bayesian Model Testing with Phylogenetic Uncertainty.
Guy Baele ... Marc A Suchard
Systematic Biology | VOL. 65
01 Nov 2015

Model Averaging and Bayes Factor Calculation of Relaxed Molecular Clocks in Bayesian Phylogenetics
W L S Li ... A J Drummond
Molecular Biology and Evolution | VOL. 29
22 Sep 2011
