Evaluating probabilistic programming and fast variational Bayesian inference in phylogenetics.

Mathieu Fourment, Aaron E Darling

doi:10.7717/peerj.8272

Abstract

Recent advances in statistical machine learning techniques have led to the creation of probabilistic programming frameworks. These frameworks enable probabilistic models to be rapidly prototyped and fit to data using scalable approximation methods such as variational inference. In this work, we explore the use of the Stan language for probabilistic programming in application to phylogenetic models. We show that many commonly used phylogenetic models including the general time reversible substitution model, rate heterogeneity among sites, and a range of coalescent models can be implemented using a probabilistic programming language. The posterior probability distributions obtained via the black box variational inference engine in Stan were compared to those obtained with reference implementations of Markov chain Monte Carlo (MCMC) for phylogenetic inference. We find that black box variational inference in Stan is less accurate than MCMC methods for phylogenetic models, but requires far less compute time. Finally, we evaluate a custom implementation of mean-field variational inference on the Jukes–Cantor substitution model and show that a specialized implementation of variational inference can be two orders of magnitude faster and more accurate than a general purpose probabilistic implementation.
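To make the Jukes–Cantor model named in the abstract concrete, the following is a minimal sketch of its likelihood for two aligned sequences separated by a single branch. It is not the authors' implementation; the function names and the toy sequences are illustrative assumptions, and the only model facts used are the standard Jukes–Cantor transition probabilities with equal base frequencies.

```python
import math

def count_sites(a, b):
    """Return (identical sites, differing sites) for two aligned sequences."""
    n_diff = sum(x != y for x, y in zip(a, b))
    return len(a) - n_diff, n_diff

def jc69_log_likelihood(d, n_same, n_diff):
    """Log-likelihood of distance d > 0 (expected substitutions per site)."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # probability a base is unchanged
    p_diff = 0.25 - 0.25 * e   # probability of one specific different base
    # Each base has stationary frequency 1/4 under Jukes-Cantor.
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_diff)

def jc69_distance(n_same, n_diff):
    """Closed-form maximum likelihood distance; needs fewer than 3/4 sites differing."""
    p = n_diff / (n_same + n_diff)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Toy data (hypothetical sequences, chosen only for illustration).
n_same, n_diff = count_sites("ACGTACGTAC", "ACGTACGTTT")
d_hat = jc69_distance(n_same, n_diff)
```

With a single branch the maximum is available in closed form, which makes the sketch easy to check: the log-likelihood evaluated at `d_hat` should exceed its value at any other distance. On a full tree no such closed form exists, which is why approximate inference methods such as MCMC or variational inference are used.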

Highlights

Markov chain Monte Carlo (MCMC) algorithms have become the workhorse of Bayesian phylogenetic inference since they were introduced in the late 1990s (Mau & Newton, 1997; Larget & Simon, 1999).
We analyzed a set of heterochronous influenza A virus sequences under the strict clock model on a fixed topology with BEAST2 and phylostan.
We have developed a tool based on the Stan package for Bayesian phylogenetic inference, which to our knowledge is the first application of variational Bayes (VB) to time trees with coalescent models.
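
As a rough illustration of the variational Bayes idea named in the highlights, the sketch below estimates the evidence lower bound (ELBO) for a single Jukes–Cantor branch length and picks the best member of a log-normal family. Everything specific here is an assumption made for illustration: the exponential prior and its rate, the log-normal variational family, the site counts, and above all the grid search, which stands in for the stochastic gradient optimisation that a real variational inference engine would use. It is not the method implemented in phylostan or Stan.

```python
import math
import random

def jc69_log_likelihood(d, n_same, n_diff):
    """Jukes-Cantor log-likelihood for one branch of length d > 0."""
    p_diff = -0.25 * math.expm1(-4.0 * d / 3.0)  # stable for small d
    p_same = 1.0 - 3.0 * p_diff
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_diff)

def log_prior(d, rate=10.0):
    """Exponential prior on the branch length (the rate is an assumption)."""
    return math.log(rate) - rate * d

def elbo(mu, sigma, n_same, n_diff, n_draws=500, seed=1):
    """Monte Carlo ELBO estimate for q(d) = LogNormal(mu, sigma)."""
    rng = random.Random(seed)  # fixed seed: same draws for every (mu, sigma)
    total = 0.0
    for _ in range(n_draws):
        z = rng.gauss(0.0, 1.0)
        log_d = mu + sigma * z          # reparameterised draw on the log scale
        d = math.exp(log_d)
        # Log density of the log-normal distribution at d.
        log_q = (-0.5 * z * z - math.log(sigma)
                 - 0.5 * math.log(2.0 * math.pi) - log_d)
        total += jc69_log_likelihood(d, n_same, n_diff) + log_prior(d) - log_q
    return total / n_draws

def fit_by_grid(n_same, n_diff):
    """Crude stand-in for an optimiser: maximise the ELBO over a small grid."""
    mus = [math.log(0.02) + 0.15 * i for i in range(30)]
    sigmas = [0.1 * j for j in range(1, 11)]
    return max((elbo(m, s, n_same, n_diff), m, s) for m in mus for s in sigmas)

# Hypothetical site counts: 8 identical and 2 differing sites.
best_elbo, best_mu, best_sigma = fit_by_grid(8, 2)
```

Reusing the same random draws for every candidate makes the ELBO estimates directly comparable across the grid. The returned `best_mu` and `best_sigma` describe the approximate posterior of the branch length; `math.exp(best_mu)` is its median.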

Summary

Introduction

Markov chain Monte Carlo (MCMC) algorithms have become the workhorse of Bayesian phylogenetic inference since they were introduced in the late 1990s (Mau & Newton, 1997; Larget & Simon, 1999). Recent advances in computing hardware and corresponding software implementations have allowed this class of inference method to handle increasingly large datasets (Flouri et al., 2015; Ayres et al., 2019). The quantity of sequence data generated every year has been growing exponentially. Combined with practitioners' desire to fit increasingly rich statistical models, this growth makes MCMC algorithms difficult to apply in practice because they take too long to run.

Journal: PeerJ
Publication Date: Dec 18, 2019
Citations: 23
License type: CC BY 4.0
