Universal probabilistic programming offers a powerful approach to statistical phylogenetics

Fredrik Ronquist, David Broman, Viktor Senderov, Nicolas Lartillot, Lawrence Murray, Jan Kudlicka, Daniel Lundén, Thomas B Schön, Johannes Borgström

doi:10.1038/s42003-021-01753-7

Abstract

Statistical phylogenetic analysis currently relies on complex, dedicated software packages, making it difficult for evolutionary biologists to explore new models and inference strategies. Recent years have seen more generic solutions based on probabilistic graphical models, but this formalism can only partly express phylogenetic problems. Here, we show that universal probabilistic programming languages (PPLs) solve the expressivity problem, while still supporting automated generation of efficient inference algorithms. To prove the latter point, we develop automated generation of sequential Monte Carlo (SMC) algorithms for PPL descriptions of arbitrary biological diversification (birth-death) models. SMC is a new inference strategy for these problems, supporting both parameter inference and efficient estimation of Bayes factors that are used in model testing. We take advantage of this in automatically generating SMC algorithms for several recent diversification models that have been difficult or impossible to tackle previously. Finally, applying these algorithms to 40 bird phylogenies, we show that models with slowing diversification, constant turnover and many small shifts generally explain the data best. Our work opens up several related problem domains to PPL approaches, and shows that few hurdles remain before these techniques can be effectively applied to the full range of phylogenetic models.

Highlights

Statistical phylogenetic analysis currently relies on complex, dedicated software packages, making it difficult for evolutionary biologists to explore new models and inference strategies
We show that models with slowing diversification, constant turnover and many small shifts generally explain the data from 40 bird phylogenies better than alternative models
Consider one of the simplest of all diversification models, constant rate birth–death (CRBD), in which lineages arise at a rate λ and die out at a rate μ, giving rise to a phylogenetic tree τ
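The CRBD model in the last highlight can be made concrete with a small simulation. The sketch below is not the authors' PPL code. It is a plain-Python illustration, under our own assumptions, of the kind of weighting an SMC particle can apply while walking an observed tree. Along a branch with no observed speciation, hidden speciation events are simulated. Each hidden side lineage must go extinct before the present, otherwise the weight drops to zero. The weight is also doubled per hidden event, on the assumption that either daughter could be the continuing lineage. Times are measured before the present, and all function names are hypothetical. The Monte Carlo averages are checked against two standard CRBD results: the extinction probability E(t), and the branch factor D(t_old)/D(t_young) obtained from the branch equation dD/dt = -(λ+μ)D + 2λED.

```python
import math
import random


def side_tree_survives(lam, mu, time_left, rng):
    """Simulate a CRBD process from one lineage for `time_left` time units.

    Returns True if at least one lineage is alive at the present."""
    n = 1  # number of living lineages
    while True:
        # Waiting time to the next birth or death among n lineages.
        time_left -= rng.expovariate(n * (lam + mu))
        if time_left <= 0:
            return True  # reached the present with n > 0 lineages
        if rng.random() < lam / (lam + mu):
            n += 1  # speciation
        else:
            n -= 1  # extinction
            if n == 0:
                return False


def extinction_prob(lam, mu, t):
    """Analytic E(t): probability that a lineage alive t time units before
    the present leaves no descendants at the present (requires lam != mu)."""
    r = lam - mu
    return mu * (1 - math.exp(-r * t)) / (lam - mu * math.exp(-r * t))


def branch_weight(lam, mu, t_old, t_young, rng):
    """One importance weight for an observed branch from t_old to t_young
    (times before the present) that shows no observed speciation."""
    # The observed lineage itself must not go extinct along the branch.
    weight = math.exp(-mu * (t_old - t_young))
    t = t_old
    while True:
        # Hidden speciation events form a Poisson process with rate lam.
        t -= rng.expovariate(lam)
        if t <= t_young:
            return weight
        if side_tree_survives(lam, mu, t, rng):
            return 0.0  # a surviving side lineage would appear in the tree
        weight *= 2.0  # either daughter may be the continuing lineage


def branch_factor(lam, mu, t_old, t_young):
    """Analytic D(t_old) / D(t_young) from dD/dt = -(lam + mu) D + 2 lam E D."""
    r = lam - mu
    ratio = (lam * math.exp(r * t_young) - mu) / (lam * math.exp(r * t_old) - mu)
    return math.exp(r * (t_old - t_young)) * ratio ** 2


rng = random.Random(1)
lam, mu, n = 1.0, 0.5, 20000
est_e = sum(not side_tree_survives(lam, mu, 2.0, rng) for _ in range(n)) / n
est_w = sum(branch_weight(lam, mu, 1.5, 0.5, rng) for _ in range(n)) / n
print(est_e, extinction_prob(lam, mu, 2.0))
print(est_w, branch_factor(lam, mu, 1.5, 0.5))
```

Each printed pair should roughly agree, up to Monte Carlo error. The simulated weights only use forward simulation and a zero-or-nonzero check, with no analytic formula. This is the property that lets a generic inference engine handle models where no closed-form expression exists.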

Summary

Introduction

Statistical phylogenetic analysis currently relies on complex, dedicated software packages, making it difficult for evolutionary biologists to explore new models and inference strategies. We show that universal probabilistic programming languages (PPLs) solve the expressivity problem, while still supporting automated generation of efficient inference algorithms. To prove the latter point, we develop automated generation of sequential Monte Carlo (SMC) algorithms for PPL descriptions of arbitrary biological diversification (birth-death) models. In phylogenetics, empiricists remain largely dependent on dedicated software developed by small teams of computational biologists[3]. Even though these software packages have become increasingly flexible in recent years, empiricists are still limited to a large extent by predefined model spaces and inference strategies. Probabilistic graphical models (PGMs) can express many components of phylogenetic models in a structured way, so that efficient Markov chain Monte Carlo (MCMC) samplers, the current workhorse of Bayesian statistical phylogenetics, can be generated automatically for them[5]. Novel inference strategies can also be applied readily to PGM descriptions of phylogenetic model components, as exemplified by recent work using STAN[6] or the new Blang framework[7].
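To illustrate what an MCMC sampler for a diversification parameter does, here is a hand-written random-walk Metropolis-Hastings sketch. It is not the code any PGM framework generates. It targets a single parameter, the speciation rate λ of a pure-birth (Yule) model, under our own simplifying assumptions. The tree is summarised by the number of observed speciation events `k` and the total branch length `total_length`, which gives the likelihood λ^k e^{-λ·total_length} up to a constant that does not depend on λ. λ has a Gamma(α, β) prior, with β as a rate. This conjugate setup has a known Gamma(α + k, β + total_length) posterior, which makes the sampler easy to check. The function names and parameter values are hypothetical.

```python
import math
import random


def log_posterior(lam, k, total_length, alpha=2.0, beta=1.0):
    """Unnormalised log posterior of a pure-birth rate lam > 0.

    Likelihood: lam**k * exp(-lam * total_length); prior: Gamma(alpha, beta)
    with rate beta, so the exact posterior is Gamma(alpha + k, beta + total_length)."""
    return (alpha - 1 + k) * math.log(lam) - (beta + total_length) * lam


def metropolis_hastings(k, total_length, n_iter, step, rng, lam0=1.0):
    """Random-walk Metropolis-Hastings on theta = log(lam)."""
    theta = math.log(lam0)
    # Target density on the log scale: posterior(lam) * lam (Jacobian term).
    current = log_posterior(lam0, k, total_length) + theta
    samples = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)  # symmetric proposal
        candidate = log_posterior(math.exp(proposal), k, total_length) + proposal
        # Accept with probability min(1, exp(candidate - current)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(math.exp(theta))
    return samples


rng = random.Random(42)
samples = metropolis_hastings(k=10, total_length=20.0, n_iter=20000, step=0.5, rng=rng)
kept = samples[2000:]  # discard burn-in
print(sum(kept) / len(kept), (2.0 + 10) / (1.0 + 20.0))
```

The sampled mean should be close to the exact posterior mean (α + k)/(β + total_length). Proposing on the log scale keeps λ positive, and the extra `+ proposal` term is the log-Jacobian of λ = e^θ, which keeps the chain targeting the correct posterior.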



Journal: Communications Biology
Publication Date: Feb 24, 2021
Citations: 18
License type: open-access


Similar Papers

Sequential Monte Carlo methods for epidemic data
18 Jul 2020

Sequential Monte-Carlo algorithms for Bayesian model calibration – A review and method comparison
Matthias Speich ... Florian Hartig
Ecological Modelling | VOL. 455
05 Jun 2021

Multilevel sequential monte carlo algorithms for MIMO demodulation
Pradeep Aggarwal ... Xiaodong Wang
IEEE Transactions on Wireless Communications | VOL. 6
01 Feb 2007

Correctness of Sequential Monte Carlo Inference for Probabilistic Programming Languages
Daniel Lundén ... Johannes Borgström
01 Jan 2020
