Improved estimation of macroevolutionary rates from fossil data using a Bayesian framework

Abstract

The estimation of origination and extinction rates and their temporal variation is central to understanding diversity patterns and the evolutionary history of clades. The fossil record provides the only direct evidence of extinction and biodiversity changes through time and has long been used to infer the dynamics of diversity changes in deep time. The software PyRate implements a Bayesian framework to analyze fossil occurrence data to estimate the rates of preservation, origination, and extinction while incorporating several sources of uncertainty. Building upon this framework, we present a suite of methodological advances including more complex and realistic models of preservation and the first likelihood-based test to compare the fit across different models. Further, we develop a new reversible jump Markov chain Monte Carlo algorithm to estimate origination and extinction rates and their temporal variation, which provides more reliable results and includes an explicit estimation of the number and temporal placement of statistically significant rate changes. Finally, we implement a new C++ library that speeds up the analyses by orders of magnitude, therefore facilitating the application of the PyRate methods to large data sets. We demonstrate the new functionalities through extensive simulations and with the analysis of a large data set of Cenozoic marine mammals. We compare our analytical framework against two widely used alternative methods to infer origination and extinction rates, revealing that PyRate decisively outperforms them across a range of simulated data sets. Our analyses indicate that explicit statistical model testing, which is often neglected in fossil-based macroevolutionary analyses, is crucial to obtain accurate and robust results.
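
The preservation component of such a model can be illustrated with its simplest case. The sketch below assumes a homogeneous Poisson process with one constant preservation rate q and known origination and extinction ages s and e, and computes the log-likelihood of a single lineage's fossil occurrence ages, conditioned on the lineage leaving at least one fossil. It is not PyRate's code; the function name and arguments are hypothetical, and the more complex preservation models mentioned above are not reproduced.

```python
import math


def hpp_log_likelihood(occurrences, s, e, q):
    """Hypothetical helper: log-likelihood of one lineage's fossil occurrence
    ages under a homogeneous Poisson preservation process.

    occurrences: ages of the fossil occurrences (Ma before present)
    s, e: origination and extinction ages, with s > e
    q: preservation rate (expected occurrences per lineage per Myr)
    """
    if not all(e <= x <= s for x in occurrences):
        return -math.inf  # occurrences must fall within the lifespan
    k = len(occurrences)
    duration = s - e
    # density of k Poisson events at the observed times: q**k * exp(-q * duration)
    log_lik = k * math.log(q) - q * duration
    # condition on the lineage being sampled at least once
    log_lik -= math.log(1.0 - math.exp(-q * duration))
    return log_lik
```

For example, `hpp_log_likelihood([5.0, 3.0], 6.0, 2.0, 0.5)` scores two occurrences within a 4-Myr lifespan, while any occurrence outside [e, s] yields minus infinity.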

Similar Papers
  • Book Chapter
  • Cited by 2
  • 10.1007/978-3-319-29754-5_3
Bayesian Inference and RJMCMC in Structural Dynamics: On Experimental Data
  • Jun 28, 2016
  • D Tiboaca + 4 more

This paper is concerned with applying the Reversible Jump Markov Chain Monte Carlo (RJMCMC) algorithm to a multi-degree-of-freedom (MDOF) system, within a Bayesian framework, in order to identify its parameters and perform model selection simultaneously. Because it can use prior knowledge and penalises model complexity, Bayesian inference has been widely used in System Identification (SID) for both parameter estimation and model selection. Even though the posterior distributions of parameters are often complex, Markov Chain Monte Carlo (MCMC) sampling methods have made the approach considerably more straightforward to apply in structural dynamics. The most commonly applied MCMC sampling algorithm within a Bayesian framework in structural dynamics is probably the Metropolis-Hastings (MH) method. However, the MH algorithm cannot handle model selection when competing model structures have different numbers of parameters, as it cannot move between spaces of differing dimension. RJMCMC, introduced in 1995, overcomes this limitation: it performs parameter estimation and model selection simultaneously by jumping between spaces in which the dimension of the parameter vector varies. Within a Bayesian approach, RJMCMC also penalises unnecessary complexity and so guards against overfitting. This work focuses on using the RJMCMC algorithm for SID on experimental time data gathered from an MDOF ‘bookshelf’ type structure. The major issue addressed is that of noise variance. Two models are used: the first uses a single noise variance for the entire structure, and the second employs three different noise variances for the three levels of the structure. 
The results presented in the last section show the capabilities of the RJMCMC algorithm as a powerful tool in the SID of dynamical structures.
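
The core RJMCMC idea described above, jumping between models whose parameter vectors differ in dimension, can be shown on a toy problem far simpler than the MDOF structure in the paper. The sketch assumes unit-variance Gaussian data, a model M1 with the mean fixed at 0, a model M2 with an unknown mean mu and a Normal(0, tau^2) prior, equal prior model probabilities, and a birth move that draws mu from a Normal(0, prop_sd^2) proposal (Jacobian 1). All names are hypothetical. Because the marginal likelihoods are analytic in this setting, the sampler's visit frequency of M2 can be checked against the exact posterior model probability.

```python
import math
import random


def log_lik(y, mu):
    """Gaussian log-likelihood with unit variance (additive constant dropped)."""
    return -0.5 * sum((yi - mu) ** 2 for yi in y)


def log_normal_pdf(x, sd):
    """Log density of Normal(0, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * (x / sd) ** 2


def accept(rng, log_a):
    """Metropolis-Hastings acceptance test on the log scale."""
    return rng.random() < math.exp(min(0.0, log_a))


def rjmcmc(y, tau=1.0, prop_sd=1.0, n_iter=50_000, seed=1):
    """Toy RJMCMC between M1 (mu fixed at 0) and M2 (mu ~ Normal(0, tau^2)).
    Prior model probabilities and jump-proposal probabilities are equal, so
    they cancel. Returns the fraction of iterations spent in M2."""
    rng = random.Random(seed)
    model, mu = 1, 0.0
    visits_m2 = 0
    for _ in range(n_iter):
        # 1) trans-dimensional move
        if model == 1:
            # birth: draw the new parameter from the proposal; Jacobian is 1
            mu_new = rng.gauss(0.0, prop_sd)
            log_a = (log_lik(y, mu_new) + log_normal_pdf(mu_new, tau)
                     - log_lik(y, 0.0) - log_normal_pdf(mu_new, prop_sd))
            if accept(rng, log_a):
                model, mu = 2, mu_new
        else:
            # death: the exact reverse of the birth move
            log_a = (log_lik(y, 0.0) + log_normal_pdf(mu, prop_sd)
                     - log_lik(y, mu) - log_normal_pdf(mu, tau))
            if accept(rng, log_a):
                model, mu = 1, 0.0
        # 2) within-model random-walk Metropolis update of mu (M2 only)
        if model == 2:
            mu_prop = mu + rng.gauss(0.0, 0.5)
            log_a = (log_lik(y, mu_prop) + log_normal_pdf(mu_prop, tau)
                     - log_lik(y, mu) - log_normal_pdf(mu, tau))
            if accept(rng, log_a):
                mu = mu_prop
        visits_m2 += int(model == 2)
    return visits_m2 / n_iter
```

The returned fraction should approach B / (1 + B), where B = (1 + n tau^2)^(-1/2) exp(S^2 tau^2 / (2 (1 + n tau^2))) is the Bayes factor of M2 over M1, n is the sample size and S the sum of y. Running the trans-dimensional move and the within-model update as a fixed cycle keeps each step invariant for the joint posterior.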

  • Research Article
  • Cited by 38
  • 10.1038/s41598-022-26010-7
Combining palaeontological and neontological data shows a delayed diversification burst of carcharhiniform sharks likely mediated by environmental change
  • Dec 19, 2022
  • Scientific Reports
  • Baptiste Brée + 2 more

Estimating deep-time species-level diversification processes remains challenging. Both the fossil record and molecular phylogenies allow the estimation of speciation and extinction rates, but each type of data may still provide an incomplete picture of diversification dynamics. Here, we combine species-level palaeontological (fossil occurrences) and neontological (molecular phylogenies) data to estimate deep-time diversity dynamics through process-based birth–death models for Carcharhiniformes, the most speciose shark order today. Despite their abundant fossil record dating back to the Middle Jurassic, only a small fraction of extant carcharhiniform species is recorded as fossils, which impedes relying only on the fossil record to study their recent diversification. Combining fossil and phylogenetic data, we recover a complex evolutionary history for carcharhiniforms, exemplified by several variations in diversification rates with an early low diversity period followed by a Cenozoic radiation. We further reveal a burst of diversification in the last 30 million years, which is only partially recovered when fossil data are used alone. We also find that reef expansion and temperature change can explain variations in speciation and extinction through time. These results highlight the fundamental importance of these environmental variables in the evolution of marine clades. Our study also highlights the benefit of combining the fossil record with phylogenetic data to address macroevolutionary questions.
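
Process-based birth-death models like those mentioned above rest on per-lineage origination and extinction rates. As a minimal sketch, assuming constant rates and origination and extinction ages known exactly (which fossil data never provide directly), the likelihood is lam^B * mu^D * exp(-(lam + mu) * L), where B and D count originations and extinctions inside the time window and L is the total lineage-time; it is maximized at lam = B / L and mu = D / L. The helper name `bd_rate_mle` is hypothetical.

```python
def bd_rate_mle(lineages, t_start):
    """Hypothetical helper: maximum-likelihood constant origination (lam) and
    extinction (mu) rates from lineages with exactly known origination age s
    and extinction age e (Ma before present; e = 0 for extant taxa), observed
    over the window from t_start to the present."""
    n_orig = n_ext = 0
    exposure = 0.0  # total lineage-time (Myr) lived inside the window
    for s, e in lineages:
        if not (s > e >= 0):
            raise ValueError("each lineage needs s > e >= 0")
        if e >= t_start:
            continue  # went extinct before the window opens
        if s < t_start:
            n_orig += 1  # origination happened inside the window
        if e > 0:
            n_ext += 1  # extinction happened inside the window
        exposure += min(s, t_start) - e
    if exposure == 0:
        raise ValueError("no lineage-time inside the window")
    # likelihood lam**B * mu**D * exp(-(lam + mu) * L) is maximized at B/L, D/L
    return n_orig / exposure, n_ext / exposure
```

For instance, lineages (20, 8), (12, 0), (10, 4) and (6, 0) observed from 15 Ma give three originations, two extinctions and 31 Myr of lineage-time, hence rates of 3/31 and 2/31 per lineage per Myr.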

  • Research Article
  • Cited by 17
  • 10.1038/sj.hdy.6800644
Molecular clocks: Closing the gap between rocks and clocks
  • Feb 16, 2005
  • Heredity
  • K Cranston + 1 more

  • Research Article
  • Cited by 9
  • 10.1111/1365-2478.13339
Facies‐constrained transdimensional amplitude versus angle inversion using machine learning assisted priors
  • Mar 20, 2023
  • Geophysical Prospecting
  • Arnab Dhara + 2 more

We present a methodology for seismic inversion that generates high‐resolution models of facies and elastic properties from pre‐stack data. Our inversion algorithm uses a transdimensional approach where, in addition to the layer properties, the number of layers is treated as unknown. In other words, the data themselves determine the appropriate model parameterization, that is, the number of layers. The reversible jump Markov Chain Monte Carlo method is an effective tool to solve such transdimensional problems as it generates models of reservoir properties along with uncertainty estimates. However, current implementations of the reversible jump Markov Chain Monte Carlo algorithms do not account for the non‐Gaussian and multimodal nature of model parameters. The target elastic reservoir properties generally have multimodal, non‐parametric distributions at each location of the model. The number of modes is equal to the number of facies. Taking these factors into account, we extend the reversible jump Markov Chain Monte Carlo algorithm to simultaneously invert for discrete facies and continuous elastic reservoir properties. The proposed extension to the algorithm iteratively samples the facies, by moving from one mode to another, and elastic properties by sampling within the same mode. The integration of facies classification within the inversion reduces non‐uniqueness, improves convergence speed and produces geologically consistent results. The workflow uses machine learning to generate probabilistic priors for the model parameters. We validate our approach by applying it to a synthetic dataset generated from a well log with two facies and then to a complex synthetic two‐dimensional model involving three facies having overlapping elastic property distributions. Finally, we apply our algorithm to a field dataset acquired over an unconventional reservoir. 
Our algorithm demonstrates the usefulness of incorporating facies information in seismic inversion and also the feasibility of inverting for facies from seismic data.

  • Research Article
  • Cited by 34
  • 10.1086/508264
A Flexible Bayesian Framework for Modeling Haplotype Association with Disease, Allowing for Dominance Effects of the Underlying Causative Variants
  • Oct 1, 2006
  • The American Journal of Human Genetics
  • Andrew P Morris

  • Research Article
  • Cited by 74
  • 10.1103/physrevd.80.063007
Bayesian approach to the detection problem in gravitational wave astronomy
  • Sep 24, 2009
  • Physical Review D
  • Tyson B Littenberg + 1 more

The analysis of data from gravitational wave detectors can be divided into three phases: search, characterization, and evaluation. Evaluating a detection, that is, determining whether a candidate event is astrophysical in origin or an artifact created by instrument noise, is a crucial step in the analysis. The ongoing analyses of data from ground-based detectors employ a frequentist approach to the detection problem. A detection statistic is chosen, for which background levels and detection efficiencies are estimated from Monte Carlo studies. This approach frames the detection problem in terms of an infinite collection of trials, with the actual measurement corresponding to some realization of this hypothetical set. Here we explore an alternative, Bayesian approach to the detection problem, which considers prior information and the actual data in hand. Our particular focus is on the computational techniques used to implement the Bayesian analysis. We find that the Parallel Tempered Markov Chain Monte Carlo (PTMCMC) algorithm is able to address all three phases of the analysis in a coherent framework. The signals are found by locating the posterior modes, the model parameters are characterized by mapping out the joint posterior distribution, and finally, the model evidence is computed by thermodynamic integration. As a demonstration, we consider the detection problem of selecting between models describing the data as instrument noise, or instrument noise plus the signal from a single compact galactic binary. The evidence ratios, or Bayes factors, computed by the PTMCMC algorithm are found to be in close agreement with those computed using a Reversible Jump Markov Chain Monte Carlo algorithm.
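
The evidence computation mentioned above relies on the thermodynamic-integration identity log Z = ∫ E_beta[log L] d beta over beta in [0, 1], where the expectation is taken under the power posterior proportional to L^beta times the prior. The authors estimate these expectations from parallel-tempered MCMC chains; the sketch below instead uses a conjugate toy model (y_i ~ Normal(mu, 1), mu ~ Normal(0, tau^2)) in which every power posterior is Gaussian, so the expectations are exact and only the integral over beta is numerical (trapezoidal rule). Function names are hypothetical, and the closed-form evidence is included only to check the identity.

```python
import math


def log_evidence_ti(y, tau=1.0, n_grid=2000):
    """Thermodynamic-integration estimate of log p(y) for y_i ~ N(mu, 1),
    mu ~ N(0, tau^2). The power posterior p_beta(mu) ∝ L(mu)**beta * prior(mu)
    is Gaussian here, so E_beta[log L] has a closed form."""
    n, s = len(y), sum(y)
    ss = sum(yi * yi for yi in y)

    def expected_loglik(beta):
        prec = 1.0 / tau**2 + n * beta  # power-posterior precision
        m, v = beta * s / prec, 1.0 / prec  # power-posterior mean and variance
        # E[sum (y_i - mu)^2] = ss - 2 m s + n (m^2 + v)
        return (-0.5 * n * math.log(2 * math.pi)
                - 0.5 * (ss - 2 * m * s + n * (m * m + v)))

    # trapezoidal rule over beta in [0, 1]
    h = 1.0 / n_grid
    vals = [expected_loglik(k * h) for k in range(n_grid + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


def log_evidence_exact(y, tau=1.0):
    """Closed-form log marginal likelihood of the same conjugate model."""
    n, s = len(y), sum(y)
    ss = sum(yi * yi for yi in y)
    return (-0.5 * n * math.log(2 * math.pi) - 0.5 * ss
            - 0.5 * math.log(1 + n * tau**2)
            + s * s * tau**2 / (2 * (1 + n * tau**2)))
```

The integrand changes fastest near beta = 0, which is why practical schemes often space temperatures more densely there; with the analytic integrand used here, a fine uniform grid is enough.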

  • Conference Article
  • Cited by 3
  • 10.1109/icip.2002.1038922
A reversible jump Markov chain Monte Carlo algorithm for analysis of functional neuroimages
  • Jun 24, 2002
  • Proceedings - International Conference on Image Processing
  • A.S Lukic + 4 more

We propose a new signal-detection approach for detecting brain activations from PET or fMRI images in a two-state ("on-off") neuroimaging study. We model the activation pattern as a superposition of an unknown number of circular spatial basis functions of unknown position, size, and amplitude. We determine the number of these functions and their parameters by maximum a posteriori (MAP) estimation. To maximize the posterior distribution we use a reversible jump Markov-chain Monte-Carlo (RJMCMC) algorithm. The main advantage of RJMCMC is that it can estimate parameter vectors of unknown length, so the number of activation sites does not need to be known in advance. Using a phantom derived from a neuroimaging study, we demonstrate that the proposed method estimates the activation pattern more accurately than traditional approaches.

  • Research Article
  • Cited by 10
  • 10.1017/s0094837300019217
Memoir 4: An Analysis of the History of Marine Animal Diversity
  • Jan 1, 2007
  • Paleobiology
  • Steven M Stanley

According to when they attained high diversity, major taxa of marine animals have been clustered into three groups, the Cambrian, Paleozoic, and Modern Faunas. Because the Cambrian Fauna was a relatively minor component of the total fauna after mid-Ordovician time, the Phanerozoic history of marine animal diversity is largely a matter of the fates of the Paleozoic and Modern Faunas. The fact that most late Cenozoic genera belong to taxa that have been radiating for tens of millions of years indicates that the post-Paleozoic increase in diversity indicated by fossil data is real, rather than an artifact of improvement of the fossil record toward the present.

Assuming that ecological crowding produced the so-called Paleozoic plateau for family diversity, various workers have used the logistic equation of ecology to model marine animal diversification as damped exponential increase. Several lines of evidence indicate that this procedure is inappropriate. A plot of the diversity of marine animal genera through time provides better resolution than the plot for families and has a more jagged appearance. Generic diversity generally increased rapidly during the Paleozoic, except when set back by pulses of mass extinction. In fact, an analysis of the history of the Paleozoic Fauna during the Paleozoic Era reveals no general correlation between rate of increase for this fauna and total marine animal diversity. Furthermore, realistically scaled logistic simulations do not mimic the empirical pattern. In addition, it is difficult to imagine how some fixed limit for diversity could have persisted throughout the Paleozoic Era, when the ecological structure of the marine ecosystem was constantly changing. More fundamentally, the basic idea that competition can set a limit for marine animal diversity is incompatible with basic tenets of marine ecology: predation, disturbance, and vagaries of recruitment determine local population sizes for most marine species. Sparseness of predators probably played a larger role than weak competition in elevating rates of diversification during the initial (Ordovician) radiation of marine animals and during recoveries from mass extinctions. A plot of diversification against total diversity for these intervals yields a band of points above the one representing background intervals, and yet this band also displays no significant trend (if the two earliest intervals of the initial Ordovician are excluded as times of exceptional evolutionary innovation). Thus, a distinctive structure characterized the marine ecosystem during intervals of evolutionary radiation, one in which rates of diversification were exceptionally high and yet increases in diversity did not depress rates of diversification.

Particular marine taxa exhibit background rates of origination and extinction that rank similarly when compared with those of other taxa. Rates are correlated in this way because certain heritable traits influence probability of speciation and probability of extinction in similar ways. Background rates of origination and extinction were depressed during the late Paleozoic ice age for all major marine invertebrate taxa, but remained correlated. Also, taxa with relatively high background rates of extinction experienced exceptionally heavy losses during biotic crises because background rates of extinction were intensified in a multiplicative manner; decimation of a large group of taxa of this kind in the two Permian mass extinctions established their collective identity as the Paleozoic Fauna.

Characteristic rates of origination and extinction for major taxa persisted from Paleozoic into post-Paleozoic time. Because of the causal linkage between rates of origination and extinction, pulses of extinction tended to drag down overall rates of origination as well as overall rates of extinction by preferentially eliminating higher taxa having relatively high background rates of extinction. This extinction/origination ratchet depressed turnover rates for the residual Paleozoic Fauna during the Mesozoic Era. A decline of this fauna's extinction rate to approximately that of the Modern Fauna accounts for the nearly equal fractional losses experienced by the two faunas in the terminal Cretaceous mass extinction.

Viewed arithmetically, the fossil record indicates slow diversification for the Modern Fauna during Paleozoic time, followed by much more rapid expansion during Mesozoic and Cenozoic time. When viewed more appropriately as depicting geometric (or exponential) increase, however, the empirical pattern exhibits no fundamental secular change: the background rate of increase for the Modern Fauna (the fauna that dominated post-Paleozoic marine diversity) simply persisted, reflecting the intrinsic origination and extinction rates of constituent taxa. Persistence of this overall background rate supports other evidence that the empirical record of diversification for marine animal life since Paleozoic time represents actual exponential increase. This enduring rate makes it unnecessary to invoke environmental change to explain the post-Paleozoic increase of marine diversity.

Because of the resilience of intrinsic rates, an empirically based simulation that entails intervals of exponential increase for the Paleozoic and Modern Faunas, punctuated by mass extinctions, yields a pattern that is remarkably similar to the empirical pattern. It follows that marine animal genera and species will continue to diversify exponentially long into the future, barring disruption of the marine ecosystem by human-induced or natural environmental changes.
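
The contrast the memoir draws between exponential and logistic (diversity-dependent) growth can be written compactly. In the notation below, which is ours rather than the memoir's, D is diversity, p and q are per-taxon origination and extinction rates, r is an intrinsic growth rate and K a diversity ceiling:

```latex
\begin{aligned}
\text{exponential:}\quad & \frac{dD}{dt} = (p - q)\,D
  \;\Longrightarrow\; \ln D(t) = \ln D_0 + (p - q)\,t \\
\text{logistic:}\quad & \frac{dD}{dt} = r\,D\left(1 - \frac{D}{K}\right)
  \;\Longrightarrow\; \frac{1}{D}\frac{dD}{dt} = r\left(1 - \frac{D}{K}\right)
\end{aligned}
```

Under exponential growth a constant p - q plots as a straight line on a logarithmic scale, even though the same curve looks like slow and then rapid increase on an arithmetic scale. Under logistic growth the per-capita rate of increase falls linearly as D rises, which is the negative relation between rate of increase and total diversity that the memoir reports not finding.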

  • Research Article
  • Cited by 104
  • 10.1023/b:stco.0000039484.36470.41
Learning a multivariate Gaussian mixture model with the reversible jump MCMC algorithm
  • Oct 1, 2004
  • Statistics and Computing
  • Zhihua Zhang + 3 more

This paper is a contribution to the methodology of fully Bayesian inference in a multivariate Gaussian mixture model using the reversible jump Markov chain Monte Carlo algorithm. To satisfy the constraint that split and combine moves preserve the first two moments, we concentrate on a simplified multivariate Gaussian mixture model, in which the covariance matrices of all components share a common eigenvector matrix. We then propose an approach to the construction of the reversible jump Markov chain Monte Carlo algorithm for this model. Experimental results on several data sets demonstrate the efficacy of our algorithm.
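
The moment-preservation constraint mentioned above is easiest to see in one dimension. The sketch below uses the univariate split construction of Richardson and Green (1997), assuming auxiliary variables u1, u2, u3 in (0, 1); it is not the multivariate shared-eigenvector scheme of this paper, and the Jacobian and acceptance ratio of the full reversible jump move are omitted. Function names are hypothetical.

```python
import math


def split_component(w, mu, var, u1, u2, u3):
    """Split one 1-D Gaussian component (weight w, mean mu, variance var) into
    two so that total weight, mean and second moment are preserved.
    u1, u2, u3 are auxiliary variables in (0, 1)."""
    sd = math.sqrt(var)
    w1, w2 = w * u1, w * (1 - u1)
    mu1 = mu - u2 * sd * math.sqrt(w2 / w1)
    mu2 = mu + u2 * sd * math.sqrt(w1 / w2)
    var1 = u3 * (1 - u2**2) * var * w / w1
    var2 = (1 - u3) * (1 - u2**2) * var * w / w2
    return (w1, mu1, var1), (w2, mu2, var2)


def combine_components(c1, c2):
    """Deterministic inverse: merge two components by matching the
    zeroth, first and second moments."""
    (w1, mu1, var1), (w2, mu2, var2) = c1, c2
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    var = (w1 * (mu1**2 + var1) + w2 * (mu2**2 + var2)) / w - mu**2
    return w, mu, var
```

Splitting a component and then combining the two halves recovers the original weight, mean and variance, which is what makes the split and combine moves a reversible pair.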

  • Conference Article
  • Cited by 4
  • 10.1063/1.5139817
MRI-based brain tumor segmentation using Gaussian mixture model with reversible jump Markov chain Monte Carlo algorithm
  • Jan 1, 2019
  • AIP conference proceedings
  • Anindya Apriliyanti Pravitasari + 6 more

A brain tumor is the 15th deadliest disease in Indonesia according to the WHO in 2018. In medical treatment, brain tumors can be detected through Magnetic Resonance Imaging (MRI). The main problem is how to separate the brain tumor area, the Region of Interest (ROI), from the other, healthy parts (Non-ROI) of the MRI. In computational statistics, one method used for image segmentation is cluster analysis. Model-Based Clustering with a Gaussian Mixture Model (GMM) is often used to find the cluster in which the tumor lies. The EM Algorithm and Bayesian inference coupled with Markov chain Monte Carlo (MCMC) can be used to fit the GMM. However, both EM and Bayesian MCMC assume that the number of clusters is fixed, so a cluster selection criterion is needed to choose the optimum number of clusters. This makes the segmentation complicated and not automatic. This study employs the GMM with a Reversible Jump Markov Chain Monte Carlo algorithm (GMM-RJMCMC) to segment MRI-based brain tumors and compares it with GMM-MCMC. RJMCMC is expected to speed up the computation because it determines the optimum number of clusters automatically, which also makes MRI image segmentation more adaptive. In terms of the Correct Classification Ratio (CCR), GMM-RJMCMC gives segmentation results equal to those of GMM-MCMC; however, GMM-RJMCMC runs faster, which makes it more efficient at finding the optimum number of clusters.

  • Research Article
  • Cited by 1
  • 10.1080/10629360500109226
Comparing the performance of a reversible jump Markov chain Monte Carlo algorithm for DNA sequences alignment
  • Jul 1, 2006
  • Journal of Statistical Computation and Simulation
  • Luis J Álvarez + 2 more

Assume that K independent copies are made from a common prototype DNA sequence whose length is a random variable. This paper addresses the problem of aligning those copies and, therefore, of estimating the prototype sequence that produced them. A hidden Markov chain is used to model the copying procedure, and a reversible jump Markov chain Monte Carlo algorithm is used to sample the parameters of the model from their posterior distribution. Using the resulting sample, the Bayesian model and the prototype sequence may be selected by maximum a posteriori estimation. A prior distribution for the prototype DNA sequence that incorporates correlation among neighbouring bases is also considered. In addition, the performance of the algorithm is analysed under different scenarios.

  • Research Article
  • Cited by 47
  • 10.1093/sysbio/syx082
Estimating Age-Dependent Extinction: Contrasting Evidence from Fossils and Phylogenies.
  • Nov 10, 2017
  • Systematic Biology
  • Oskar Hagen + 4 more

The estimation of diversification rates is one of the most vividly debated topics in modern systematics, with considerable controversy surrounding the power of phylogenetic and fossil-based approaches in estimating extinction. Van Valen’s seminal work from 1973 proposed the “Law of constant extinction,” which states that the probability of extinction of taxa is not dependent on their age. This assumption of age-independent extinction has prevailed for decades with its assessment based on survivorship curves, which, however, do not directly account for the incompleteness of the fossil record, and have rarely been applied at the species level. Here, we present a Bayesian framework to estimate extinction rates from the fossil record accounting for age-dependent extinction (ADE). Our approach, unlike previous implementations, explicitly models unobserved species and accounts for the effects of fossil preservation on the observed longevity of sampled lineages. We assess the performance and robustness of our method through extensive simulations and apply it to a fossil data set of terrestrial Carnivora spanning the past 40 myr. We find strong evidence of ADE, as we detect the extinction rate to be highest in young species and declining with increasing species age. For comparison, we apply a recently developed analogous ADE model to a dated phylogeny of extant Carnivora. Although the phylogeny-based analysis also infers ADE, it indicates that the extinction rate, instead, increases with increasing taxon age. The estimated mean species longevity also differs substantially, with the fossil-based analyses estimating 2.0 myr, in contrast to 9.8 myr derived from the phylogeny-based inference. Scrutinizing these discrepancies, we find that both fossil and phylogeny-based ADE models are prone to high error rates when speciation and extinction rates increase or decrease through time. 
However, analyses of simulated and empirical data show that fossil-based inferences are more robust. This study shows that an accurate estimation of ADE from incomplete fossil data is possible when the effects of preservation are jointly modeled, thus allowing for a reassessment of Van Valen’s model as a general rule in macroevolution.
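
The abstract does not specify how the extinction rate depends on species age. A common choice, assumed here for illustration, is a Weibull distribution of species longevity: with shape k and scale lam, the extinction rate at age t is (k / lam) (t / lam)^(k - 1) and the mean longevity is lam * Gamma(1 + 1/k). A shape below 1 makes extinction highest in young species (the fossil-based pattern reported above), a shape above 1 makes it increase with age (the phylogeny-based pattern), and k = 1 recovers Van Valen's age-independent extinction. Function names are hypothetical.

```python
import math


def weibull_hazard(age, shape, scale):
    """Extinction rate at a given species age (Myr) under a Weibull longevity
    model. Age must be positive: with shape < 1 the rate diverges at age 0."""
    if age <= 0:
        raise ValueError("age must be positive")
    return (shape / scale) * (age / scale) ** (shape - 1)


def weibull_mean_longevity(shape, scale):
    """Expected species longevity (Myr) under the same Weibull model."""
    return scale * math.gamma(1 + 1 / shape)
```

With shape = 1 the hazard is 1 / scale at every age and the mean longevity equals the scale, which is the constant-extinction special case.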

  • Research Article
  • Cited by 3
  • 10.2991/ijcis.d.200310.006
Hierarchical Bayesian Choice of Laplacian ARMA Models Based on Reversible Jump MCMC Computation
  • Jan 1, 2020
  • International Journal of Computational Intelligence Systems
  • Suparman

An autoregressive moving average (ARMA) model is a time series model applied in everyday life for pattern recognition and forecasting. The ARMA model contains a noise term that is assumed to follow a specific distribution, often a Gaussian. In applications, however, the noise is sometimes found not to be Gaussian. The first objective is to develop an ARMA model in which the noise has a Laplacian distribution. The second objective is to estimate the parameters of this model: the ARMA model orders, the ARMA model coefficients, and the noise variance. Parameter estimation is carried out in the Bayesian framework, in which the ARMA model parameters are treated as random variables with a prior distribution. The prior distribution is combined with the likelihood function of the data to obtain the posterior distribution of the parameters. Because this posterior has a complex form, the Bayes estimator cannot be determined analytically, so a reversible jump Markov chain Monte Carlo (MCMC) algorithm is adopted to compute it. The first result is that the ARMA model can be developed under Laplacian noise. The second is that simulation studies of the algorithm's performance show that the reversible jump MCMC algorithm estimates the parameters of the ARMA model correctly.
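
The likelihood term that such a sampler evaluates can be sketched for fixed model orders. Assuming the convention x_t = sum_i phi_i x_{t-i} + sum_j theta_j e_{t-j} + e_t with Laplace(0, b) noise (variance 2 b^2), and conditioning on the first p observations with pre-sample errors set to zero, the conditional log-likelihood is a sum of -log(2b) - |e_t| / b over the recovered residuals. This is a hypothetical helper, not the paper's code; the reversible jump moves that change the orders p and q are not shown.

```python
import math


def arma_laplace_loglik(x, phi, theta, b):
    """Conditional log-likelihood of an ARMA(p, q) series with Laplace(0, b)
    noise. Conditions on the first p observations; errors before index p are
    treated as zero."""
    p, q = len(phi), len(theta)
    errors = [0.0] * len(x)
    loglik = 0.0
    for t in range(p, len(x)):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p))
        # skip lags that would reach before the start of the series
        ma = sum(theta[j] * errors[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        errors[t] = x[t] - ar - ma  # recovered residual e_t
        loglik += -math.log(2 * b) - abs(errors[t]) / b
    return loglik
```

Because the Laplace density penalises absolute rather than squared residuals, the fit is less sensitive to occasional large shocks than its Gaussian counterpart.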

  • Discussion
  • Cited by 2
  • 10.1186/s13059-016-0942-z
Response to Comment by Faurby, Werdelin and Svenning
  • May 5, 2016
  • Genome Biology
  • Stephen J O’Brien + 8 more

  • Research Article
  • Cited by 9
  • 10.1190/geo2021-0534.1
A Julia software package for transdimensional Bayesian inversion of electromagnetic data over horizontally stratified media
  • Jul 6, 2022
  • Geophysics
  • Ronghua Peng + 3 more

A quantitative assessment of model parameter uncertainty is vital for a reliable interpretation of electromagnetic (EM) data due to the nonuniqueness inherent to EM inverse problems. Conventional gradient-based inversion approaches typically produce a single preferred model with limited information about parameter uncertainty. Alternatively, the inverse problem can be formulated within a sampling-based Bayesian inference framework, where the solution is represented by a posterior probability distribution of model parameters; this provides an effective way to rigorously estimate the uncertainty of the recovered solution. We have implemented a Bayesian inversion framework for probabilistic inversion of EM data, released as an open-source software package written in the Julia programming language. The key feature of the framework is that it adaptively adjusts model complexity to a level compatible with the data by implementing a reversible jump Markov chain Monte Carlo algorithm, thus allowing the data to infer the appropriate level of model complexity and the associated parameter uncertainty. The authors describe the structure of the package with a focus on code modularity and extensibility. Finally, the authors demonstrate the capability and versatility of the software package through three synthetic examples that simulate different EM scenarios. The inversion results demonstrate the performance and inherent resolving abilities of different EM surveys for conductive and/or resistive structures. In addition, the model ensemble produced by the transdimensional Bayesian inversion conveys a wealth of information. Many important quantities for data interpretation, such as parameter uncertainty and correlation between model parameters of interest, can be measured by exploring the model ensemble.
