Abstract

Background: Samples of molecular sequence data of a locus obtained from random individuals in a population are often related by an unknown genealogy. More importantly, population genetics parameters, for instance the scaled population mutation rate Θ = 4Neμ for diploids or Θ = 2Neμ for haploids (where Ne is the effective population size and μ is the mutation rate per site per generation), which explain some of the evolutionary history and past qualities of the population from which the samples are obtained, are of significant interest.

Results: In this paper, we present the evolution of sequence data in a Bayesian framework and the approximation of the posterior distributions of the unknown parameters of the model, which include Θ, via the sequential Monte Carlo (SMC) samplers for static models. Specifically, we approximate the posterior distributions of the unknown parameters with a set of weighted samples, i.e., the set of highly probable genealogies out of the infinite set of possible genealogies that describe the sampled sequences. The proposed SMC algorithm is evaluated on simulated DNA sequence datasets under different mutational models and on real biological sequences. In terms of the accuracy of the estimates, the proposed SMC method performs comparably to, and sometimes better than, the state-of-the-art MCMC algorithms.

Conclusions: We showed that the SMC algorithm for static models is a promising alternative to the state-of-the-art approach for simulating from the posterior distributions of population genetics parameters.
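
The two quantitative ideas in the abstract, the definition of Θ and the use of a set of weighted samples to approximate a posterior, can be illustrated with a minimal Python sketch. This is not the authors' SMC sampler: the function names, the `diploid` flag, and the assumption that each sample carries a Θ value with an unnormalized log-weight are hypothetical choices made only for illustration.

```python
import math


def scaled_mutation_rate(ne, mu, diploid=True):
    """Theta = 4*Ne*mu for diploids, 2*Ne*mu for haploids (as defined in the abstract)."""
    factor = 4 if diploid else 2
    return factor * ne * mu


def weighted_posterior_mean(theta_samples, log_weights):
    """Posterior-mean estimate of Theta from weighted samples.

    Assumption (not from the paper): each sample has an unnormalized
    log-weight. Weights are normalized after subtracting the maximum
    log-weight, which avoids numerical overflow or underflow.
    """
    max_lw = max(log_weights)
    weights = [math.exp(lw - max_lw) for lw in log_weights]
    total = sum(weights)
    return sum(w * t for w, t in zip(weights, theta_samples)) / total
```

For example, `scaled_mutation_rate(10000, 1e-8)` evaluates 4 × 10000 × 1e-8 for a diploid population, and two equally weighted samples with Θ values 1.0 and 3.0 give a weighted mean of 2.0.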

Highlights

  • Samples of molecular sequence data of a locus obtained from random individuals in a population are often related by an unknown genealogy

  • We demonstrate the performance of the proposed sequential Monte Carlo (SMC) algorithm using both simulated datasets and real biological sequences

  • We compare the estimates obtained from the proposed SMC algorithm with those of the MH-Markov Chain Monte Carlo (MCMC) algorithm


Summary

Introduction

Samples of molecular data, such as DNA sequences, taken from a population are often related by an unknown genealogy [1], a family tree that depicts the ancestors and descendants of individuals in the sample and whose shape is altered by population processes such as migration, genetic drift, and change of population size. Based on the estimation of these important parameters, [8, 9] were able to infer past environmental conditions (in combination with documented geologic events) that explain the current patterns in the population; they investigated the role of environmental factors in shaping the contemporary phylogeographic pattern and studied the genetic homogeneity of organisms. In species classification, knowledge of these parameters has helped in classifying previously unclassified or wrongly classified organisms [10] and in investigating the contribution of geographic barriers to the diversification and classification of organisms [11, 12].

Ogundijo and Wang, BMC Bioinformatics (2017) 18:541
