Abstract

A key parameter in population genetics is the scaled mutation rate θ = 4Nμ, where N is the effective haploid population size and μ is the mutation rate per haplotype per generation. While exact likelihood inference is notoriously difficult in population genetics, we propose a novel approach, based on dynamic programming under the infinite sites model without recombination, to compute a first-order accurate likelihood of θ. The parameter θ may be either constant (time-independent) or time-dependent, which allows for changes in demography and deviations from neutral equilibrium. For time-independent θ, the performance is compared with the approach in Griffiths and Tavaré’s work “Simulating Probability Distributions in the Coalescent” (Theor. Popul. Biol. 1994, 46, 131–159), which is based on importance sampling and implemented in the “genetree” program. As a rough guide, the proposed method is computationally fast when n × θ < 100, where n is the sample size. For time-dependent θ(t), we analyze a simple demographic model with a single change in θ(t). In this case, the ancestral and current values of θ must be estimated, together with the time of the change. To our knowledge, this is the first accurate computation of a likelihood in the infinite sites model with non-equilibrium demography.
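
The quantities named in the abstract can be made concrete in a few lines of Python. This is a minimal sketch, not code from the paper. The function names are hypothetical. Time t is assumed to be measured backwards from the present, and the single change in θ(t) is assumed to occur at a time τ, with the current value holding for t < τ and the ancestral value for t ≥ τ. The n × θ < 100 check simply encodes the abstract's rough rule of thumb.

```python
# Hypothetical helpers illustrating the quantities named in the abstract.
# Assumptions: time t runs backwards from the present, and the single
# change in theta happens at time tau (current value before, ancestral after).


def scaled_mutation_rate(N: float, mu: float) -> float:
    """theta = 4 * N * mu, as defined in the abstract."""
    return 4.0 * N * mu


def theta_single_change(t: float, theta_current: float,
                        theta_ancestral: float, tau: float) -> float:
    """Piecewise-constant theta(t) with exactly one change at time tau."""
    if t < 0:
        raise ValueError("t must be non-negative (time before present)")
    return theta_current if t < tau else theta_ancestral


def roughly_fast(n: int, theta: float, threshold: float = 100.0) -> bool:
    """Rule of thumb from the abstract: the DP is fast when n * theta < 100."""
    return n * theta < threshold


if __name__ == "__main__":
    theta = scaled_mutation_rate(N=10_000, mu=2.5e-5)  # approximately 1.0
    print(theta)
    print(theta_single_change(0.5, 1.0, 5.0, tau=1.0))  # 1.0 (before the change)
    print(roughly_fast(n=50, theta=theta))              # True
```

In the single-change model, the three unknowns to estimate are `theta_current`, `theta_ancestral` and `tau`.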

Highlights

  • The infinite sites model is among the simplest models in population genetics

  • With all mutations occurring at different positions, modeling of genetic variation becomes both mathematically and computationally easier [1]

  • We compared our dynamic programming (DP) method with the genetree method proposed by Griffiths and Tavaré [7], which is based on importance sampling
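
The infinite sites model without recombination places every mutation at a new position on a single genealogy, so a sample of 0/1-coded haplotypes must fit a tree. The standard four-gamete test checks this. If some pair of sites shows all four combinations 00, 01, 10 and 11, the data cannot have arisen under that model. The sketch below is a hypothetical pre-check, not part of the DP or genetree methods. It assumes the haplotypes are equal-length strings over {0, 1}.

```python
from itertools import combinations
from typing import Sequence


def four_gamete_violations(haplotypes: Sequence[str]) -> list[tuple[int, int]]:
    """Return site pairs (i, j) at which all four gametes 00, 01, 10, 11 occur.

    Under the infinite sites model without recombination no such pair
    should exist. Haplotypes are assumed to be equal-length 0/1 strings.
    """
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    for h in haplotypes:
        if len(h) != n_sites:
            raise ValueError("all haplotypes must have the same length")
        if set(h) - {"0", "1"}:
            raise ValueError("haplotypes must be coded with 0 and 1 only")
    violations = []
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct two-site gametes observed at sites i and j.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            violations.append((i, j))
    return violations


if __name__ == "__main__":
    print(four_gamete_violations(["000", "100", "110", "001"]))  # []
    print(four_gamete_violations(["00", "01", "10", "11"]))      # [(0, 1)]
```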

Introduction

The infinite sites model is among the simplest models in population genetics. Polymorphism is assumed to arise by single mutations at unique sites along a stretch of DNA. Population sizes may vary with time, and the scaled mutation rate then varies as well. This leads to a time-dependent parameter θ(t) = 4N(t)μ, and the distribution of the data will deviate from that under neutral equilibrium. Wu [8] showed that a dynamic programming algorithm can speed up the summation over all possible genealogies, making exact calculations feasible for larger datasets. None of these approaches allows for inference in the presence of time-dependent variation in population sizes or mutation rates.
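
The idea of summing over genealogies by dynamic programming can be illustrated with a much smaller problem: the probability of observing s segregating sites in a sample of n sequences under constant θ. Assume the standard coalescent scaling, in which k lineages coalesce at rate k(k-1)/2 and mutate at rate kθ/2. Looking back in time, the next event is then a coalescence with probability (k-1)/(k-1+θ) and a mutation with probability θ/(k-1+θ). Tabulating these outcomes gives a classical recursion. This toy sketch uses only the summary S, not the full sample configuration that genetree works with, and it is not the authors' algorithm. The function names are hypothetical.

```python
from typing import Iterable


def segregating_sites_pmf(n: int, s_max: int, theta: float) -> list[float]:
    """Return [P(S = 0), ..., P(S = s_max)] for a sample of n sequences.

    Toy recursion (infinite sites, no recombination, constant theta):
      P_k(r) = (k-1)/(k-1+theta) * P_{k-1}(r) + theta/(k-1+theta) * P_k(r-1),
    with P_1(0) = 1 and P_1(r) = 0 for r > 0.
    """
    if n < 1 or s_max < 0 or theta <= 0:
        raise ValueError("need n >= 1, s_max >= 0 and theta > 0")
    # Row for k = 1: once all lineages have merged, no new segregating sites arise.
    prev = [1.0] + [0.0] * s_max
    for k in range(2, n + 1):
        p_coal = (k - 1) / (k - 1 + theta)  # next event back in time: coalescence
        p_mut = theta / (k - 1 + theta)     # next event: mutation, adds one segregating site
        cur = [0.0] * (s_max + 1)
        for r in range(s_max + 1):
            cur[r] = p_coal * prev[r]
            if r > 0:
                cur[r] += p_mut * cur[r - 1]
        prev = cur
    return prev


def mle_theta_grid(n: int, s: int, grid: Iterable[float]) -> float:
    """Grid-search maximum-likelihood estimate of theta given only S = s."""
    return max(grid, key=lambda th: segregating_sites_pmf(n, s, th)[s])


if __name__ == "__main__":
    print(segregating_sites_pmf(2, 3, 1.0))       # [0.5, 0.25, 0.125, 0.0625]
    print(mle_theta_grid(2, 1, [0.5, 1.0, 2.0]))  # 1.0
```

For n = 2 and s = 1 the likelihood reduces to θ/(1+θ)², which peaks at θ = 1. Filling the table bottom-up rather than recursing keeps the cost at O(n × s_max) and avoids Python's recursion limit for large n or s.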

Dynamic Programming Algorithms for Estimating θ
Basic Probability Model
Efficient Likelihood Computation
Example
Calculating the Likelihood for Time-Independent θ
Calculating the Likelihood for Time-Dependent θ
Simulations
Discussion
