Abstract
A key parameter in population genetics is the scaled mutation rate θ = 4 N μ , where N is the effective haploid population size and μ is the mutation rate per haplotype per generation. While exact likelihood inference is notoriously difficult in population genetics, we propose a novel approach to compute a first order accurate likelihood of θ that is based on dynamic programming under the infinite sites model without recombination. The parameter θ may be either constant, i.e., time-independent, or time-dependent, which allows for changes of demography and deviations from neutral equilibrium. For time-independent θ, the performance is compared to the approach in Griffiths and Tavaré’s work “Simulating Probability Distributions in the Coalescent” (Theor. Popul. Biol. 1994, 46, 131–159) that is based on importance sampling and implemented in the “genetree” program. Roughly, the proposed method is computationally fast when n × θ < 100 , where n is the sample size. For time-dependent θ ( t ) , we analyze a simple demographic model with a single change in θ ( t ) . In this case, the ancestral and current θ need to be estimated, as well as the time of change. To our knowledge, this is the first accurate computation of a likelihood in the infinite sites model with non-equilibrium demography.
Highlights
The infinite sites model is among the simplest models in population genetics
With all mutations occurring at different positions, modeling of genetic variation becomes both mathematically and computationally easier [1]
We compared our dynamic programming (DP) method with the genetree method proposed by Griffiths and Tavaré [7], which is based on importance sampling
Summary
The infinite sites model is among the simplest models in population genetics. Polymorphism is assumed to arise by single mutations of unique sites along a stretch of DNA. Population sizes may vary with time, and scaled mutation rate will vary This leads to a time dependent parameter θ(t) = 4N (t)μ, and the distribution of the data will deviate from that under neutral equilibrium. Wu [8] showed that a dynamic programming algorithm can speed up summation over all possible genealogies to make exact calculations feasible for larger datasets. None of these approaches, allow for inference in the presence of time-dependent variation of population sizes or mutation rates.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.