Abstract

As recombination events are not uniformly distributed along the human genome, the estimation of fine-scale recombination maps, e.g. by the HapMap Project, has been one of the major research endeavors over the last couple of years. For simulation studies, these estimates provide realistic reference scenarios for designing future studies and developing novel methodology. To achieve a feasible framework for the estimation of such recombination maps, existing methodology uses sample probabilities for a two-locus model with recombination, with recent advances allowing for computationally fast implementations. In this work, we extend the existing theoretical framework for recombination rate estimation to the presence of population substructure. We show under which assumptions the existing methodology can still be applied. We illustrate our extension of the methodology with an extensive simulation study.

Highlights

  • The discovery that recombination events in the human genome are not uniformly distributed but concentrated in specific genomic regions, typically referred to as recombination hotspots [1], was one of the driving forces behind the HapMap Project [2]

  • To create recombination hotspots in simulated data, commonly used software tools based on coalescent simulations, e.g. cosi, incorporate recombination rates that vary along the chromosome [4]

  • Relying on first results in [13] and the detailed study of accuracy in [14], we restate that the approximate sampling formula has large advantages over the computationally demanding Monte Carlo based methods, especially for large values of ρ

Summary

Introduction

The discovery that recombination events in the human genome are not uniformly distributed but concentrated in specific genomic regions, typically referred to as recombination hotspots [1], was one of the driving forces behind the HapMap Project [2]. In 2009, Jenkins and Song proposed approaches to calculate the sample probability for a given configuration with an analytic asymptotic formula of order two in the reciprocal recombination rate. Their approach was initially intended for an infinite-allele model [12] and later extended to a finite-allele model [13]. [...] LDhat to a so-called Parent-Independent-Mutation (PIM) model and were able to present a different approach to calculate the required two-locus probabilities for LDhat. The key aspect they showed was that the calculations are independent of the specific value of the recombination rate. The key result of our work is that differences between subpopulation frequencies disappear and the diffusion limit takes the same form as in the panmictic case, with a rescaled effective population size. This implies that subpopulation samples can be combined and the corresponding sample probability evaluated with the existing methodology at no additional computational cost. The advantage for the estimation of recombination rates lies in the increased sample size and the more realistic underlying model.
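
As a rough illustration of what combining subpopulation samples could look like in practice, the following Python sketch pools two-locus haplotype counts from several subpopulations into a single configuration and evaluates a generic second-order expansion in 1/ρ. It is a minimal sketch under stated assumptions, not the method of the paper: the function names, the haplotype labels, and the counts are hypothetical, and the coefficients q0, q1, q2 are placeholders that the caller must supply, since their actual formulas are not reproduced here. Pooling is only justified under the assumptions stated in the paper.

```python
from collections import Counter


def pool_configurations(subpop_samples):
    """Sum two-locus haplotype counts over all subpopulation samples.

    Each sample maps a haplotype, written as a pair (allele at locus 1,
    allele at locus 2), to the number of times it was observed.
    """
    pooled = Counter()
    for sample in subpop_samples:
        pooled.update(sample)  # adds counts haplotype by haplotype
    return pooled


def second_order_expansion(q0, q1, q2, rho):
    """Evaluate q0 + q1/rho + q2/rho**2, an expansion truncated at order two in 1/rho.

    q0, q1, q2 are placeholder coefficients supplied by the caller
    (assumption: they were computed elsewhere for the pooled configuration).
    """
    if rho <= 0:
        raise ValueError("rho must be positive")
    return q0 + q1 / rho + q2 / rho ** 2


# Hypothetical haplotype counts for two subpopulations.
subpop_a = Counter({("A", "B"): 6, ("A", "b"): 2, ("a", "B"): 1, ("a", "b"): 1})
subpop_b = Counter({("A", "B"): 3, ("a", "b"): 7})

pooled = pool_configurations([subpop_a, subpop_b])
total_haplotypes = sum(pooled.values())

# Placeholder coefficients, chosen only to show how the expansion is evaluated.
approx = second_order_expansion(q0=0.5, q1=0.1, q2=0.02, rho=10.0)
```

The pooled counter is what would be passed to an existing two-locus routine in place of a single-population sample; because the expansion coefficients do not depend on ρ, they can be computed once and reused across a grid of ρ values.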

