Abstract

Demographic models built from genetic data play important roles in illuminating prehistorical events and serving as null models in genome scans for selection. We introduce an inference method based on the joint frequency spectrum of genetic variants within and between populations. For candidate models we numerically compute the expected spectrum using a diffusion approximation to the one-locus, two-allele Wright-Fisher process, involving up to three simultaneous populations. Our approach is a composite likelihood scheme, since linkage between neutral loci alters the variance but not the expectation of the frequency spectrum. We thus use bootstraps incorporating linkage to estimate uncertainties for parameters and significance values for hypothesis tests. Our method can also incorporate selection on single sites, predicting the joint distribution of selected alleles among populations experiencing a bevy of evolutionary forces, including expansions, contractions, migrations, and admixture. We model human expansion out of Africa and the settlement of the New World, using 5 Mb of noncoding DNA resequenced in 68 individuals from 4 populations (YRI, CHB, CEU, and MXL) by the Environmental Genome Project. We infer divergence between West African and Eurasian populations 140 thousand years ago (95% confidence interval: 40–270 kya). This is earlier than other genetic studies, in part because we incorporate migration. We estimate the European (CEU) and East Asian (CHB) divergence time to be 23 kya (95% c.i.: 17–43 kya), long after archeological evidence places modern humans in Europe. Finally, we estimate divergence between East Asians (CHB) and Mexican-Americans (MXL) of 22 kya (95% c.i.: 16.3–26.9 kya), and our analysis yields no evidence for subsequent migration. 
Furthermore, combining our demographic model with a previously estimated distribution of selective effects among newly arising amino acid mutations accurately predicts the frequency spectrum of nonsynonymous variants across three continental populations (YRI, CHB, CEU).
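
The joint frequency spectrum used as the data summary above can be illustrated with a minimal sketch. It assumes that each variant site has already been reduced to a derived-allele count in each of two populations. The function name `joint_sfs` and the toy counts are hypothetical and are not taken from the study.

```python
def joint_sfs(counts_pop1, counts_pop2, n1, n2):
    """Tabulate a two-population joint frequency spectrum.

    counts_pop1[k], counts_pop2[k]: derived-allele counts at site k
    n1, n2: number of sampled chromosomes in each population
    Returns an (n1 + 1) x (n2 + 1) nested list whose entry [i][j] is the
    number of sites with i derived copies in population 1 and j in population 2.
    """
    if len(counts_pop1) != len(counts_pop2):
        raise ValueError("both populations must cover the same sites")
    sfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i, j in zip(counts_pop1, counts_pop2):
        if not (0 <= i <= n1 and 0 <= j <= n2):
            raise ValueError("allele count outside the sample size")
        sfs[i][j] += 1
    return sfs


# Hypothetical toy data: five sites, four chromosomes sampled per population.
pop1 = [1, 0, 2, 1, 4]
pop2 = [0, 1, 2, 0, 3]
sfs = joint_sfs(pop1, pop2, n1=4, n2=4)
```

Extending the table to three populations only adds a third index; the inference then compares such a table against the spectrum expected under a candidate demographic model.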

Highlights

  • Demographic models inferred from genetic data play several important roles in population genetics

  • Because the diffusion approach neglects linkage, our comparison with the data is through a composite likelihood function. Such likelihoods are consistent estimators under a wide range of population genetic scenarios for selectively neutral data, but do not correctly capture variances [25]: lower recombination induces higher linkage and higher variance in the entries of the allele frequency spectrum (AFS). As we demonstrate below, the efficiency of our diffusion approach enables both conventional and parametric bootstrap resampling of the data, allowing us to accurately estimate confidence intervals for parameter values and critical values for hypothesis tests [26], accounting for any degree of linkage found in the data.

  • Our diffusion approximation to the joint allele frequency spectrum is a powerful tool for population genetic inference
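
The composite likelihood and conventional bootstrap mentioned in the highlights can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it assumes each spectrum entry is treated as an independent Poisson count whose mean is the model expectation, and that the data are available as one flattened spectrum per sequenced locus. The function names and toy numbers are hypothetical.

```python
import math
import random


def poisson_composite_ll(model, data):
    """Composite log-likelihood of observed spectrum entries.

    Assumes each entry is an independent Poisson draw with mean given by
    the corresponding model entry; linkage between sites is ignored.
    """
    ll = 0.0
    for m, d in zip(model, data):
        if m <= 0:
            raise ValueError("model entries must be positive")
        ll += -m + d * math.log(m) - math.lgamma(d + 1)
    return ll


def bootstrap_spectra(per_locus_spectra, n_boot, seed=0):
    """Conventional bootstrap: resample whole loci with replacement.

    Resampling entire loci keeps linked sites together, so the spread
    across replicates reflects the linkage present in the data.
    """
    rng = random.Random(seed)
    n_loci = len(per_locus_spectra)
    n_bins = len(per_locus_spectra[0])
    replicates = []
    for _ in range(n_boot):
        sample = [rng.choice(per_locus_spectra) for _ in range(n_loci)]
        replicates.append([sum(s[k] for s in sample) for k in range(n_bins)])
    return replicates


# Hypothetical toy data: three loci, each with a three-entry flattened spectrum.
loci = [[3, 1, 0], [2, 2, 1], [4, 0, 1]]
observed = [sum(s[k] for s in loci) for k in range(3)]
model = [8.5, 3.2, 1.9]  # hypothetical expected spectrum from some model
ll_observed = poisson_composite_ll(model, observed)
replicates = bootstrap_spectra(loci, n_boot=100, seed=1)
```

Refitting the model to each replicate spectrum and taking quantiles of the fitted parameters would give bootstrap confidence intervals; the refitting step is omitted here because it depends on the model solver.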


Summary

Introduction

Demographic models inferred from genetic data play several important roles in population genetics. They complement archeological evidence in understanding prehistorical events (such as the number and timing of major continental migrations) which have left no written record [1,2]. Existing demographic models can guide sampling design for subsequent population or medical genetic studies. Given their many uses, it is not surprising that many studies have inferred demographic models for populations of humans and other species [4,5,6,7,8,9,10,11,12,13,14,15]. Complex demographic inferences that make use of many summary statistics are often very computationally intensive [8,10,18], which precludes thorough investigation of their statistical properties.

