Abstract

Analysis of population genetic data often includes a search for genomic regions with signs of recent positive selection. One such approach involves the concept of extended haplotype homozygosity (EHH) and its associated statistics. These statistics typically require phased haplotypes, and some of them also require polarized variants. Here, we unify and extend previously proposed modifications to loosen these requirements. We compare the modified versions with the original ones by measuring the false discovery rate in simulated whole-genome scans and by quantifying the overlap of inferred candidate regions in empirical data. We find that phasing information is indispensable for accurate estimation of within-population statistics (for all but very large samples) and of cross-population statistics for small samples. Ancestry information, in contrast, is of lesser importance for both types of statistics. Our publicly available R package rehh incorporates the modified statistics presented here.

Highlights

  • We examined the dependence of the three original statistics on the frequency p_s of the derived allele at focal marker s

  • Using subsamples containing an equal number of the two core alleles, we confirmed that uniHS depends on the population frequency of the derived allele and is not an artifact of its sample frequency (Fig 1 of S2 Text, left and middle panel)


Introduction

The ease with which genomic sequences can be obtained contrasts sharply with the challenge of discerning their functional elements. We focus on the classic case of detecting recent strong positive selection in the form of a hard selective sweep, i.e., a single new advantageous variant that, on its way to fixation, replaces all or most previous variants [1]. Differential selection across populations can be detected by means of a conceptually simple statistic such as Fst [2], which compares variant frequencies between populations, but it may be corroborated by more sophisticated approaches, including those presented here, which exploit other characteristics of the selection signal. The detection of selection within a single population has proven more challenging, and various methods have been designed to capture signs of a reduction in genetic variation [3, 4]. Measures of the average sample homozygosity and length of “runs of homozygosity” in individuals can be regarded, in our opinion, as pre-
