Abstract

Selective sweeps, the genetic footprint of positive selection, have been studied extensively over the past decades, and dozens of methods have been developed to identify swept regions. However, these methods produce both false positives and false negatives, and the candidates identified by different methods are often inconsistent with each other. We propose that a biological cause of this problem can be population subdivision, and a technical cause can be incomplete or inaccurate modeling of the dynamic process associated with sweeps. Here we use simulations to show how these effects interact and potentially cause bias. In particular, we show that sweeps may be misclassified as either hard or soft when the true time stage of a sweep and the stage implied, or presupposed, by the model do not match. We call this “temporal misclassification”. Similarly, “spatial misclassification (softening)” can occur when hard sweeps that are imported by migration into a new subpopulation are falsely identified as soft. This can easily happen in cases of local adaptation, i.e., when the sweeping allele is not under positive selection in the new subpopulation and the underlying model assumes panmixis instead of substructure. The claim that most sweeps in the evolutionary history of humans were soft may have to be reconsidered in the light of these findings.

Highlights

  • Selective sweeps were originally studied in the context of panmictic populations of constant size [1], [2], [3], [4], and later in scenarios of changing population size or population substructure (e.g., [5], [6], [7])

  • A large array of statistical methods is available, and is routinely applied in genome annotation studies, to identify genomic locations that are thought to have experienced recent selective sweeps. These methods can roughly be grouped into three types, or mixtures of them, according to their underlying theoretical considerations: tests based on the site frequency spectrum (SFS), tests based on haplotypes and their distribution [10], [11], and tests based on properties of the inferred genealogical tree of a sequence sample [12], [13]

  • We demonstrated that frequency-spectrum-based, haplotype-based and machine learning methods are capable of detecting hard sweeps in subdivided populations, but their power and detection window are reduced in low-migration scenarios

Summary

Introduction

Methods to detect selective sweeps from population sequence data

Selective sweeps were originally studied in the context of panmictic populations of constant size [1], [2], [3], [4], and later in scenarios of changing population size or population substructure (e.g., [5], [6], [7]). A large array of statistical methods is available, and is routinely applied in genome annotation studies, to identify genomic locations that are thought to have experienced recent selective sweeps (reviewed by [8]). These methods can roughly be grouped into three types, or mixtures of them, according to their underlying theoretical considerations: tests based on the site frequency spectrum (SFS) (see [9]), tests based on haplotypes and their distribution [10], [11], and tests based on properties of the inferred genealogical tree of a sequence sample [12], [13]. While robustness with respect to demographic effects is often tested in sweep-detection methods (e.g., [15], [21]), such tests are often limited to population size changes, and real population histories may lie outside of the tested parameter space.
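As a rough illustration of the first class of tests, the sketch below computes an unfolded site frequency spectrum from a 0/1 haplotype matrix and derives Tajima's D from it. Tajima's D is used here only as a familiar SFS-based summary; it is not necessarily one of the statistics evaluated in this study. The function names and the input encoding (rows are sampled chromosomes, columns are sites, 1 marks the derived allele) are assumptions made for the example.

```python
from math import sqrt


def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: sfs[k] is the number of sites whose derived allele
    is carried by exactly k of the n sampled chromosomes.

    Assumed input: a list of n equal-length lists of 0/1 values
    (hypothetical encoding chosen for this sketch).
    """
    n = len(haplotypes)
    sfs = [0] * (n + 1)
    for site in zip(*haplotypes):  # iterate over columns (sites)
        sfs[sum(site)] += 1
    return sfs


def tajimas_d(sfs):
    """Tajima's D computed from an unfolded SFS of a sample of size n."""
    n = len(sfs) - 1
    if n < 4:
        # The variance term is zero for n = 2 and n = 3.
        raise ValueError("need at least 4 sampled chromosomes")

    # Segregating sites are those with derived count 1 .. n-1.
    seg = sum(sfs[1:n])
    if seg == 0:
        return 0.0  # convention chosen here for samples without variation

    # Mean number of pairwise differences (pi).
    pairs = n * (n - 1) / 2
    pi = sum(k * (n - k) * sfs[k] for k in range(1, n)) / pairs

    # Normalising constants of the standard Tajima's D definition.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between pi and Watterson's estimator, scaled by its
    # estimated standard deviation.
    return (pi - seg / a1) / sqrt(e1 * seg + e2 * seg * (seg - 1))


if __name__ == "__main__":
    # Toy sample: 4 chromosomes, 3 segregating sites.
    sample = [
        [0, 1, 0],
        [0, 1, 1],
        [1, 0, 0],
        [0, 0, 0],
    ]
    spectrum = site_frequency_spectrum(sample)
    print(spectrum, tajimas_d(spectrum))
```

An excess of rare derived alleles pushes D below zero, which is the kind of SFS distortion that sweep tests of this class look for. Demographic events such as population growth can produce a similar signal, so a negative value alone does not establish a sweep.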
