Abstract
With the increasing availability and quality of whole-genome population data, a variety of population genetic inference methods are being used to identify and quantify recent population-level selective events. Although such methods have proliferated, the type-I and type-II error rates of many proposed statistics have not been well described. Moreover, the performance of these statistics is often not evaluated under different biologically relevant scenarios (e.g., population size change, population structure), nor for differing data sizes (i.e., genomic vs. sub-genomic). The absence of this information makes it difficult to evaluate newly available statistics relative to one another and, in turn, to choose the proper toolset for a given empirical analysis. We therefore describe and compare the performance of four widely used tests of selection: SweepFinder, SweeD, OmegaPlus, and iHS. To address these questions, we use simulated data spanning a variety of selection coefficients and beneficial mutation rates. We demonstrate that the LD-based OmegaPlus performs best in terms of power to reject the neutral model under both equilibrium and non-equilibrium conditions, an important result regarding the effectiveness of linkage disequilibrium based statistics relative to site frequency spectrum based statistics. The results presented here ought to serve as a useful guide for future empirical studies, and for the choice of statistic given the history of the population under consideration. Moreover, the parameter space investigated and the type-I and type-II error rates calculated represent a natural benchmark by which future statistics may be assessed.
Highlights
Population genetics seeks to characterize the forces that shape genomic variation, an endeavor that is often challenged by difficulties in unraveling the effects of selective and neutral processes
When the probability that a new mutation is affected by selection is increased, this reduces the rejection rate of OmegaPlus, which is consistent with fewer rejections at a lower SNP density (Table 1), and consistent with the poor performance of this linkage disequilibrium (LD)-based approach under recurrent hitchhiking (RHH) models (Jensen et al, 2007)
The weakness of the SweepFinder class of statistics is their ultimate reliance on a simulated neutral equilibrium model to determine significance, which in many ways minimizes the benefit of the “background-based” site frequency spectrum (SFS) notion of sweep detection, as they again become model-dependent in order to calculate a p-value
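The reliance on neutral simulations for significance can be made concrete with a minimal sketch. It assumes that larger values of a test statistic are more extreme, and that a list of statistic values from neutral simulations is already available; the function names `empirical_p_value` and `rejection_rate` are hypothetical and are not part of SweepFinder, SweeD, OmegaPlus, or iHS.

```python
def empirical_p_value(observed, null_values):
    """Empirical p-value of `observed` against statistics from neutral simulations.

    Assumes larger values are more extreme. The add-one correction keeps the
    p-value strictly above zero when no simulated value reaches `observed`.
    """
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)


def rejection_rate(test_values, null_values, alpha=0.05):
    """Fraction of data sets whose empirical p-value is at most `alpha`.

    Applied to data simulated under selection this estimates power
    (1 - type-II error); applied to data simulated under a neutral but
    non-equilibrium model it estimates the type-I error rate.
    """
    rejected = sum(
        1 for v in test_values if empirical_p_value(v, null_values) <= alpha
    )
    return rejected / len(test_values)
```

Because the p-value is computed against whichever null model was simulated, a misspecified neutral model (for example, equilibrium simulations applied to a bottlenecked population) changes the rejection rate, which is the model dependence described above.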
Summary
Population genetics seeks to characterize the forces that shape genomic variation, an endeavor that is often challenged by difficulties in unraveling the effects of selective and neutral processes. The pattern left by the fixation of a beneficial mutation is referred to as a selective sweep, and can be observed in the site frequency spectrum (SFS) and in the extent of linkage disequilibrium (LD) flanking the beneficial fixation [see reviews of Nielsen (2005); Crisci et al (2012)]. Genetic variation within a swept region is expected to be reduced, and the SFS skewed toward an excess of both rare and high-frequency derived mutations. The haplotype patterns surrounding the beneficial allele are expected to be significantly impacted as well (e.g., Stephan et al, 2006), and it has been suggested that a selective sweep may be identified by a characteristic haplotype pattern in which LD is increased in regions flanking a recent beneficial fixation, but reduced across the site of fixation (Jensen et al, 2007; Pavlidis et al, 2010). As demonstrated by Barton (1998), the expected coalescent trees generated by a bottleneck may be identical to those generated by selection, and simulation studies have demonstrated that tests of selection are prone to extremely high false positive rates under certain bottleneck models (e.g., Jensen et al, 2005; Thornton and Jensen, 2007).
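The two signals described above, the SFS and pairwise LD, can be illustrated with a short sketch. It assumes phased haplotypes encoded as equal-length lists of 0 (ancestral) and 1 (derived); the function names `unfolded_sfs` and `r_squared` are hypothetical, and the code only computes the raw summaries, not the likelihood or omega statistics used by the tools compared here.

```python
def unfolded_sfs(haplotypes):
    """Unfolded site frequency spectrum of a sample of haplotypes.

    Returns a list in which entry k-1 counts the segregating sites whose
    derived allele appears in exactly k of the n haplotypes (k = 1..n-1).
    """
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    for site in zip(*haplotypes):  # iterate over columns (sites)
        k = sum(site)
        if 0 < k < n:  # skip monomorphic sites
            sfs[k - 1] += 1
    return sfs


def r_squared(haplotypes, i, j):
    """Pairwise LD (r^2) between sites i and j; 0.0 if either is monomorphic."""
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n
    p_j = sum(h[j] for h in haplotypes) / n
    p_ij = sum(h[i] * h[j] for h in haplotypes) / n
    d = p_ij - p_i * p_j  # coefficient of linkage disequilibrium
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    return d * d / denom if denom > 0 else 0.0
```

Under a sweep, one would expect the counts in the low and high derived-frequency classes of `unfolded_sfs` to be inflated relative to intermediate classes, and `r_squared` to be high for pairs of sites on the same side of the fixation but low for pairs spanning it.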