Genome scans can potentially identify genetic loci involved in evolutionary processes such as local adaptation and gene flow. Here, we show that recombination rate variation across a neutrally evolving genome gives rise to mixed sampling distributions of mean FST, a common population genetic summary statistic. In particular, we show that in regions of low recombination the distribution of mean FST estimates has more variance and a longer tail than in more highly recombining regions. Determining outliers from the genome-wide distribution without taking local recombination rate into account may therefore increase the frequency of false positives in low-recombination regions and be overly conservative in more highly recombining ones. We perform genome scans on simulated and empirical Drosophila melanogaster data sets and, in both cases, find patterns consistent with this neutral model. Similar patterns are observed for other summary statistics used to capture variation in the coalescent process. Linked selection, particularly background selection, is often invoked to explain heterogeneity in mean FST across the genome, but here we point out that even under neutrality, statistical artefacts can arise due to variation in recombination rate. Our results highlight a flaw in the design of genome-scan studies and suggest that, without estimates of local recombination rate, interpreting the genomic landscape of any summary statistic that captures variation in the coalescent process will be very difficult.