Abstract

In most studies aimed at localizing footprints of past selection, outliers in the tails of the empirical distribution of a given test statistic are assumed to reflect locus-specific selective forces. Significance cutoffs are determined subjectively rather than being tied to a clear set of hypotheses. Here, we define an empirical p-value for the summary statistic by means of a permutation method that uses the observed SNP structure in the real data. To illustrate the methodology, we applied our approach to a panel of 2.9 million autosomal SNPs identified by re-sequencing a pool of 15 individuals from a brown egg layer line. We scanned the genome for local reductions in heterozygosity, which are suggestive of selective sweeps. We also employed a modified sliding window approach that accounts for gaps in the sequence and increases scanning resolution by moving the overlapping windows in steps of only one SNP; we propose calling this a “creeping window” strategy. When previously described candidate genes were used as positive controls, the approach confirmed selective sweeps in the regions of TSHR, PRL, PRLHR, INSR, LEPR, IGF1, and NRAMP1. The genome scan revealed 82 distinct regions with strong evidence of selection (genome-wide p-value < 0.001), including genes known to be associated with eggshell structure and the immune system, such as CALB1 and the GAL cluster, respectively. A substantial proportion of signals was found in gene-poor regions, including the most extreme signal on chromosome 1. The observation of multiple signals in a highly selected layer line of chicken is consistent with the hypothesis that egg production is a complex trait controlled by many genes.
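The “creeping window” scan described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the heterozygosity formula, window size, or gap rule, so the sketch assumes a pooled heterozygosity of the form 2·n_maj·n_min / (n_maj + n_min)², a window of fixed physical width that starts at each SNP in turn, and a hypothetical `min_snps` threshold that skips sparsely covered windows as a stand-in for handling sequence gaps. All function and parameter names are illustrative.

```python
from bisect import bisect_right


def pooled_heterozygosity(major, minor):
    """Heterozygosity of a window from summed major/minor allele read counts.

    Assumed form: 2 * n_maj * n_min / (n_maj + n_min)^2.
    Expects at least one read in the window (total > 0).
    """
    n_maj = sum(major)
    n_min = sum(minor)
    total = n_maj + n_min
    return 2.0 * n_maj * n_min / (total * total)


def creeping_windows(positions, major, minor, width_bp, min_snps):
    """Slide a window along sorted SNP positions, one SNP per step.

    positions : sorted SNP coordinates on one chromosome
    major, minor : per-SNP read counts of the major and minor allele
    width_bp : physical window width in base pairs (hypothetical parameter)
    min_snps : windows with fewer SNPs are skipped (hypothetical gap rule)

    Returns a list of (window_start_position, heterozygosity) tuples.
    """
    results = []
    for start in range(len(positions)):
        # Index just past the last SNP that lies within the window.
        end = bisect_right(positions, positions[start] + width_bp)
        if end - start < min_snps:
            continue  # too few SNPs, e.g. the window overlaps a gap
        hp = pooled_heterozygosity(major[start:end], minor[start:end])
        results.append((positions[start], hp))
    return results


if __name__ == "__main__":
    positions = [100, 200, 300, 5000, 5100]
    major = [10, 10, 10, 5, 5]
    minor = [0, 0, 0, 5, 5]
    for window_start, hp in creeping_windows(positions, major, minor, 250, 2):
        print(window_start, hp)
```

Because every SNP opens a new window, the number of windows scales with the number of SNPs rather than with chromosome length, which is what gives the scan its higher resolution compared with a fixed-step sliding window.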

Highlights

  • The method uses the original allele frequency spectrum of the genome under study to maintain the observed SNP structure when defining an empirical p-value. It assumes a uniform demography across the genome and generates the null distribution under the assumption that allele frequency estimates at neighboring SNPs are independent, an assumption that is violated in real data

  • This bears its own challenges in defining the models appropriately, such that they reproduce the full SNP structure in the data set, and we are not certain that it would yield greater sensitivity or specificity in detecting sweeps

  • Our results confirm the presence of selective sweeps in regions of previously described candidate genes, in some cases spanning intervals of several megabases
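The permutation idea in the first highlight can be sketched as follows. This is an illustration under stated assumptions, not the published procedure: per-SNP allele counts are shuffled across SNPs (which preserves the allele frequency spectrum while treating neighboring SNPs as independent), the most extreme window statistic of each permuted data set is recorded, and the genome-wide empirical p-value is the share of permutations at least as extreme as the observed value. The fixed-SNP-count window, the use of the minimum as the genome-wide statistic, the (hits + 1) / (permutations + 1) estimator, and all names are assumptions made for the example.

```python
import random


def min_window_heterozygosity(major, minor, snps_per_window):
    """Lowest pooled heterozygosity over all windows, stepping one SNP at a time."""
    lowest = 1.0
    for start in range(len(major) - snps_per_window + 1):
        n_maj = sum(major[start:start + snps_per_window])
        n_min = sum(minor[start:start + snps_per_window])
        total = n_maj + n_min
        lowest = min(lowest, 2.0 * n_maj * n_min / (total * total))
    return lowest


def empirical_p_value(major, minor, observed_hp, snps_per_window, n_perm, seed=0):
    """Genome-wide empirical p-value for an observed window heterozygosity.

    Each permutation shuffles the (major, minor) count pairs across SNPs,
    so the set of allele frequencies is kept but their order is randomized.
    """
    rng = random.Random(seed)
    pairs = list(zip(major, minor))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pairs)
        perm_major = [p[0] for p in pairs]
        perm_minor = [p[1] for p in pairs]
        # Count permutations whose most extreme window is at least as low.
        if min_window_heterozygosity(perm_major, perm_minor, snps_per_window) <= observed_hp:
            hits += 1
    # Add-one estimator: the p-value can never be exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    major = [9, 8, 10, 10, 10, 6, 7, 5, 8, 6]
    minor = [1, 2, 0, 0, 0, 4, 3, 5, 2, 4]
    observed = min_window_heterozygosity(major, minor, 3)
    print(empirical_p_value(major, minor, observed, 3, n_perm=999))
```

With the add-one estimator the smallest reportable p-value is 1 / (n_perm + 1), so resolving a threshold such as 0.001 requires at least 999 permutations.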


Summary

Introduction

‘Selection signatures’ are defined as regions of the genome that harbour functionally important sequence variants and are, or have been, under either natural or artificial selection. In a similar study, Nielsen et al. [23] extended the CLR test to derive the expected background pattern of variability from the data itself, rather than from a population genetic model. This approach compares a neutral null model for the evolution of a genomic window with a selective sweep model and can be applied to species for which sufficient genome-wide SNP data are available [7], [21], [24]. It appears that CLR is one of the few metrics that robustly tests the statistical significance of a putative region under the hypothesis of positive selection. In total, 132 genes or genomic regions that display patterns of genetic variation consistent with the hypothesis of positive selection are presented, comprising some striking examples of selective sweeps that span several megabases.

