Abstract

Most variants identified in human genome-wide association studies and scans for selection are non-coding. Interpretation of their effects and the way in which they contribute to phenotypic variation and adaptation in human populations is therefore limited by our understanding of gene regulation and the difficulty of confidently linking non-coding variants to genes. To overcome this, we developed a gene-wise test for population-specific selection based on combinations of regulatory variants. Specifically, we use the QX statistic to test for polygenic selection on cis-regulatory variants based on whether the variance across populations in predicted expression of a particular gene is higher than expected under neutrality. We then applied this approach to human data, testing for selection on 17,388 protein-coding genes in 26 populations from the Thousand Genomes Project. We identified 45 genes with significant evidence (FDR<0.1) for selection, including FADS1, KHK, SULT1A2, ITGAM, and several genes in the HLA region. We further confirm that these signals correspond to plausible population-level differences in predicted expression. While the small number of significant genes (0.2%) is consistent with most cis-regulatory variation evolving under genetic drift or stabilizing selection, it remains possible that there are effects not captured in this study. Our gene-level QX score is independent of standard genomic tests for selection, and may therefore be useful in combination with traditional selection scans to specifically identify selection on regulatory variation. Overall, our results demonstrate the utility of combining population-level genomic data with functional data to understand the evolution of gene expression.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call