Abstract
Large experimental efforts are characterizing the regulatory genome, yet we are still missing a systematic definition of functional and silent genetic variants in non-coding regions. Here, we integrated DNaseI footprinting data with sequence-based transcription factor (TF) motif models to predict the impact of a genetic variant on TF binding across 153 tissues and 1,372 TF motifs. Each annotation we derived is specific for a cell-type condition or assay and is locally motif-driven. We found 5.8 million genetic variants in footprints, 66% of which are predicted by our model to affect TF binding. Comprehensive examination using allele-specific hypersensitivity (ASH) reveals that only the latter group consistently shows evidence for ASH (3,217 SNPs at 20% FDR), suggesting that most (97%) genetic variants in footprinted regulatory regions are indeed silent. Combining this information with GWAS data reveals that our annotation helps in computationally fine-mapping 86 SNPs in GWAS hit regions with at least a 2-fold increase in the posterior odds of picking the causal SNP. The rich meta information provided by the tissue-specificity and the identity of the putative TF binding site being affected also helps in identifying the underlying mechanism supporting the association. As an example, the enrichment for LDL level-associated SNPs is 9.1-fold higher among SNPs predicted to affect HNF4 binding sites than in a background model already including tissue-specific annotation.
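The abstract does not spell out how a variant is "predicted to affect TF binding". As a minimal illustration of a motif-driven prediction, the sketch below scores the reference and alternate alleles of a SNP against a position weight matrix (PWM) and reports the change in log-odds score. The toy matrix, the uniform background frequency, and the function names are all assumptions made for illustration; this is not the authors' model, which also integrates DNaseI footprinting data.

```python
import math

# Assumed uniform background base frequency (illustrative only).
BACKGROUND = 0.25

# Hypothetical 4-position motif: per-position base probabilities.
# Position 2 is uninformative (all bases equally likely).
TOY_PWM = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]


def log_odds_score(pwm, seq):
    """Sum of log2(motif probability / background) over the motif positions."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match motif length")
    return sum(math.log2(pwm[i][base] / BACKGROUND) for i, base in enumerate(seq))


def allelic_score_change(pwm, ref_seq, offset, alt_base):
    """Score the reference and alternate alleles of a SNP inside a motif match.

    Returns (ref_score, alt_score, alt_score - ref_score).
    """
    alt_seq = ref_seq[:offset] + alt_base + ref_seq[offset + 1:]
    ref_score = log_odds_score(pwm, ref_seq)
    alt_score = log_odds_score(pwm, alt_seq)
    return ref_score, alt_score, alt_score - ref_score


# A SNP at an informative position lowers the score substantially ...
_, _, delta_informative = allelic_score_change(TOY_PWM, "ACGT", 0, "G")
# ... while a SNP at the uninformative position leaves it unchanged.
_, _, delta_silent = allelic_score_change(TOY_PWM, "ACGT", 2, "A")
```

In this toy setting, a variant with a large negative score change would be a candidate for altering binding, whereas a variant at an uninformative motif position would be a candidate "silent" variant, even though both fall within the same footprint.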
Highlights
Despite large ongoing efforts to characterize regulatory regions in the human genome (e.g., ENCODE [1], Roadmap Epigenomics [2]), the lack of a regulatory genetic code to discriminate functional from silent non-coding variants in regulatory sequences poses severe limitations in interpreting the results of many human and population genetic analyses
Many experimental efforts have been dedicated to mapping regulatory regions in the genome, but few systematic methods integrate functional data and regulatory sequences to predict the potential effect of any genetic variant in any given tissue and motif
Large numbers of genetic variants associated with disease and normal trait variation have been identified through genome-wide association studies (GWAS) [3]; yet a formidable challenge remains in determining the specific molecular mechanisms underlying association signals in non-coding regions
Summary
Despite large ongoing efforts to characterize regulatory regions in the human genome (e.g., ENCODE [1], Roadmap Epigenomics [2]), the lack of a regulatory genetic code to discriminate functional from silent non-coding variants in regulatory sequences poses severe limitations in interpreting the results of many human and population genetic analyses. Similar challenges arise when exploring the evolutionary functional significance of non-coding variants, for example through analysis of differences in genotype distribution across populations [4, 5]. This is complicated by the fact that GWAS hits and signals of selection are usually found in large regions of association and do not directly pinpoint the true causative variants. The consequence is that we cannot provide a satisfactory answer to the following questions: Which genetic variants are more likely to impact binding of specific TFs? What is the fraction of genetic variants in regulatory regions that are not neutral? If we can adequately answer these questions, we may further ask: Did polygenic adaptation occur at binding sites for the same TF? Do variants in certain types of TF footprints and tissues contribute to variation in specific complex traits?