Abstract
Genome-wide association studies (GWAS) have identified a large number of disease-associated SNPs, but only in a few cases have the functional variant and the gene it controls been identified. To systematically identify candidate regulatory variants, we sequenced ENCODE cell lines and used public ChIP-seq data to look for transcription factors binding preferentially to one allele. We found 9962 candidate regulatory SNPs, of which 16% were rare and showed evidence of larger functional effects than common ones. Functionally rare variants may explain divergent GWAS results between populations and are candidates for a partial explanation of the missing heritability. The majority of allele-specific variants (96%) were specific to a cell type. Furthermore, by examining GWAS loci we found >400 allele-specific candidate SNPs, 141 of which were highly relevant in our cell types. Functional validation supports the identification of an SNP in SYNGR1 that may confer risk of rheumatoid arthritis and primary biliary cirrhosis, as well as an SNP in the last intron of COG6 conferring risk of psoriasis. We propose that by repeating the ChIP-seq experiments for 20 selected transcription factors in three to ten people, the most common polymorphisms can be interrogated for allele-specific binding. Our strategy may help to remove the current bottleneck in functional annotation of the genome.

Electronic supplementary material: The online version of this article (doi:10.1007/s00439-016-1654-x) contains supplementary material, which is available to authorized users.
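Why might three to ten people be enough? Allele-specific binding can only be measured in a person who is heterozygous at the SNP. A back-of-envelope sketch, assuming unrelated individuals and Hardy-Weinberg equilibrium (so a SNP with minor allele frequency q is heterozygous with probability 2q(1 − q)), gives the chance that at least one of n people is informative as P = 1 − (1 − 2q(1 − q))^n. The function name `prob_het_in_any` is hypothetical, and this is an illustration rather than the authors' own calculation.

```python
def prob_het_in_any(maf: float, n_people: int) -> float:
    """Probability that at least one of n_people unrelated individuals is
    heterozygous at a biallelic SNP with minor allele frequency `maf`,
    assuming Hardy-Weinberg equilibrium (heterozygote frequency 2q(1 - q))."""
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must lie in [0, 1]")
    if n_people < 0:
        raise ValueError("n_people must be non-negative")
    het_freq = 2.0 * maf * (1.0 - maf)
    # Each person is independently non-heterozygous with probability 1 - het_freq.
    return 1.0 - (1.0 - het_freq) ** n_people


if __name__ == "__main__":
    for maf in (0.05, 0.1, 0.2, 0.5):
        row = ", ".join(f"n={n}: {prob_het_in_any(maf, n):.3f}" for n in (3, 10))
        print(f"MAF {maf:.2f} -> {row}")
```

Under these assumptions the probability climbs quickly with n for common SNPs, which is consistent with the idea that a small panel covers most common polymorphisms, while rare variants need many more people.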
Highlights
The number of reads mapping to the G1 and G2 alleles was counted at all heterozygous positions, and positions with a statistically significant difference in read counts were identified after correcting for multiple testing and copy number variation (CNV).
Based on detailed validation of two single nucleotide polymorphisms (SNPs) associated with rheumatoid arthritis/primary biliary cirrhosis and psoriasis, we believe that many other candidate SNPs merit further study.
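The allele-counting step in the first highlight can be sketched as a per-site exact binomial test followed by a false-discovery-rate correction. This is a minimal sketch under stated assumptions, not the authors' pipeline. The text does not say which test or correction was used, so the two-sided binomial test, the Benjamini-Hochberg procedure, and the CNV handling (expecting reads to split in proportion to allele copy number) are all choices made here for illustration. The function names and the site tuples are hypothetical.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability P(X = k) for X ~ Binomial(n, p), computed in log
    space so that deeply sequenced sites do not overflow."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def binom_test_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum the probabilities of every outcome
    that is no more likely than the observed count k."""
    observed = binom_pmf(k, n, p)
    tolerance = 1.0 + 1e-7  # absorbs floating-point noise when outcomes tie
    total = sum(q for q in (binom_pmf(i, n, p) for i in range(n + 1))
                if q <= observed * tolerance)
    return min(1.0, total)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (false discovery rate), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_allele_specific(sites, fdr: float = 0.05) -> list[str]:
    """sites: iterable of (site_id, reads_g1, reads_g2, copies_g1, copies_g2).
    Under the null, reads are assumed to split in proportion to allele copy
    number, which is one simple way to account for CNV (an assumption here)."""
    sites = list(sites)
    pvalues = []
    for _, reads_g1, reads_g2, copies_g1, copies_g2 in sites:
        expected_g1 = copies_g1 / (copies_g1 + copies_g2)
        pvalues.append(binom_test_two_sided(reads_g1, reads_g1 + reads_g2, expected_g1))
    qvalues = benjamini_hochberg(pvalues)
    return [site[0] for site, q in zip(sites, qvalues) if q < fdr]


if __name__ == "__main__":
    # Hypothetical read counts at three heterozygous sites.
    example_sites = [
        ("site_A", 40, 10, 1, 1),  # strong imbalance, equal copy number
        ("site_B", 26, 24, 1, 1),  # balanced
        ("site_C", 34, 16, 2, 1),  # imbalance explained by an extra copy of G1
    ]
    print(call_allele_specific(example_sites))
```

In this toy example site_C shows a 34:16 read split, which would look allele-specific against a 50:50 null, but matches the 2:1 split expected if G1 is present in two copies. This is the kind of false positive a CNV correction is meant to remove.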
Summary
Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden. Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden. Science for Life Laboratory, Department of Medical Sciences, Uppsala University, Uppsala, Sweden.

It is often assumed that the genetic variant with the strongest association is functional, but this is usually difficult to prove because of linkage disequilibrium (LD) between SNPs. Identifying the SNP with the strongest association to gene expression (eSNP) on a haplotype was proposed as a means of finding the variant driving the association to disease, and the NIH started the Genotype-Tissue Expression (GTEx) project to correlate a person's genotype with gene expression in many tissues. eSNPs are generally in LD with other SNPs, leaving the question of direct
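The limitation of the eSNP approach can be made concrete with a toy example. This sketch assumes genotypes coded as allele dosage (0, 1, 2) and scores each SNP by the squared Pearson correlation with expression; real eQTL analyses typically fit regression models with covariates, so this is only an illustration. The genotype-based r² between two SNPs is used here as a rough LD proxy (an assumption, since LD is usually defined on haplotypes). All names and data are hypothetical. When two SNPs are in perfect LD, they receive identical scores, so the top-ranked eSNP cannot by itself identify which variant is causal.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_esnps(genotypes: dict[str, list[int]],
               expression: list[float]) -> list[tuple[str, float]]:
    """Score each SNP by r^2 between allele dosage (0, 1, 2) and expression,
    best first. Ties keep input order because sorted() is stable."""
    scores = [(snp, pearson_r(dosage, expression) ** 2)
              for snp, dosage in genotypes.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical dosages for eight people; snp_1 and snp_2 are in perfect LD.
    genotypes = {
        "snp_1": [0, 1, 2, 0, 1, 2, 1, 0],
        "snp_2": [0, 1, 2, 0, 1, 2, 1, 0],
        "snp_3": [2, 0, 1, 1, 0, 2, 0, 1],
    }
    expression = [1.0, 2.1, 3.0, 1.2, 1.9, 3.1, 2.0, 0.9]
    for snp, r2 in rank_esnps(genotypes, expression):
        print(f"{snp}: r^2 = {r2:.3f}")
    # Genotype-based r^2 between the two top SNPs, used here as a rough LD proxy.
    print("LD r^2(snp_1, snp_2) =",
          pearson_r(genotypes["snp_1"], genotypes["snp_2"]) ** 2)
```

Because snp_1 and snp_2 tie exactly, expression data alone cannot separate them. An allele-specific binding assay probes each variant directly, which is the motivation for the approach described in the abstract.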