Abstract

Background: Single nucleotide polymorphisms (SNPs) have been associated with many aspects of human development and disease, and many non-coding SNPs associated with disease risk are presumed to affect gene regulation. We have previously shown that SNPs within transcription factor binding sites can affect transcription factor binding in an allele-specific and heritable manner. However, such analysis has relied on prior whole-genome genotypes provided by large external projects such as HapMap and the 1000 Genomes Project. This requirement limits the study of allele-specific effects of SNPs in primary patient samples from diseases of interest, where complete genotypes are not readily available.

Results: In this study, we show that we are able to identify SNPs de novo and accurately from ChIP-seq data generated in the ENCODE Project. Our de novo identified SNPs from ChIP-seq data are highly concordant with published genotypes. Independent experimental verification of more than 100 sites estimates our false discovery rate at less than 5%. Analysis of transcription factor binding at de novo identified SNPs revealed widespread heritable allele-specific binding, confirming previous observations. SNPs identified from ChIP-seq datasets were significantly enriched for disease-associated variants, and we identified dozens of allele-specific binding events in non-coding regions that could distinguish between disease and normal haplotypes.

Conclusions: Our approach combines SNP discovery, genotyping and allele-specific analysis, but is selectively focused on functional regulatory elements occupied by transcription factors or epigenetic marks, and will therefore be valuable for identifying the functional regulatory consequences of non-coding SNPs in primary disease samples.

Highlights

  • SNP discovery from ChIP-seq data: We carried out SNP discovery from ChIP-seq data that we generated for the transcription factor CTCF in 10 human cell lines, including six lymphoblastoid cell lines that had been previously sequenced and genotyped by the 1000 Genomes Project, embryonic stem cells (H1 ESCs), vascular endothelial cells (HUVEC), and normal and disease fibroblasts (Table 1) [16,18].

  • SNPs overlapping with the 1000 Genomes Project and novel SNPs are qualitatively similar: To characterize in more detail the SNPs we discovered de novo from ChIP-seq data, we separated them into SNPs that overlapped with those found in the 1000 Genomes Project Pilot 2 dataset and SNPs that were not found in Pilot 2 and were therefore novel.
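
The separation into overlapping and novel SNPs described above can be sketched as a set comparison. This is a minimal illustration, not the authors' pipeline: the `Snp` record, the `partition_snps` helper, and the toy coordinates are all hypothetical, and it assumes both call sets use the same coordinate system and allele representation.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Snp(NamedTuple):
    """Hypothetical SNP record; field names are assumptions for illustration."""
    chrom: str
    pos: int   # 1-based position
    ref: str   # reference allele
    alt: str   # alternate allele


def partition_snps(
    called: Iterable[Snp], reference: Iterable[Snp]
) -> Tuple[List[Snp], List[Snp]]:
    """Split de novo calls into those present in a reference set and novel ones."""
    reference_set = set(reference)
    overlapping: List[Snp] = []
    novel: List[Snp] = []
    for snp in called:
        # A call counts as overlapping only if position and alleles all match.
        (overlapping if snp in reference_set else novel).append(snp)
    return overlapping, novel


# Toy data only; these are not real variants.
called = [
    Snp("chr1", 1000, "A", "G"),
    Snp("chr1", 2000, "C", "T"),
    Snp("chr2", 500, "G", "A"),
]
pilot2 = [
    Snp("chr1", 1000, "A", "G"),
    Snp("chr2", 500, "G", "A"),
    Snp("chr3", 42, "T", "C"),
]

overlapping, novel = partition_snps(called, pilot2)
overlap_fraction = len(overlapping) / len(called)
print(f"{len(overlapping)} overlapping, {len(novel)} novel")
```

Matching on the full (chromosome, position, ref, alt) tuple is a deliberately strict choice; a looser comparison on position alone would count a call as overlapping even when the alleles disagree.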

Introduction

Single nucleotide polymorphisms (SNPs) have been associated with many aspects of human development and disease, and many non-coding SNPs associated with disease risk are presumed to affect gene regulation. We have previously shown that SNPs within transcription factor binding sites can affect transcription factor binding in an allele-specific and heritable manner. However, such analysis has relied on prior whole-genome genotypes provided by large external projects such as HapMap and the 1000 Genomes Project. This requirement limits the study of allele-specific effects of SNPs in primary patient samples from diseases of interest, where complete genotypes are not readily available. Although genome-wide association (GWA) studies have identified approximately 7000 SNPs or regions as disease associated [1] (http://www.genome.gov/gwastudies/), in most cases neither the causal SNPs nor the molecular mechanisms are known.
