Abstract

ChIP-seq has become a major tool for the genome-wide identification of transcription factor binding or histone modification sites. Most peak-calling algorithms require input control datasets to model the occurrence of background reads to account for local sequencing and GC bias. However, the GC-content of reads in Input-seq datasets deviates significantly from that in ChIP-seq datasets. Moreover, we observed that a commonly used peak-calling program performed equally well with a simulated uniform background set as with an Input-seq dataset. This contradicts the assumption that input control datasets are necessary to faithfully reflect the background read distribution. Because the GC-content of the abundant single reads in ChIP-seq datasets is similar to that of randomly sampled regions, we designed a peak-calling algorithm with a background model based on overlapping single reads. The application, OccuPeak, uses the abundant low-frequency tags present in each ChIP-seq dataset to model the background, thereby avoiding the need for additional datasets. Analysis of the performance of OccuPeak showed robust model parameters. Its measure of peak significance, the excess ratio, depends only on the tag density of a peak and the global noise levels. Compared to the commonly used peak-calling applications MACS and CisGenome, OccuPeak had the highest sensitivity in an enhancer identification benchmark test, and performed similarly in overlap tests of transcription factor occupation with DNase I hypersensitive sites and H3K27ac sites. Moreover, peaks called by OccuPeak were significantly enriched with cardiac disease-associated SNPs. OccuPeak runs as a standalone application and does not require extensive tweaking of parameters, making its use straightforward and user-friendly. Availability: http://occupeak.hfrc.nl
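The GC-content comparison between datasets mentioned above can be illustrated with a minimal sketch. It only shows how the GC fraction of a set of reads might be computed and compared. The function names `gc_content` and `mean_gc` and the toy reads are hypothetical and are not taken from OccuPeak or from the datasets analysed in the paper.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a read."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # Reads without any unambiguous base (e.g. all N) get a GC fraction of 0.0
    return gc / total if total else 0.0


def mean_gc(reads):
    """Mean GC fraction over a collection of read sequences."""
    values = [gc_content(r) for r in reads]
    return sum(values) / len(values) if values else 0.0


# Toy reads, made up for illustration only
chip_reads = ["GGCCAT", "ATATGC"]
input_reads = ["ATATAT", "ATGCAT"]

chip_gc = mean_gc(chip_reads)
input_gc = mean_gc(input_reads)
```

In practice the reads would come from the aligned ChIP-seq and Input-seq files rather than from hard-coded strings, and the two distributions, not just their means, would be compared.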

Highlights

  • Networks of transcription factors, histone modifications and regulatory DNA elements control the spatio-temporal expression patterns of genes during development and in homeostasis

  • To simplify chromatin immunoprecipitation (ChIP)-seq data analysis, we developed OccuPeak as a stand-alone ChIP-seq peak-calling program with a user-friendly interface that can serve as a basic research tool

  • Only about 1% of the peaks called on ChIP-seq datasets overlap with peaks in Input-seq datasets; this overlap could be halved when reads associated with repeats were excluded



Introduction

Networks of transcription factors, histone modifications and regulatory DNA elements control the spatio-temporal expression patterns of genes during development and in homeostasis. To unravel these regulatory networks and their contribution to developmental processes and human disease, it is imperative to identify the positions of transcription factor binding sites and modified histones throughout the genome. ChIP-seq involves cross-linking of DNA and proteins, shearing the cross-linked DNA into fragments, and enrichment of DNA bound to the factor of interest via immunoprecipitation. These DNA fragments are sequenced, after which reads are aligned to a reference genome and the occurrence of DNA tags is counted. ChIP-seq thus provides a quantitative map of DNA interaction positions for a given transcription factor, co-factor or modified histone.
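The tag-counting step described above can be sketched as follows. This is a minimal illustration that assumes the aligned reads have already been reduced to (chromosome, start) pairs. The function name `count_tags`, the fixed bin size and the toy positions are hypothetical and do not describe how OccuPeak itself counts tags.

```python
from collections import Counter


def count_tags(reads, bin_size=100):
    """Count aligned reads (tags) per fixed-width genomic bin.

    reads: iterable of (chromosome, start) tuples with 0-based start positions.
    Returns a Counter keyed by (chromosome, bin_index).
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    counts = Counter()
    for chrom, start in reads:
        # Integer division assigns each read start to its bin
        counts[(chrom, start // bin_size)] += 1
    return counts


# Toy alignment positions, made up for illustration only
reads = [("chr1", 120), ("chr1", 150), ("chr1", 990), ("chr2", 5)]
tag_counts = count_tags(reads, bin_size=100)
```

Bins with many tags are candidate enriched regions, while bins holding a single tag correspond to the low-frequency reads that a background model could draw on.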
