Abstract

Background: Chromatin immunoprecipitation combined with next-generation DNA sequencing (ChIP-seq) has become a key approach for detecting, genome-wide, the genomic sites bound by proteins such as transcription factors (TFs). Several methods and open-source tools have been developed to analyze ChIP-seq data. However, most of them are designed to detect TF binding regions rather than to accurately locate transcription factor binding sites (TFBSs). Pinpointing TFBSs directly from ChIP-seq data remains challenging, especially in regions with closely spaced binding events.

Results: To pinpoint TFBSs at high resolution, we propose a novel method named SeqSite, which implements a two-step strategy: first detecting tag-enriched regions, then pinpointing binding sites within the detected regions. The second step models the tag density profile, locates TFBSs on each strand with a least-squares model-fitting strategy, and merges the detections from the two strands. Experiments on simulated data show that SeqSite can locate most binding sites that are more than 40 bp apart. Applications to three human TF ChIP-seq datasets demonstrate that SeqSite pinpoints binding sites at a higher resolution than existing methods.

Conclusions: We have developed a computational tool named SeqSite, which can pinpoint both closely spaced and isolated binding sites and consequently improves the resolution of TFBS detection from ChIP-seq data.

Highlights

  • Exploring protein-DNA binding events in a genome-wide manner is a key step in studying transcription regulation

  • Characteristics of ChIP-seq data for multiple adjacent transcription factor binding sites (TFBSs): in the ChIP-seq protocol with the Illumina Genome Analyzer [14,15], the ChIP-ed DNA fragments are sequenced from either end at random

  • Since ChIP-ed DNA fragments always cover binding sites, forward tags are not expected to start downstream of a binding site, and reverse tags are not expected to start upstream of it
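
The strand property in the last highlight can be illustrated with a small simulation. This is only a sketch under stated assumptions, not SeqSite's tag density model: the function name `simulate_tags`, the fragment length range (100 to 300 bp), and the uniform placement of fragments around the site are all hypothetical choices made for illustration.

```python
import random


def simulate_tags(site, n_fragments=1000, min_len=100, max_len=300, seed=0):
    """Simulate fragments that cover `site` and sequence one end of each.

    Assumptions (illustrative only): fragment lengths are uniform in
    [min_len, max_len], and each fragment is placed uniformly among the
    positions where it still covers the binding site.
    """
    rng = random.Random(seed)
    forward, reverse = [], []
    for _ in range(n_fragments):
        length = rng.randint(min_len, max_len)
        # Leftmost base of the fragment; chosen so that the fragment
        # [start, start + length - 1] always contains `site`.
        start = rng.randint(site - length + 1, site)
        end = start + length - 1
        if rng.random() < 0.5:
            # Forward-strand tag: starts at the left end of the fragment.
            forward.append(start)
        else:
            # Reverse-strand tag: starts at the right end of the fragment.
            reverse.append(end)
    return forward, reverse


site = 5000
forward, reverse = simulate_tags(site)

# Forward tags never start downstream of the site,
# and reverse tags never start upstream of it.
print(max(forward) <= site, min(reverse) >= site)
```

Under these assumptions, forward tag starts pile up on the upstream side of the site and reverse tag starts on the downstream side, which is the asymmetry a per-strand fit can exploit.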


Summary

Introduction

Exploring protein-DNA binding events in a genome-wide manner is a key step in studying transcription regulation. Chromatin immunoprecipitation (ChIP) [1] followed by hybridization to DNA tiling arrays (ChIP-chip) [2,3,4] or by next-generation high-throughput sequencing (ChIP-seq) [5,6,7,8,9] are the major techniques for experimentally profiling these binding events. Owing to the advantages of next-generation sequencing [10,11], ChIP-seq measures immunoprecipitated DNA fragments at a higher signal-to-noise ratio than ChIP-chip and offers the potential to detect protein-DNA binding locations at a higher resolution [9].
