Abstract

Chromatin immunoprecipitation followed by high throughput sequencing (ChIP-Seq) has been successfully used for genome-wide profiling of transcription factor binding sites, histone modifications, and nucleosome occupancy in many model organisms and humans. Because the compact genomes of prokaryotes harbor many binding sites separated by only few base pairs, applications of ChIP-Seq in this domain have not reached their full potential. Applications in prokaryotic genomes are further hampered by the fact that well studied data analysis methods for ChIP-Seq do not result in a resolution required for deciphering the locations of nearby binding events. We generated single-end tag (SET) and paired-end tag (PET) ChIP-Seq data for factor in Escherichia coli (E. coli). Direct comparison of these datasets revealed that although PET assay enables higher resolution identification of binding events, standard ChIP-Seq analysis methods are not equipped to utilize PET-specific features of the data. To address this problem, we developed dPeak as a high resolution binding site identification (deconvolution) algorithm. dPeak implements a probabilistic model that accurately describes ChIP-Seq data generation process for both the SET and PET assays. For SET data, dPeak outperforms or performs comparably to the state-of-the-art high-resolution ChIP-Seq peak deconvolution algorithms such as PICS, GPS, and GEM. When coupled with PET data, dPeak significantly outperforms SET-based analysis with any of the current state-of-the-art methods. Experimental validations of a subset of dPeak predictions from PET ChIP-Seq data indicate that dPeak can estimate locations of binding events with as high as to resolution. Applications of dPeak to ChIP-Seq data in E. coli under aerobic and anaerobic conditions reveal closely located promoters that are differentially occupied and further illustrate the importance of high resolution analysis of ChIP-Seq data.

Highlights

  • Since its introduction, chromatin immunoprecipitation followed by high throughput sequencing (ChIP-Seq) has revolutionized the study of gene regulation

  • Paired-end tag (PET) sequencing is routinely used in other applications of generation sequencing, it has not been much adapted to ChIP-Seq

  • We illustrate both experimentally and computationally that paired-end tag (PET) sequencing significantly improves the resolution of ChIP-Seq experiments and enables ChIP-Seq applications in compact genomes like Escherichia coli (E. coli)

Read more

Summary

Introduction

Chromatin immunoprecipitation followed by high throughput sequencing (ChIP-Seq) has revolutionized the study of gene regulation. ChIP-Seq is currently the state-of-the-art method for studying protein-DNA interactions genome-wide and is widely used [1,2,3,4,5]. Sequencing one end and both ends are referred to as singleend tag (SET) and paired-end tag (PET) technologies, respectively (Figure 1A). Standard preprocessing of these data involves mapping reads to a reference genome and retaining the uniquely mapping ones [6,7]. In PET data, start and end positions of each DNA fragment can be obtained by connecting positions of paired reads [8].

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call