Abstract

ChIP-Seq has become the standard method for genome-wide profiling of the DNA association of transcription factors. To simplify analyzing and interpreting ChIP-Seq data, which typically involves using multiple applications, we describe an integrated, open source, R-based analysis pipeline. The pipeline addresses data input, peak detection, sequence and motif analysis, visualization, and data export, and can readily be extended via other R and Bioconductor packages. Using a standard multicore computer, it can be used with datasets consisting of tens of thousands of enriched regions. We demonstrate its effectiveness on published human ChIP-Seq datasets for FOXA1, ER, CTCF and STAT1, where it detected co-occurring motifs that were consistent with the literature but not detected by other methods. Our pipeline provides the first complete set of Bioconductor tools for sequence and motif analysis of ChIP-Seq and ChIP-chip data.

Highlights

  • Transcription factors (TFs) play critical roles in regulating gene expression

  • We applied the pipeline to the four chromatin immunoprecipitation (ChIP)-Seq datasets mentioned above and described in the Methods section

  • The AP-1 complex has been shown to be over-expressed in ER positive cells (e.g. MCF7) and can interact directly with the ER transcription factor [37,38]. This supports the AP-1 motif identified by rGADEM in the ER enriched regions, and the AP-1 motif that we identified in FOXA1-enriched regions, which may reflect interactions, possibly indirect, between the AP-1 and FOXA1 proteins via ER


Summary

Introduction

Transcription factors (TFs) play critical roles in regulating gene expression. Determining transcription factor binding sites (TFBSs) is challenging because the DNA segments recognized by TFs are often short and dispersed in the genome, and the target loci of a TF vary between tissues, developmental stages and physiological conditions. Genome-wide protein-DNA interactions are typically profiled using ChIP-Seq, i.e. chromatin immunoprecipitation (ChIP) with massively parallel short-read sequencing [1]. A typical ChIP-Seq experiment generates millions of short (35–75 bp) directional DNA sequence reads that represent ends of ~200 bp immunoprecipitated DNA fragments. For experiments with transcription factors, there are three central analysis issues: peak calling, binding motif identification, and motif interpretation. We report an R/Bioconductor-based pipeline that offers an efficient, integrated set of analysis tools for such experiments.
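To make the first of these issues concrete, the sketch below shows why read direction matters for peak calling. Each short read marks one end of a longer immunoprecipitated fragment, so a simple approach extends every read in its strand direction to the expected fragment length and then looks for positions where many extended fragments pile up. This is an illustration only, written in Python rather than R, and it is not the method used by the pipeline described here. The function names, the toy read coordinates, and the fixed depth threshold are all assumptions made for the example; real peak callers also model background signal, which this sketch ignores.

```python
from collections import Counter

FRAGMENT_LEN = 200  # assumed fragment length, following the ~200 bp figure in the text


def extend_read(start, read_len, strand, fragment_len=FRAGMENT_LEN):
    """Return the half-open [lo, hi) interval of the inferred fragment.

    `start` is the leftmost aligned coordinate of the read.
    """
    if strand == "+":
        # A plus-strand read marks the left end of its fragment.
        return start, start + fragment_len
    # A minus-strand read marks the right end of its fragment.
    end = start + read_len
    return max(0, end - fragment_len), end


def coverage(reads, fragment_len=FRAGMENT_LEN):
    """Count, for each position, how many extended fragments overlap it."""
    depth = Counter()
    for start, read_len, strand in reads:
        lo, hi = extend_read(start, read_len, strand, fragment_len)
        for pos in range(lo, hi):
            depth[pos] += 1
    return depth


def call_regions(depth, min_depth):
    """Merge consecutive positions with depth >= min_depth into regions."""
    regions = []
    for pos in sorted(p for p, d in depth.items() if d >= min_depth):
        if regions and regions[-1][1] == pos:
            regions[-1][1] = pos + 1  # extend the current region by one base
        else:
            regions.append([pos, pos + 1])  # start a new region
    return [tuple(r) for r in regions]


# Hypothetical reads as (start, read_length, strand) on a single chromosome.
reads = [
    (1000, 36, "+"),
    (1020, 36, "+"),
    (1150, 36, "-"),
    (1170, 36, "-"),
    (5000, 36, "+"),  # isolated read, should not form a region
]

depth = coverage(reads)
regions = call_regions(depth, min_depth=3)
print(regions)
```

With these toy reads, the two plus-strand and two minus-strand reads near position 1000 extend toward each other and overlap, giving a single region `[(1006, 1200)]`, while the isolated read at 5000 never reaches the threshold. Per-base counting keeps the sketch short but would be far too slow for millions of reads.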

