An Integrated Pipeline for the Genome-Wide Analysis of Transcription Factor Binding Sites from ChIP-Seq

Eloi Mercier, Leping Li, Gordon Robertson, Arnaud Droit, Xuekui Zhang, Raphael Gottardo, Ying Xu

doi:10.1371/journal.pone.0016432

Abstract

ChIP-Seq has become the standard method for genome-wide profiling of the DNA association of transcription factors. To simplify analyzing and interpreting ChIP-Seq data, which typically involves using multiple applications, we describe an integrated, open source, R-based analysis pipeline. The pipeline addresses data input, peak detection, sequence and motif analysis, visualization, and data export, and can readily be extended via other R and Bioconductor packages. Using a standard multicore computer, it can be used with datasets consisting of tens of thousands of enriched regions. We demonstrate its effectiveness on published human ChIP-Seq datasets for FOXA1, ER, CTCF and STAT1, where it detected co-occurring motifs that were consistent with the literature but not detected by other methods. Our pipeline provides the first complete set of Bioconductor tools for sequence and motif analysis of ChIP-Seq and ChIP-chip data.

Highlights

Transcription factors (TFs) play critical roles in regulating gene expression.
We applied the pipeline to the four ChIP-Seq datasets mentioned above and described in the Methods section.
The AP-1 complex has been shown to be over-expressed in ER-positive cells (e.g. MCF7) and can interact directly with the ER transcription factor [37,38]. This supports the AP-1 motif identified by rGADEM in the ER-enriched regions, and the AP-1 motif that we identified in FOXA1-enriched regions, which may reflect interactions, possibly indirect, between the AP-1 and FOXA1 proteins via ER.

Summary

Introduction

Transcription factors (TFs) play critical roles in regulating gene expression. Determining transcription factor binding sites (TFBSs) is challenging because the DNA segments recognized by TFs are often short and dispersed in the genome, and the target loci of a TF vary between tissues, developmental stages and physiological conditions. Genome-wide protein-DNA interactions are typically profiled using ChIP-Seq, i.e. chromatin immunoprecipitation (ChIP) with massively parallel short-read sequencing [1]. A typical ChIP-Seq experiment generates millions of short (35–75 bp) directional DNA sequence reads that represent ends of ~200 bp immunoprecipitated DNA fragments. For experiments with transcription factors, there are three central analysis issues: peak calling, binding motif identification, and motif interpretation. We report an R/Bioconductor-based pipeline that offers an efficient, integrated set of analysis tools for such experiments.
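To make the point about directional reads concrete, the following toy sketch shows why read strand matters for peak calling: forward-strand reads mark the left ends of fragments and reverse-strand reads mark the right ends, so shifting each read by about half a fragment length toward the fragment centre brings both groups onto the bound position. This is only an illustration under stated assumptions, not the peak-detection method used by the pipeline. The function names, the fixed 200 bp fragment length, and the example coordinates are all hypothetical.

```python
from statistics import median

# Assumption: a single fixed fragment length of about 200 bp.
# Real fragment sizes vary and are usually estimated from the data.
FRAGMENT_LENGTH = 200


def shift_reads(reads, fragment_length=FRAGMENT_LENGTH):
    """Move each read's 5' end half a fragment toward the fragment centre.

    `reads` is an iterable of (five_prime_position, strand) tuples,
    with strand given as "+" (forward) or "-" (reverse).
    """
    half = fragment_length // 2
    shifted = []
    for position, strand in reads:
        if strand == "+":
            shifted.append(position + half)  # forward reads sit left of the site
        elif strand == "-":
            shifted.append(position - half)  # reverse reads sit right of the site
        else:
            raise ValueError(f"unknown strand: {strand!r}")
    return shifted


def estimate_summit(reads, fragment_length=FRAGMENT_LENGTH):
    """Estimate the bound position in ONE enriched region as the median
    of the shifted read positions."""
    return round(median(shift_reads(reads, fragment_length)))


# Hypothetical reads around a site near position 1000.
example_reads = [
    (890, "+"), (900, "+"), (910, "+"),
    (1090, "-"), (1100, "-"), (1110, "-"),
]
summit = estimate_summit(example_reads)
```

In this made-up region the forward and reverse reads lie about 200 bp apart before shifting and coincide afterwards. Dedicated peak callers model this strand offset statistically rather than assuming a fixed shift.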

Journal: PLoS ONE	Publication Date: Feb 16, 2011
Citations: 112	License type: cc-by

Similar Papers

An integrated software system for analyzing ChIP-chip and ChIP-seq data
Hongkai Ji ... Wenxiu Ma
Nature Biotechnology | VOL. 26
01 Nov 2008

AnnotateGenomicRegions: a web application
Heiko Muller ... Gabriele Bucci
EMBnet.journal | VOL. 18
09 Nov 2012

Kraken: A set of tools for quality control and analysis of high-throughput sequence data
Matthew P.A Davis ... Anton J Enright
Methods | VOL. 63
29 Jun 2013

RACS: rapid analysis of ChIP-Seq data for contig based genomes
Alejandro Saettone ... Marcelo Ponce
BMC Bioinformatics | VOL. 20
29 Oct 2019
