CIPHER: a flexible and extensive workflow platform for integrative next-generation sequencing data analysis and genomic regulatory element prediction

Carlos Guzman, Iván D'Orso

doi:10.1186/s12859-017-1770-1


Open Access

https://doi.org/10.1186/s12859-017-1770-1


Abstract

Background: Next-generation sequencing (NGS) approaches are commonly used to identify key regulatory networks that drive transcriptional programs. Although these technologies are frequently used in biological studies, NGS data analysis remains a challenging, time-consuming, and often irreproducible process. Therefore, there is a need for a comprehensive and flexible workflow platform that can accelerate data processing and analysis so more time can be spent on functional studies.

Results: We have developed an integrative, stand-alone workflow platform, named CIPHER, for the systematic analysis of several commonly used NGS datasets including ChIP-seq, RNA-seq, MNase-seq, DNase-seq, GRO-seq, and ATAC-seq data. CIPHER implements various open-source software packages, in-house scripts, and Docker containers to analyze and process single-end and paired-end datasets. CIPHER’s pipelines conduct extensive quality and contamination control checks, as well as comprehensive downstream analysis. A typical CIPHER workflow includes: (1) raw sequence evaluation, (2) read trimming and adapter removal, (3) read mapping and quality filtering, (4) visualization track generation, and (5) extensive quality control assessment. Furthermore, CIPHER conducts downstream analyses such as: narrow and broad peak calling, peak annotation, and motif identification for ChIP-seq; differential gene expression analysis for RNA-seq; nucleosome positioning for MNase-seq; DNase hypersensitive site mapping, site annotation, and motif identification for DNase-seq; analysis of nascent transcription from Global Run-On sequencing (GRO-seq) data; and characterization of chromatin accessibility from ATAC-seq datasets. In addition, CIPHER contains an “analysis” mode that completes complex bioinformatics tasks such as enhancer discovery and provides functions to integrate various datasets together.

Conclusions: Using public and simulated data, we demonstrate that CIPHER is an efficient and comprehensive workflow platform that can analyze several NGS datasets commonly used in genome biology studies. Additionally, CIPHER’s integrative “analysis” mode allows researchers to elicit important biological information from the combined dataset analysis.
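The workflow described in the abstract can be pictured as a set of shared processing stages followed by assay-specific downstream analyses. The sketch below is only an illustration of that structure: the stage and analysis labels paraphrase the abstract, while the names `COMMON_STAGES`, `DOWNSTREAM`, and `plan_workflow` are hypothetical and are not part of CIPHER.

```python
from typing import Dict, List

# Illustrative only: labels paraphrase the abstract; none of these
# identifiers come from CIPHER itself.
COMMON_STAGES: List[str] = [
    "raw sequence evaluation",
    "read trimming and adapter removal",
    "read mapping and quality filtering",
    "visualization track generation",
    "quality control assessment",
]

# Assay-specific downstream analyses, as listed in the abstract.
DOWNSTREAM: Dict[str, List[str]] = {
    "ChIP-seq": [
        "narrow and broad peak calling",
        "peak annotation",
        "motif identification",
    ],
    "RNA-seq": ["differential gene expression analysis"],
    "MNase-seq": ["nucleosome positioning"],
    "DNase-seq": [
        "DNase hypersensitive site mapping",
        "site annotation",
        "motif identification",
    ],
    "GRO-seq": ["nascent transcription analysis"],
    "ATAC-seq": ["chromatin accessibility characterization"],
}


def plan_workflow(assay: str) -> List[str]:
    """Return the shared stages followed by the assay-specific analyses."""
    if assay not in DOWNSTREAM:
        raise ValueError(f"unsupported assay: {assay!r}")
    return COMMON_STAGES + DOWNSTREAM[assay]
```

Calling `plan_workflow("ChIP-seq")` would list the five shared stages and then the three ChIP-seq analyses, which mirrors how the abstract separates common processing from data-type-specific steps.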

Highlights

Next-generation sequencing (NGS) approaches are commonly used to identify key regulatory networks that drive transcriptional programs
Next-generation sequencing (NGS) technologies are powerful and widely applied tools for mapping the in vivo genome-wide location of transcription factors (TFs), histone modifications, nascent transcription, nucleosome positioning, and chromatin accessibility features that make up these regulatory networks
We demonstrate that CIPHER is a fast, reproducible, and flexible tool that accurately processes and integrates NGS datasets by recreating the results of two published studies, and comparing CIPHER’s speed and ease of use to two other chromatin immunoprecipitation (ChIP)-seq and RNA sequencing (RNA-seq) pipelines

Summary

Results

To validate CIPHER’s potential in NGS data analysis, we used data from the Gene Expression Omnibus (GEO) repository to re-create two previously published studies: a ChIP-seq study from McNamara et al. [42] and a GRO-seq study from Liu et al. [43].

The enhancer analysis revealed accessible chromatin at the center of all predicted enhancers, as shown by DNase-seq, and chromatin signatures surrounding the nucleosome-free region (NFR) in a ‘peak-valley-peak’ pattern that is consistent with traditional enhancer signatures [55] (Fig. 6d). While both active and primed enhancers contained comparable levels of H3K4me, active enhancers contained higher H3K27ac levels (average coverage: 0.72 versus 0.099) and stronger eRNA sense (6 versus 1) and anti-sense (5 versus 1) read coverage than primed enhancers, consistent with increased enhancer activity (Fig. 7a and b). Using CIPHER in combination with our previous stringent cut-off, we predicted enhancers in other cell lines: 38,045 active and 10,600 primed enhancers in HeLa (Fig. 6e and f), and 38,551 active and 2,292 primed enhancers in K562 cells (data not shown). These results demonstrate that our enhancer-recognition model can reliably detect enhancer elements using ChIP-seq and DNase-seq datasets in a broad range of cell lines.
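To make the reported contrasts easier to compare, the short sketch below recomputes them from the figures quoted above. The numbers are copied from the text; the variable and function names are illustrative only. It gives roughly a 7-fold higher average H3K27ac coverage at active enhancers, and active enhancers account for about 78% of the HeLa predictions and about 94% of the K562 predictions.

```python
# Values copied from the Results summary; names are illustrative only.
h3k27ac = {"active": 0.72, "primed": 0.099}
erna_sense = {"active": 6, "primed": 1}
erna_antisense = {"active": 5, "primed": 1}

enhancer_counts = {
    "HeLa": {"active": 38_045, "primed": 10_600},
    "K562": {"active": 38_551, "primed": 2_292},
}


def fold_difference(values: dict) -> float:
    """Ratio of the active-enhancer value to the primed-enhancer value."""
    return values["active"] / values["primed"]


def active_fraction(counts: dict) -> float:
    """Share of predicted enhancers that were classified as active."""
    return counts["active"] / (counts["active"] + counts["primed"])


for name, values in [
    ("H3K27ac", h3k27ac),
    ("eRNA sense", erna_sense),
    ("eRNA anti-sense", erna_antisense),
]:
    print(f"{name}: {fold_difference(values):.1f}-fold (active vs primed)")

for cell_line, counts in enhancer_counts.items():
    total = counts["active"] + counts["primed"]
    print(f"{cell_line}: {total} enhancers, {active_fraction(counts):.0%} active")
```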



Journal: BMC Bioinformatics
Publication Date: Aug 8, 2017
Citations: 29
License type: open-access
