Abstract

Background: Next-generation sequencing allows genome-wide analysis of changes in chromatin states and gene expression. Analyzing data from these increasingly used methods requires either multiple analysis steps or extensive computational time. We sought to develop a tool for rapid quantification of sequencing peaks from diverse experimental sources, together with an efficient method to produce coverage tracks for accurate visualization that can be intuitively displayed and interpreted by experimentalists with minimal bioinformatics background. We demonstrate its strength and usability by integrating data from several types of sequencing approaches.

Results: We have developed BAMscale, a one-step tool that processes a wide set of sequencing datasets. To demonstrate the usefulness of BAMscale, we analyzed multiple sequencing datasets from chromatin immunoprecipitation sequencing data (ChIP-seq), chromatin state change data (assay for transposase-accessible chromatin using sequencing: ATAC-seq, DNA double-strand break mapping sequencing: END-seq), DNA replication data (Okazaki fragments sequencing: OK-seq, nascent-strand sequencing: NS-seq, single-cell replication timing sequencing: scRepli-seq) and RNA-seq data. The outputs consist of raw and normalized peak scores (multiple normalizations) in text format and scaled bigWig coverage tracks that are directly accessible to data visualization programs. BAMscale also includes a visualization module facilitating direct, on-demand quantitative peak comparisons that can be used by experimentalists. Our tool can effectively analyze large sequencing datasets (~100 Gb) in minutes, outperforming currently available tools.

Conclusions: BAMscale accurately quantifies and normalizes identified peaks directly from BAM files, and creates coverage tracks for visualization in genome browsers. BAMscale can be applied to a wide set of methods for calculating coverage tracks, including ChIP-seq and ATAC-seq, as well as methods that currently require specialized, separate tools for analyses, such as splice-aware RNA-seq, END-seq and OK-seq, for which no dedicated software is available. BAMscale is freely available on GitHub (https://github.com/ncbi/BAMscale).

Highlights

  • Next-generation sequencing allows genome-wide analysis of changes in chromatin states and gene expression

  • BAMscale modules process data from Binary Alignment Map (BAM) files generated by standard chromatin analyses such as ChIP-seq and ATAC-seq experiments, and include additional custom functions to process sequencing data from RNA-seq, Okazaki fragment sequencing (OK-seq), replication timing analyses and DNA break mapping (endseq(r), Fig. 1e)

  • Peak quantification and scaling of coverage tracks from ATAC-seq data: to test the capabilities of BAMscale, we first applied it to compare chromatin accessibility from ATAC-seq data in SLFN11-proficient and SLFN11-deficient cells [31]
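
The abstract mentions raw and normalized peak scores without naming the normalizations. As a rough illustration only, the sketch below applies three conventions that are common for read counts in genomic intervals: library-size scaling (reads per million), FPKM and TPM. This is not BAMscale's code or output format. The `Peak` class, the function names and the example numbers are hypothetical, and the choice of these three normalizations is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Peak:
    """Hypothetical peak record: a genomic interval with a raw read count."""
    name: str
    length_bp: int   # peak width in base pairs
    raw_count: int   # number of reads overlapping the peak


def reads_per_million(peak: Peak, total_reads: int) -> float:
    """Library-size scaling: raw count per million mapped reads."""
    return peak.raw_count * 1e6 / total_reads


def fpkm(peak: Peak, total_reads: int) -> float:
    """Count per kilobase of peak length per million mapped reads."""
    return peak.raw_count * 1e9 / (peak.length_bp * total_reads)


def tpm(peaks: List[Peak]) -> List[float]:
    """Length-normalized rates rescaled so that they sum to one million."""
    rates = [p.raw_count / p.length_bp for p in peaks]
    total = sum(rates)
    if total == 0:
        # No reads in any peak: every normalized score is zero.
        return [0.0 for _ in peaks]
    return [r * 1e6 / total for r in rates]


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    peaks = [Peak("peak_a", 1000, 100), Peak("peak_b", 500, 100)]
    total_mapped = 2_000_000
    for p, t in zip(peaks, tpm(peaks)):
        print(p.name, reads_per_million(p, total_mapped), fpkm(p, total_mapped), t)
```

Both example peaks hold the same number of reads, so library-size scaling scores them equally, while the two length-aware scores rank the shorter peak higher.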


Summary

Introduction

Next-generation sequencing allows genome-wide analysis of changes in chromatin states and gene expression. Improved technologies and decreasing sequencing costs enable in-depth analyses of chromatin and gene expression changes for genome-wide comparisons. These integrative multi-omics studies elucidate the functionalities of coding and non-coding parts of the genome, their influence on the development of complex diseases such as cancers [1,2,3,4], and their translational implications [5,6,7]. Other analyses focus on identifying open-chromatin and DNA-accessible regions [10,11,12,13], which are useful for classifying enhancer regions and transcription factor footprints [14,15,16]. By integrating these analyses with gene expression data such as RNA-seq [17,18,19], it is possible to gain a better understanding of the architecture and regulation of the genome.

Pongor et al. Epigenetics & Chromatin (2020) 13:21
