Abstract

Motivation: The chromatin profile measured by ATAC-seq, ChIP-seq, or DNase-seq experiments can identify genomic regions critical in regulating gene expression and provide insights into biological processes such as disease and development. However, quality control and processing of chromatin profiling data involve many steps, each relying on different bioinformatics tools, which makes the analysis challenging to manage.

Results: We developed a Snakemake pipeline called CHIPS (CHromatin enrIchment ProcesSor) to streamline the processing of ChIP-seq, ATAC-seq, and DNase-seq data. The pipeline supports single- and paired-end data and can start from either FASTQ or BAM files. It includes basic steps such as read trimming, mapping, and peak calling. In addition, it calculates quality control metrics such as contamination profiles, the polymerase chain reaction (PCR) bottleneck coefficient, the fraction of reads in peaks, the percentage of peaks overlapping with the union of public DNaseI hypersensitivity sites, and the conservation profile of the peaks. For downstream analysis, it carries out peak annotation, motif finding, and regulatory potential calculation for all genes. The pipeline ensures that the processing is robust and reproducible.

Availability: CHIPS is available at https://github.com/liulab-dfci/CHIPS.
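Two of the quality control metrics named above, the PCR bottleneck coefficient and the fraction of reads in peaks, have simple arithmetic definitions that a small example can make concrete. The sketch below is not taken from CHIPS. It assumes the commonly used definitions (bottleneck coefficient = genomic positions covered by exactly one read divided by all distinct positions covered; fraction of reads in peaks = reads falling in a peak divided by all reads), and the function names, the `(chrom, pos, strand)` read tuples, and the toy data are all hypothetical. How CHIPS itself computes these values may differ.

```python
from bisect import bisect_right
from collections import Counter


def pcr_bottleneck_coefficient(reads):
    """Assumed definition: positions seen exactly once / distinct positions.

    `reads` is a list of (chrom, pos, strand) tuples (hypothetical format).
    Values near 1 suggest a complex library; low values suggest duplication.
    """
    counts = Counter(reads)
    n_distinct = len(counts)
    if n_distinct == 0:
        return 0.0
    n_single = sum(1 for c in counts.values() if c == 1)
    return n_single / n_distinct


def fraction_of_reads_in_peaks(reads, peaks):
    """Assumed definition: reads whose position lies in any peak / all reads.

    `peaks` is a list of (chrom, start, end) half-open intervals that are
    assumed not to overlap each other on the same chromosome.
    """
    if not reads:
        return 0.0
    # Group peak intervals by chromosome and sort them by start coordinate.
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()

    in_peaks = 0
    for chrom, pos, _strand in reads:
        intervals = by_chrom.get(chrom, [])
        # Index of the last interval whose start is <= pos, or -1 if none.
        i = bisect_right(intervals, (pos, float("inf"))) - 1
        if i >= 0 and intervals[i][0] <= pos < intervals[i][1]:
            in_peaks += 1
    return in_peaks / len(reads)


# Toy data, invented purely for illustration.
example_reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),  # duplicate of the read above
    ("chr1", 150, "-"),
    ("chr1", 900, "+"),
    ("chr2", 50, "+"),
]
example_peaks = [("chr1", 90, 200)]

pbc = pcr_bottleneck_coefficient(example_reads)                  # 3 of 4 distinct positions seen once
frip = fraction_of_reads_in_peaks(example_reads, example_peaks)  # 3 of 5 reads in the peak
```

In this toy input one position is covered twice, so the bottleneck coefficient is 3/4, and three of the five reads fall inside the single peak, giving a fraction of 3/5. A real pipeline would derive the read positions from a BAM file and the peaks from a peak caller's output rather than from in-memory tuples.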

Highlights

  • Protein-DNA binding interactions are fundamental to gene regulation and are involved in regulating disease processes

  • Using Snakemake v5.4.5, we developed CHromatin enrIchment ProcesSor (CHIPS) to standardize processing and quality control evaluation for ATAC-seq, ChIP-seq, and DNase-seq data following best practices (Bailey et al., 2013)

  • Taken together, CHIPS is a scalable and reproducible pipeline written in Snakemake

Summary

Introduction

Protein-DNA binding interactions are fundamental to gene regulation and are involved in regulating disease processes. ATAC-seq, ChIP-seq, and DNase-seq experiments, which are used to investigate these interactions, generate data that require extensive processing before biological interpretation (Furey, 2012). Comprehensive quality control helps to identify failed samples, and robust processing facilitates reproducible analysis. Using Snakemake v5.4.5, we developed CHromatin enrIchment ProcesSor (CHIPS) to standardize processing and quality control evaluation for ATAC-seq, ChIP-seq, and DNase-seq data following best practices (Bailey et al., 2013). CHIPS generates a comprehensive interactive HTML report using Plotly so that users can inspect the quality of the samples. CHIPS has been used to analyze >1500 samples since 2016 within Dana-Farber Cancer Institute, and serves as the standard processing pipeline for tumor ATAC-seq data from the Cancer Immune Monitoring and Analysis Centers and Cancer Immunologic Data Commons (CIMAC-CIDC) trials.

