Abstract

ATAC-seq is widely used to measure chromatin accessibility and identify open chromatin regions (OCRs). OCRs usually indicate active regulatory elements in the genome and are directly associated with the gene regulatory network. The identification of differential accessibility regions (DARs) between different biological conditions is critical in determining the differential activity of regulatory elements. Differential analysis of ATAC-seq shares many similarities with differential expression analysis of RNA-seq data. However, the distribution of ATAC-seq signal intensity is different from that of RNA-seq data, and higher sensitivity is required for DARs identification. Many different tools can be used to perform differential analysis of ATAC-seq data, but a comprehensive comparison and benchmarking of these methods is still lacking. Here, we used simulated datasets to systematically measure the sensitivity and specificity of six different methods. We further discussed the statistical and signal density cut-offs in the differential analysis of ATAC-seq by applying them to real data. Batch effects are very common in high-throughput sequencing experiments. We illustrated that batch-effect correction can dramatically improve sensitivity in the differential analysis of ATAC-seq data. Finally, we developed a user-friendly package, BeCorrect, to perform batch effect correction and visualization of corrected ATAC-seq signals in a genome browser.

Highlights

  • ATAC-seq is widely used to measure chromatin accessibility and identify open chromatin regions (OCRs)

  • We further showed that batch effect correction can greatly improve the sensitivity of ATAC-seq data analysis by removing unwanted variations and can help to identify the differential accessibility regions (DARs) with biological meaning

  • We found that edgeR and limma satisfied at least two out of five criteria [Table 1]

Read more

Summary

Introduction

ATAC-seq is widely used to measure chromatin accessibility and identify open chromatin regions (OCRs). A comprehensive comparison of these widely used tools designed for gene expression analysis is needed to evaluate their sensitivity and specificity in the differential analysis of ATAC-seq data and thereby provide users with guidelines on method choice. We constructed a simulated dataset on the basis of the signal distribution of real ATAC-seq data and established a common set of benchmarks to evaluate the sensitivity and specificity of the six methods in different conditions. We further compared the performance of DESeq[2] and edgeR, the two most popular software programs, using real ATAC-seq data and explored the identification of DARs with different p-value and fold-change cut-offs. We further showed that batch effect correction can greatly improve the sensitivity of ATAC-seq data analysis by removing unwanted variations and can help to identify the DARs with biological meaning.

Methods
Findings
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call