Abstract

ATAC-seq is a widely applied assay used to measure genome-wide chromatin accessibility; however, its ability to detect active regulatory regions can depend on the depth of sequencing coverage and the signal-to-noise ratio. Here we introduce AtacWorks, a deep learning toolkit to denoise sequencing coverage and identify regulatory peaks at base-pair resolution from low cell count, low-coverage, or low-quality ATAC-seq data. Models trained by AtacWorks can detect peaks from cell types not seen in the training data, and are generalizable across diverse sample preparations and experimental platforms. We demonstrate that AtacWorks enhances the sensitivity of single-cell experiments by producing results on par with those of conventional methods using ~10 times as many cells, and further show that this framework can be adapted to enable cross-modality inference of protein-DNA interactions. Finally, we establish that AtacWorks can enable new biological discoveries by identifying active regulatory regions associated with lineage priming in rare subpopulations of hematopoietic stem cells.

Highlights

  • ATAC-seq is a widely applied assay used to measure genome-wide chromatin accessibility; its ability to detect active regulatory regions can depend on the depth of sequencing coverage and the signal-to-noise ratio

  • AtacWorks trains a deep neural network to learn a mapping between noisy, low-coverage or low-quality ATAC-seq data and matching high-coverage or high-quality ATAC-seq data from the same cell type

  • The network makes predictions for each base in the genome based on coverage values from a surrounding region spanning several kilobases (6 kb for the models presented here), but does not consider the DNA sequence itself, allowing it to generalize across cell types
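
As a rough illustration of the per-base, coverage-only prediction described in the highlights above, the Python sketch below slides a context window along a coverage track and hands each window to a model that returns one value for the central base. The function names, the `flank` parameter, and the averaging stand-in model are assumptions made for illustration and are not part of AtacWorks; the actual models are deep neural networks that see a much larger context. Note that no DNA sequence enters the computation, only coverage values.

```python
from typing import Callable, List, Sequence


def context_window(coverage: Sequence[float], pos: int, flank: int) -> List[float]:
    """Coverage at positions pos-flank .. pos+flank, zero-padded past the track ends."""
    return [
        coverage[i] if 0 <= i < len(coverage) else 0.0
        for i in range(pos - flank, pos + flank + 1)
    ]


def predict_track(
    coverage: Sequence[float],
    flank: int,
    model: Callable[[List[float]], float],
) -> List[float]:
    """Apply `model` to the context window around every base; one output per base."""
    return [model(context_window(coverage, pos, flank)) for pos in range(len(coverage))]


def mean_model(window: List[float]) -> float:
    """Hypothetical stand-in for a trained network: the mean of the window."""
    return sum(window) / len(window)


# Toy noisy coverage track (read counts per base); real tracks span whole chromosomes.
noisy = [0, 0, 3, 0, 4, 0, 0, 0]
denoised = predict_track(noisy, flank=1, model=mean_model)
```

In a trained setting, `mean_model` would be replaced by a network fitted on pairs of low-quality input and matching high-quality target coverage, and `flank` would be chosen so that each window covers the desired context.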


Summary

Introduction

ATAC-seq is a widely applied assay used to measure genome-wide chromatin accessibility. Its ability to measure biologically meaningful changes in accessible chromatin depends on both the signal-to-noise ratio and the depth of sequencing coverage. Technical parameters such as the overall quality of cells or tissues, the nuclei extraction method[6], or over-digestion of chromatin can result in attenuated measurements of accessibility. An earlier study demonstrated that simple convolutional neural networks can be used to denoise and call peaks from ChIP-seq data, but that approach was optimized for broad peak calling of histone modifications[10]. Another recent study applied deep learning to predict chromatin accessibility in a rare pancreatic islet cell type[11], highlighting the need for a robust and generalizable method for the analysis of sparse ATAC-seq data. We apply AtacWorks to single-cell ATAC-seq of hematopoietic stem cells (HSCs) to identify regulatory elements associated with rare lineage-primed subpopulations.

