Abstract
The advent of single-cell chromatin accessibility profiling has accelerated the ability to map gene regulatory landscapes but has outpaced the development of scalable software to rapidly extract biological meaning from these data. Here we present a software suite for single-cell analysis of regulatory chromatin in R (ArchR; https://www.archrproject.com/) that enables fast and comprehensive analysis of single-cell chromatin accessibility data. ArchR provides an intuitive, user-focused interface for complex single-cell analyses, including doublet removal, single-cell clustering and cell type identification, unified peak set generation, cellular trajectory identification, DNA element-to-gene linkage, transcription factor footprinting, mRNA expression level prediction from chromatin accessibility and multi-omic integration with single-cell RNA sequencing (scRNA-seq). Enabling the analysis of over 1.2 million single cells within 8 h on a standard Unix laptop, ArchR is a comprehensive software suite for end-to-end analysis of single-cell chromatin accessibility that will accelerate the understanding of gene regulation at the resolution of individual cells.
Highlights
The advent of single-cell chromatin accessibility profiling has accelerated the ability to map gene regulatory landscapes but has outpaced the development of scalable software to rapidly extract biological meaning from these data
ArchR (analysis of regulatory chromatin in R) takes as input aligned BAM or fragment files, which are first parsed in small chunks per chromosome, read in parallel to conserve memory and efficiently stored on disk using the compressed random-access hierarchical data format version 5 (HDF5) file format (Supplementary Fig. 1a)
Arrow files are grouped into an ‘ArchR Project’, a compressed R data file that is stored in memory, which provides an organized, rapid and low memory-use framework for manipulation of the larger Arrow files stored on disk (Supplementary Fig. 1b)
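The chunked, per-chromosome parsing described in the highlights can be illustrated with a minimal sketch. This is not ArchR's implementation (ArchR is written in R and writes to HDF5-based Arrow files); it is a plain-Python illustration of the idea of streaming a fragment file and holding only one small chunk in memory at a time. It assumes a tab-separated fragment file whose first four columns are chromosome, start, end and cell barcode, with rows already grouped by chromosome. The names `Fragment`, `parse_line`, `iter_chunks` and the demo data are hypothetical.

```python
import io
from itertools import groupby, islice
from typing import Iterator, List, NamedTuple, TextIO, Tuple


class Fragment(NamedTuple):
    """One ATAC-seq fragment (hypothetical record type for this sketch)."""
    chrom: str
    start: int
    end: int
    barcode: str


def parse_line(line: str) -> Fragment:
    # Assumed layout: chrom, start, end, barcode, then optional extra columns.
    chrom, start, end, barcode = line.rstrip("\n").split("\t")[:4]
    return Fragment(chrom, int(start), int(end), barcode)


def iter_chunks(
    handle: TextIO, chunk_size: int = 2
) -> Iterator[Tuple[str, List[Fragment]]]:
    """Yield (chromosome, fragments) chunks of at most chunk_size rows.

    Only one chunk is materialised at a time, so peak memory depends on
    chunk_size rather than on the size of the input file.
    """
    rows = (
        parse_line(line)
        for line in handle
        if line.strip() and not line.startswith("#")
    )
    # Assumes rows for the same chromosome are contiguous in the file.
    for chrom, group in groupby(rows, key=lambda f: f.chrom):
        while True:
            chunk = list(islice(group, chunk_size))
            if not chunk:
                break
            yield chrom, chunk


# Tiny in-memory stand-in for a fragment file (made-up values).
DEMO = (
    "chr1\t10\t50\tAAAC\t1\n"
    "chr1\t60\t120\tAAAG\t2\n"
    "chr1\t200\t260\tAAAC\t1\n"
    "chr2\t5\t40\tAAAG\t1\n"
)

chunks = list(iter_chunks(io.StringIO(DEMO), chunk_size=2))
chunk_summary = [(chrom, len(frags)) for chrom, frags in chunks]
print(chunk_summary)
```

In a real pipeline each yielded chunk would be appended to an on-disk store (HDF5 in ArchR's case) before the next one is read, and chromosomes could be handed to separate workers; both steps are omitted here to keep the sketch dependency-free.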
Summary
The advent of single-cell chromatin accessibility profiling has accelerated the ability to map gene regulatory landscapes but has outpaced the development of scalable software to rapidly extract biological meaning from these data. These advances were driven by an increased interest in chromatin-based gene regulation across a diversity of cellular contexts and biological systems[1,2,5,6,8,9]. This capacity for data generation outpaced the development of intuitive, benchmarked and comprehensive software for scATAC-seq analysis[10], a crucial requirement that would facilitate the broad use of these methods for investigating gene regulation at cellular resolution. To this end, we sought to develop a software suite for both routine and advanced analysis of massive-scale single-cell chromatin accessibility data without the need for high-performance computing environments. When compared to other existing tools, such as SnapATAC[11] and Signac[12], ArchR provides a more extensive set of features (Extended Data Fig. 1a) and is designed to provide the speed and flexibility to support interactive analysis, enabling iterative extraction of meaningful biological interpretations[11,12,13,14,15,16,17,18,19].