Abstract

The advent of single-cell chromatin accessibility profiling has accelerated the ability to map gene regulatory landscapes but has outpaced the development of scalable software to rapidly extract biological meaning from these data. Here we present a software suite for single-cell analysis of regulatory chromatin in R (ArchR; https://www.archrproject.com/) that enables fast and comprehensive analysis of single-cell chromatin accessibility data. ArchR provides an intuitive, user-focused interface for complex single-cell analyses, including doublet removal, single-cell clustering and cell type identification, unified peak set generation, cellular trajectory identification, DNA element-to-gene linkage, transcription factor footprinting, mRNA expression level prediction from chromatin accessibility and multi-omic integration with single-cell RNA sequencing (scRNA-seq). Enabling the analysis of over 1.2 million single cells within 8 h on a standard Unix laptop, ArchR is a comprehensive software suite for end-to-end analysis of single-cell chromatin accessibility that will accelerate the understanding of gene regulation at the resolution of individual cells.

Highlights

  • ArchR takes as input aligned BAM or fragment files, which are first parsed in small chunks per chromosome, read in parallel to conserve memory and then stored efficiently on disk in the compressed, random-access hierarchical data format version 5 (HDF5) (Supplementary Fig. 1a)

  • Arrow files are grouped into an ‘ArchR Project’, a compressed R data file that is stored in memory, which provides an organized, rapid and low memory-use framework for manipulation of the larger Arrow files stored on disk (Supplementary Fig. 1b)
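
The chunked, per-chromosome parsing described in the highlights can be illustrated with a minimal sketch. This is not ArchR's implementation (ArchR is an R package and writes HDF5); it is a standard-library Python illustration that assumes a tab-separated fragment file whose first four columns are chromosome, start, end and cell barcode. The names `iter_chunks`, `process_fragments` and the in-memory `store` that stands in for on-disk storage are hypothetical.

```python
import io
from collections import defaultdict
from itertools import islice


def iter_chunks(handle, chunk_size):
    """Yield lists of at most chunk_size parsed fragment records."""
    while True:
        lines = list(islice(handle, chunk_size))
        if not lines:
            return
        chunk = []
        for line in lines:
            # Skip blank lines and header/comment lines.
            if not line.strip() or line.startswith("#"):
                continue
            # Assumed column order: chrom, start, end, barcode, (extra columns ignored).
            chrom, start, end, barcode = line.rstrip("\n").split("\t")[:4]
            chunk.append((chrom, int(start), int(end), barcode))
        yield chunk


def process_fragments(handle, sink, chunk_size=2):
    """Read a fragment file chunk by chunk and pass each chromosome's
    records to sink(chrom, records), so only one chunk is parsed at a time."""
    for chunk in iter_chunks(handle, chunk_size):
        by_chrom = defaultdict(list)
        for chrom, start, end, barcode in chunk:
            by_chrom[chrom].append((start, end, barcode))
        for chrom, records in by_chrom.items():
            sink(chrom, records)


# Hypothetical stand-in for on-disk storage: a dict keyed by chromosome.
store = defaultdict(list)

demo = io.StringIO(
    "chr1\t100\t250\tAAAC\t1\n"
    "chr1\t300\t420\tAAAG\t2\n"
    "chr2\t50\t180\tAAAC\t1\n"
)
process_fragments(demo, lambda chrom, recs: store[chrom].extend(recs))
```

In a real pipeline the sink would append each chromosome's records to a compressed on-disk dataset rather than to a dictionary, which is what keeps memory use bounded by the chunk size.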

Introduction

The advent of single-cell chromatin accessibility profiling has accelerated the ability to map gene regulatory landscapes but has outpaced the development of scalable software to rapidly extract biological meaning from these data. These advances were driven by an increased interest in chromatin-based gene regulation across a diversity of cellular contexts and biological systems[1,2,5,6,8,9]. This capacity for data generation outpaced the development of intuitive, benchmarked and comprehensive software for scATAC-seq analysis[10], a crucial requirement for the broad use of these methods to investigate gene regulation at cellular resolution. To this end, we sought to develop a software suite for both routine and advanced analysis of massive-scale single-cell chromatin accessibility data without the need for high-performance computing environments. Compared with existing tools, such as SnapATAC[11] and Signac[12], ArchR provides a more extensive set of features (Extended Data Fig. 1a) and is designed to provide the speed and flexibility to support interactive analysis, enabling iterative extraction of meaningful biological interpretations[11,12,13,14,15,16,17,18,19].
