Abstract

Detecting multiplets in single-nucleus (sn)ATAC-seq data is challenging due to data sparsity and limited dynamic range. AMULET (ATAC-seq MULtiplet Estimation Tool) detects multiplets by enumerating regions across the genome with more than two uniquely aligned reads. We evaluate the method on snATAC-seq data generated from human blood and pancreatic islet samples. Compared to alternatives, AMULET has high precision, estimated via donor-based multiplexing, and high recall, estimated via simulated multiplets. It identifies multiplets most effectively when a read depth of 25K median valid reads per nucleus is achieved.

Highlights

  • Single nucleus ATAC-seq [1, 2] technology has accelerated the study of epigenetic regulation with single-cell resolution [3, 4]

  • A unified list of these regions across all nuclei is generated and filtered using known repetitive elements (Methods) to quantify the number of occurrences where >2 reads align to a region in a given nucleus (Fig. 1c). AMULET models random occurrences of regions with >2 reads with the Poisson cumulative distribution function. Nuclei that contain significantly more regions with >2 reads than this Poisson distribution predicts, as judged at a false discovery rate (FDR) threshold, are identified as multiplets (Fig. 1c, an example shown in Additional file 1: Figure S1)

  • AMULET detected Vireo multiplets with higher precision (0.57–0.61) than ArchR (0.28); the two methods achieved similar recall, 0.17–0.20 and 0.20, respectively (Fig. 3f). These results suggest that read-count-based AMULET can detect multiplets with high precision and high recall, especially when samples are sequenced deeply (e.g., 20–28K average, 19–28K median valid read pairs for PBMC1 and PBMC2), serving as an effective alternative to simulation-based methods
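
The Poisson and FDR step described in the highlights can be illustrated with a minimal sketch. This is not AMULET's actual code: the function names, the use of the sample mean as the Poisson rate, the Benjamini-Hochberg procedure as the FDR correction, and the 0.01 threshold are all assumptions made for illustration. The input is a hypothetical mapping from nucleus barcode to the number of regions with >2 overlapping reads.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - CDF(k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values (q-values) in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    qvals = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        qvals[i] = running_min
    return qvals

def call_multiplets(overlap_counts, fdr=0.01):
    """overlap_counts: dict of barcode -> number of regions with >2 reads.

    Assumption: the Poisson rate is the mean count across nuclei, and the
    FDR threshold of 0.01 is illustrative only.
    """
    barcodes = list(overlap_counts)
    counts = [overlap_counts[b] for b in barcodes]
    lam = sum(counts) / len(counts)
    pvals = [poisson_sf(c, lam) for c in counts]
    qvals = benjamini_hochberg(pvals)
    return {b for b, q in zip(barcodes, qvals) if q < fdr}

# Toy data: 97 nuclei with 0-2 such regions, 3 nuclei with many more.
toy = {f"cell{i}": i % 3 for i in range(97)}
toy.update({"m1": 15, "m2": 18, "m3": 20})
flagged = call_multiplets(toy)
```

In this toy input only the three nuclei with 15 or more such regions are flagged. Real data would need the repetitive-element filtering mentioned above before counting.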


Summary

Introduction

Single nucleus ATAC-seq (snATAC-seq) [1, 2] technology has accelerated the study of epigenetic regulation with single-cell resolution [3, 4]. Multiplet detection in snATAC-seq is a distinct computational challenge compared to single-cell RNA-seq assays due to data sparsity and the limited dynamic range of single-cell chromatin accessibility levels (e.g., 0 reads, closed chromatin; 1, open on one parental chromosome; and 2, open on both chromosomes). Current state-of-the-art methods for multiplet detection in snATAC-seq data (i.e., SnapATAC [6] and ArchR [7]) are similar in nature to scRNA-seq multiplet detection methods (e.g., DoubletFinder [8] and Scrublet [9]).
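
Because a single diploid nucleus should contribute at most two reads at any locus, positions covered by more than two reads are informative. The sketch below shows one way to find such regions for a single nucleus with a sweep over fragment boundaries. It is an illustration under stated assumptions, not AMULET's implementation: the function name, the `(chrom, start, end)` half-open tuple format, and the `max_expected` parameter are hypothetical.

```python
from collections import defaultdict

def regions_with_excess_reads(fragments, max_expected=2):
    """Return merged regions where read depth exceeds max_expected.

    fragments: iterable of (chrom, start, end) half-open intervals, all
    from one nucleus (assumed already deduplicated and uniquely aligned).
    """
    # Net change in depth at each boundary position.
    delta = defaultdict(int)
    for chrom, start, end in fragments:
        delta[(chrom, start)] += 1
        delta[(chrom, end)] -= 1

    regions = []
    depth = 0
    region_start = None
    # Depth returns to 0 at the end of each chromosome, so regions
    # never span two chromosomes.
    for chrom, pos in sorted(delta):
        depth += delta[(chrom, pos)]
        if depth > max_expected and region_start is None:
            region_start = pos
        elif depth <= max_expected and region_start is not None:
            regions.append((chrom, region_start, pos))
            region_start = None
    return regions

example = [
    ("chr1", 100, 200), ("chr1", 150, 250), ("chr1", 180, 300),
    ("chr1", 400, 500), ("chr2", 10, 50), ("chr2", 20, 60),
]
print(regions_with_excess_reads(example))  # [('chr1', 180, 200)]
```

The number of regions returned per nucleus is the kind of count that a downstream statistical test could then compare across nuclei.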
