Abstract

Identifying and removing multiplets is essential to improving the scalability and reliability of single cell RNA sequencing (scRNA-seq). Multiplets create artificial cell types in the dataset. We propose GMM-Demux, a Gaussian mixture model-based multiplet identification method. GMM-Demux accurately identifies and removes multiplets through sample barcoding, including cell hashing and MULTI-seq. GMM-Demux also uses a droplet formation model to authenticate putative cell types discovered from a scRNA-seq dataset. We generated two in-house cell-hashing datasets and compared GMM-Demux against three state-of-the-art sample barcoding classifiers. We show that GMM-Demux is stable and highly accurate, and that it recognizes 9 multiplet-induced fake cell types in a PBMC dataset.
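The barcode-based classification step can be illustrated with a minimal sketch. This is not GMM-Demux's implementation; it assumes a `log1p` transform of raw tag counts, fits a two-component one-dimensional Gaussian mixture per sample barcode with a hand-written EM loop (standard library only), calls a barcode positive when the posterior of the higher-mean component exceeds 0.5, and labels a droplet with exactly one positive barcode as a single-sample droplet (SSD) and one with several as a multisample multiplet (MSM). The function names and the "negative" label for droplets with no positive barcode are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Params = Tuple[List[float], List[float], List[float]]  # (weights, means, variances)


def _log_densities(v: float, params: Params) -> List[float]:
    """log(weight * Normal(v | mean, variance)) for each mixture component."""
    weights, means, variances = params
    return [
        math.log(w) - 0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        for w, m, var in zip(weights, means, variances)
    ]


def _responsibilities(v: float, params: Params) -> List[float]:
    """Posterior probability of each component for v (log-sum-exp for stability)."""
    logs = _log_densities(v, params)
    top = max(logs)
    exps = [math.exp(lg - top) for lg in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fit_gmm_1d(x: Sequence[float], n_iter: int = 200) -> Params:
    """Fit a two-component 1-D Gaussian mixture by EM (needs len(x) >= 2).

    The returned component 1 is the one with the higher mean ("positive").
    """
    xs = sorted(x)
    n = len(xs)
    lower, upper = xs[: n // 2], xs[n // 2 :]  # initialise from the two halves
    mean_all = sum(xs) / n
    var0 = max(sum((v - mean_all) ** 2 for v in xs) / n, 1e-6)
    params: Params = ([0.5, 0.5], [sum(lower) / len(lower), sum(upper) / len(upper)], [var0, var0])
    for _ in range(n_iter):
        resp = [_responsibilities(v, params) for v in x]  # E-step
        weights, means, variances = [], [], []
        for k in range(2):  # M-step: weighted re-estimation of each component
            nk = max(sum(r[k] for r in resp), 1e-12)
            mk = sum(r[k] * v for r, v in zip(resp, x)) / nk
            vk = sum(r[k] * (v - mk) ** 2 for r, v in zip(resp, x)) / nk
            weights.append(nk / n)
            means.append(mk)
            variances.append(max(vk, 1e-6))  # variance floor avoids collapse
        params = (weights, means, variances)
    w, m, s2 = params
    lo_k, hi_k = (0, 1) if m[0] <= m[1] else (1, 0)
    return ([w[lo_k], w[hi_k]], [m[lo_k], m[hi_k]], [s2[lo_k], s2[hi_k]])


def classify_droplets(counts: Sequence[Sequence[float]], threshold: float = 0.5) -> List[str]:
    """Label each droplet 'SSD', 'MSM' or 'negative'.

    counts[i][s] is the raw tag count of sample barcode s in droplet i.
    """
    n_positive = [0] * len(counts)
    for s in range(len(counts[0])):
        logc = [math.log1p(row[s]) for row in counts]
        params = fit_gmm_1d(logc)
        for i, v in enumerate(logc):
            if _responsibilities(v, params)[1] > threshold:
                n_positive[i] += 1
    return ["negative" if k == 0 else "SSD" if k == 1 else "MSM" for k in n_positive]
```

Note that an SSD here may still be a single-sample multiplet: barcodes alone cannot tell it apart from a true singlet.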

Highlights


  • We showed that GMM-Demux accurately and consistently classifies gel beads-in-emulsion (GEMs) into single-sample droplets (SSDs) and multisample multiplets (MSMs), and that it produces more accurate and more consistent results than existing methods.

  • We further proposed a GEM formation model to estimate the single-sample multiplet (SSM) rate in a sample barcoding dataset.
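Because an SSM carries only one sample barcode, barcode classification cannot separate it from a singlet, so its rate has to be inferred from what is observable. The sketch below shows the idea with a deliberately simplified model that is assumed here and is not necessarily the GEM formation model of GMM-Demux: the number of cells per GEM is Poisson with mean λ, each cell independently comes from sample s with probability p_s, and a GEM with k cells is an SSM with probability Σ_s p_s^k. Bisection finds the λ that reproduces an observed MSM fraction, and the model then yields the matching SSM fraction. The function names, the search interval, and the truncation `k_max` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def gem_rates(lam: float, proportions: Sequence[float], k_max: int = 200) -> Tuple[float, float, float]:
    """Return (singlet, SSM, MSM) fractions among non-empty GEMs.

    Assumed model: cells per GEM ~ Poisson(lam); each cell independently comes
    from sample s with probability proportions[s]. The Poisson sum is truncated
    at k_max cells per GEM.
    """
    p_k = math.exp(-lam)  # P(K = 0)
    p_nonempty = -math.expm1(-lam)  # P(K >= 1) = 1 - exp(-lam), computed accurately
    singlet = ssm = msm = 0.0
    for k in range(1, k_max + 1):
        p_k *= lam / k  # P(K = k) from P(K = k - 1)
        if k == 1:
            singlet = p_k
            continue
        same = sum(p ** k for p in proportions)  # probability all k cells share a sample
        ssm += p_k * same
        msm += p_k * (1.0 - same)
    return singlet / p_nonempty, ssm / p_nonempty, msm / p_nonempty


def estimate_ssm_rate(observed_msm: float, proportions: Sequence[float],
                      lam_max: float = 50.0, tol: float = 1e-10) -> Tuple[float, float]:
    """Bisect for the lam whose MSM fraction equals observed_msm; return (lam, SSM fraction).

    Valid because the model's MSM fraction increases monotonically with lam.
    """
    lo, hi = 1e-9, lam_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gem_rates(mid, proportions)[2] < observed_msm:
            lo = mid  # too few predicted MSMs: the load must be higher
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, gem_rates(lam, proportions)[1]
```

With four samples pooled in equal proportions, a doublet is an SSM with probability 4 × (1/4)² = 1/4, so under this model the hidden SSM fraction is noticeably smaller than the observed MSM fraction but far from negligible.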


Introduction

Xin et al. Genome Biology (2020) 21:188

Droplet-based single cell RNA sequencing (scRNA-seq) [13, 18, 48] has provided many valuable insights into complex biological systems, such as rare cell-type identification [26, 32, 39, 41], differential expression analysis at the single cell level [2, 5, 9], and cell lineage studies [9, 15, 24, 30]. While the per-cell cost of library preparation has decreased over the years, the scalability of droplet-based scRNA-seq remains limited, mostly because multiplet rates increase rapidly, and in ways that are hard to anticipate, as more cells are loaded during single cell library preparation [17]. Large cell populations are especially required for rare cell-type discovery, but loading them during scRNA-seq library preparation leads to high multiplet rates. If multiplets can be identified and removed from downstream analysis, the scalability of scRNA-seq can be significantly improved and the per-cell library preparation cost greatly reduced. To achieve greater adoption of single cell sequencing technology, it is crucial to (1) identify and remove multiplets from downstream analysis, (2) anticipate the multiplet rate prior to conducting an experiment, and (3) verify whether rare cell types identified from a single cell dataset are authentic and are not multiplets.
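Why multiplet rates climb with loading can be seen from a textbook Poisson loading assumption, which is used here for illustration and is not a model taken from this paper: if droplets receive λ cells on average, a fraction 1 - λe^(-λ) / (1 - e^(-λ)) of non-empty droplets hold two or more cells. The droplet count and cell loads below are hypothetical, and capture efficiency is ignored.

```python
import math


def multiplet_fraction(cells_loaded: float, n_droplets: float) -> float:
    """Fraction of non-empty droplets containing >= 2 cells under Poisson loading.

    Assumes cells are spread over a hypothetical fixed number of droplets with
    mean lam = cells_loaded / n_droplets; capture efficiency is ignored.
    """
    lam = cells_loaded / n_droplets
    if lam <= 0:
        return 0.0
    p_nonempty = -math.expm1(-lam)  # P(K >= 1) = 1 - exp(-lam)
    p_singlet = lam * math.exp(-lam)  # P(K = 1)
    return 1.0 - p_singlet / p_nonempty


if __name__ == "__main__":
    # Hypothetical loads into 100,000 droplets.
    for cells in (5_000, 10_000, 20_000, 40_000):
        print(f"{cells:>6} cells -> {multiplet_fraction(cells, 100_000):.3f} multiplet fraction")
```

For these loads the fractions are roughly 0.025, 0.049, 0.097 and 0.187: at small λ the multiplet fraction grows approximately as λ/2, so doubling the number of loaded cells roughly doubles it.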

