Automated quality control and cell identification of droplet-based single-cell data using dropkick.

Cody N Heiser, Victoria M Wang, Jacob J Hughey, Bob Chen, Ken S Lau

doi:10.1101/gr.271908.120

Open Access

https://doi.org/10.1101/gr.271908.120

Abstract

A major challenge for droplet-based single-cell sequencing technologies is distinguishing true cells from uninformative barcodes in data sets with disparate library sizes confounded by high technical noise (i.e., batch-specific ambient RNA). We present dropkick, a fully automated software tool for quality control and filtering of single-cell RNA sequencing (scRNA-seq) data with a focus on excluding ambient barcodes and recovering real cells bordering the quality threshold. By automatically determining data set–specific training labels based on predictive global heuristics, dropkick learns a gene-based representation of real cells and ambient noise, calculating a cell probability score for each barcode. Using simulated and real-world scRNA-seq data, we benchmarked dropkick against conventional thresholding approaches and EmptyDrops, a popular computational method, showing greater recovery of rare cell types and exclusion of empty droplets and noisy, uninformative barcodes. We show for both low- and high-background data sets that dropkick's weakly supervised model reliably learns which genes are enriched in ambient barcodes and draws a multidimensional boundary that is more robust to data set–specific variation than existing filtering approaches. dropkick provides a fast, automated tool for reproducible cell identification from scRNA-seq data that is critical to downstream analysis and compatible with popular single-cell Python packages.
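
The workflow the abstract describes (heuristic training labels, a gene-based model, then a per-barcode cell probability) can be illustrated with a small, self-contained sketch. This is not dropkick's implementation: the simulated counts, the Otsu-style cut on log library size, the plain L2-penalised logistic regression, and every function name below are assumptions chosen only to show the idea of weak supervision on toy data.

```python
import math
import random


def otsu_cut(values, n_bins=50):
    """Pick the cut that maximises between-class variance (Otsu's method)."""
    lo, hi = min(values), max(values)
    best_cut, best_score = lo, -1.0
    for i in range(1, n_bins):
        cut = lo + i * (hi - lo) / n_bins
        below = [v for v in values if v < cut]
        above = [v for v in values if v >= cut]
        if below and above:
            gap = sum(below) / len(below) - sum(above) / len(above)
            score = len(below) * len(above) * gap ** 2
            if score > best_score:
                best_cut, best_score = cut, score
    return best_cut


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, n_iter=300, l2=0.05):
    """L2-penalised logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def simulate(seed=0, n_genes=10):
    """Toy barcodes as (kind, counts); genes 0-2 are ambient-enriched by construction."""
    rng = random.Random(seed)
    ambient = [6.0] * 3 + [1.0] * (n_genes - 3)
    cell = [1.0] * 3 + [4.0] * (n_genes - 3)
    plan = [("cell", cell, 300, 1500, 60),     # real cells with large libraries
            ("border", cell, 60, 120, 20),     # real cells with small libraries
            ("empty", ambient, 20, 120, 140)]  # ambient-only droplets
    barcodes = []
    for kind, profile, lo, hi, n in plan:
        for _ in range(n):
            counts = [0] * n_genes
            draws = rng.choices(range(n_genes), weights=profile, k=rng.randint(lo, hi))
            for g in draws:
                counts[g] += 1
            barcodes.append((kind, counts))
    return barcodes


def run_demo():
    barcodes = simulate()
    totals = [sum(counts) for _, counts in barcodes]
    # Step 1: heuristic training labels from a global metric (log library size).
    log_totals = [math.log(t) for t in totals]
    cut = otsu_cut(log_totals)
    labels = [1 if v >= cut else 0 for v in log_totals]
    # Step 2: gene-based features (per-barcode fractions, z-scored per gene).
    frac = [[c / t for c in counts] for (_, counts), t in zip(barcodes, totals)]
    cols = list(zip(*frac))
    mean = [sum(col) / len(col) for col in cols]
    sd = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
          for col, m in zip(cols, mean)]
    X = [[(v - m) / s for v, m, s in zip(row, mean, sd)] for row in frac]
    # Step 3: fit on the heuristic labels, then score every barcode.
    w, b = fit_logistic(X, labels)
    prob = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]
    summary = {}
    for kind in ("cell", "border", "empty"):
        picked = [p for (k, _), p in zip(barcodes, prob) if k == kind]
        summary[kind] = sum(picked) / len(picked)  # mean cell probability
    return summary, w


summary, weights = run_demo()
print("mean cell probability by simulated group:", summary)
print("gene weights (genes 0-2 are the ambient-enriched ones):", weights)
```

In this toy setting the "border" barcodes fall below the library-size cut and are labelled as ambient for training, yet their gene composition resembles that of the high-count cells, so a model trained on gene features can still score them as likely cells. A single global threshold on library size cannot make that distinction.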

Journal: Genome Research
Publication Date: Apr 9, 2021
Citations: 32
License type: CC BY-NC

Similar Papers

Identifying Genetic Signatures from Single-Cell RNA Sequencing Data by Matrix Imputation and Reduced Set Gene Clustering
Soumita Seth ... Arup Roy
Mathematics | VOL. 11
17 Oct 2023

ARGLRR: A Sparse Low-Rank Representation Single-Cell RNA-Sequencing Data Clustering Method Combined with a New Graph Regularization.
Zhen-Chang Wang ... Juan Wang
Journal of computational biology : a journal of computational molecular cell biology | VOL. 30
01 Aug 2023

Effectively Clustering Single Cell RNA Sequencing Data by Sparse Representation.
Rui-Yi Li ... Shuigeng Zhou
IEEE/ACM Transactions on Computational Biology and Bioinformatics | VOL. 19
01 Nov 2022

SNV identification from single-cell RNA sequencing data.
Patricia M Schnepp ... Xiang Zhou
Human Molecular Genetics | VOL. 28
27 Aug 2019
