Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data

J Javier Diaz-Mejia, Sonya A Macparland, Elaine C Meng, Alexander R Pico, Troy Ketela, John H Morris, Trevor J Pugh, Gary D Bader

doi:10.12688/f1000research.18490.2

Abstract

Background: Identification of cell type subpopulations from complex cell mixtures using single-cell RNA-sequencing (scRNA-seq) data includes automated steps from normalization to cell clustering. However, assigning cell type labels to cell clusters is often conducted manually, resulting in limited documentation, low reproducibility and uncontrolled vocabularies. This is partly due to the scarcity of reference cell type signatures and because some methods support limited cell type signatures.

Methods: In this study, we benchmarked five methods representing first-generation enrichment analysis (ORA), second-generation approaches (GSEA and GSVA), machine learning tools (CIBERSORT) and network-based neighbor voting (METANEIGHBOR) for the task of assigning cell type labels to cell clusters from scRNA-seq data. We used five scRNA-seq datasets for which reference cell type signatures were available: human liver, 11 Tabula Muris mouse tissues, two human peripheral blood mononuclear cell datasets, and mouse retinal neurons. The datasets span Drop-seq, 10X Chromium and Seq-Well technologies and range in size from ~3,700 to ~68,000 cells.

Results: In general, all five methods performed well in the task as evaluated by receiver operating characteristic (ROC) curve analysis (average area under the curve (AUC) = 0.91, sd = 0.06), whereas precision-recall analyses showed wide variation depending on the method and dataset (average AUC = 0.53, sd = 0.24). The number of genes in cell type signatures influenced performance, with smaller signatures more frequently leading to incorrect results.

Conclusions: GSVA was the overall top performer and was more robust in cell type signature subsampling simulations, although different methods performed well on different datasets. METANEIGHBOR and GSVA were the fastest methods. CIBERSORT and METANEIGHBOR were more affected than the other methods by analyses that included only the expected cell types. We provide an extensible framework that can be used to evaluate other methods and datasets at https://github.com/jdime/scRNAseq_cell_cluster_labeling.
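The gap between the high ROC AUC and the much lower, more variable precision-recall AUC reported above can be illustrated with a small hypothetical example. The sketch assumes that every (cluster, candidate cell type) pair receives a score from a labeling method and counts as positive only when the candidate matches the reference label, so positives are rare. It computes ROC AUC as the Mann-Whitney probability that a positive pair outscores a negative one, and approximates precision-recall AUC by step-wise average precision. The paper's exact scoring scheme and curve interpolation may differ, and the scores below are invented for illustration.

```python
from typing import List


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores: List[float], labels: List[int]) -> float:
    """Step-wise precision-recall AUC: mean precision at the rank of each positive.

    Assumes no tied scores; ties would make the ranking order arbitrary.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits


# Hypothetical example: 3 clusters x 10 candidate cell types = 30 scored pairs,
# of which only 3 are correct labels. Pairs are listed from highest to lowest score.
scores = [float(30 - i) for i in range(30)]
labels = [1 if i in (0, 3, 8) else 0 for i in range(30)]  # correct pairs ranked 1st, 4th, 9th

print(f"ROC AUC = {roc_auc(scores, labels):.3f}")                      # ROC AUC = 0.901
print(f"Average precision = {average_precision(scores, labels):.3f}")  # Average precision = 0.611
```

With only 3 positives among 30 pairs, ranking the correct labels 1st, 4th and 9th still gives a ROC AUC of about 0.90, because each positive beats most of the many negatives, while average precision drops to about 0.61, because every negative ranked above a positive lowers precision.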

Highlights

In this study we analysed each of the five scRNA-seq datasets with five computational methods that can be used to assign cell type labels to cell clusters based on known gene expression marker lists.
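Over-representation analysis (ORA), the first-generation enrichment approach among the five benchmarked methods, can be sketched as a hypergeometric test of the overlap between a cluster's marker genes and each cell type signature. The minimal sketch below assumes that the marker genes of each cluster have already been selected (for example, by differential expression) and that all genes come from a shared gene universe. The gene names, signatures and the helper names `hypergeom_sf` and `ora_label` are hypothetical, no multiple-testing correction is applied, and this is not the implementation used in the paper's framework.

```python
from math import comb
from typing import Dict, Set, Tuple


def hypergeom_sf(k: int, universe_size: int, signature_size: int, n_drawn: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(universe_size, signature_size, n_drawn)."""
    upper = min(signature_size, n_drawn)
    tail = sum(
        comb(signature_size, i) * comb(universe_size - signature_size, n_drawn - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(universe_size, n_drawn)


def ora_label(
    cluster_markers: Set[str], signatures: Dict[str, Set[str]], universe: Set[str]
) -> Tuple[str, float]:
    """Return the cell type whose signature is most over-represented among the cluster markers."""
    drawn = cluster_markers & universe  # ignore genes outside the tested universe
    best: Tuple[str, float] = ("", float("inf"))
    for cell_type, signature in signatures.items():
        sig = signature & universe
        overlap = len(drawn & sig)
        p = hypergeom_sf(overlap, len(universe), len(sig), len(drawn))
        if p < best[1]:  # keep the smallest upper-tail p-value
            best = (cell_type, p)
    return best


# Hypothetical gene universe of 100 genes and three toy signatures of 10 genes each.
universe = {f"g{i}" for i in range(100)}
signatures = {
    "T cell": {f"g{i}" for i in range(0, 10)},
    "B cell": {f"g{i}" for i in range(10, 20)},
    "Hepatocyte": {f"g{i}" for i in range(20, 30)},
}
cluster_markers = {"g0", "g1", "g2", "g3", "g10", "g50"}  # 4 T cell genes, 1 B cell gene

label, p_value = ora_label(cluster_markers, signatures, universe)
print(label, f"{p_value:.2e}")
```

Here the cluster's six marker genes overlap the T cell signature by four genes, so "T cell" is assigned with a p-value below 0.001. In a real analysis the p-values would normally be corrected for the number of cell types tested.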

Summary

Introduction

During the last five years a number of single-cell sequencing technologies have been developed to identify cell subpopulations from complex cell mixtures (Bakken et al, 2017). The five scRNA-seq datasets analysed in this study include human liver cells (MacParland et al, 2018); mouse retinal neurons (Shekhar et al, 2016b); the Tabula Muris mouse cell atlas (Tabula Muris Consortium et al, 2018a), which encompasses 20 tissues, of which we used the 11 for which cell type signatures were available (Tabula Muris Consortium, 2018b); and human peripheral blood mononuclear cells (PBMCs) profiled with two technologies: 10X Chromium (Zheng et al, 2017a) and Seq-Well (Gierahn et al, 2017a) (Table 1). The datasets span Drop-seq, 10X Chromium and Seq-Well technologies and range in size from ~3,700 to ~68,000 cells.



Journal: F1000Research	Publication Date: Aug 27, 2019
Citations: 3	License type: CC BY 4.0

Similar Papers

Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data
Lindsay Cowell ... John H Morris
F1000Research | VOL. 8
Lindsay Cowell, et. al.Lindsay Cowell ... John H Morris
17 Aug 2019
F1000Research | VOL. 8

Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data.
J Javier Diaz-Mejia ... Sonya A Macparland
F1000Research | VOL. 8
J Javier Diaz-Mejia, et. al.J Javier Diaz-Mejia ... Sonya A Macparland
14 Oct 2019
F1000Research | VOL. 8

Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data.
J Javier Diaz-Mejia ... Trevor J Pugh
F1000Research | VOL. 8
J Javier Diaz-Mejia, et. al.J Javier Diaz-Mejia ... Trevor J Pugh
15 Mar 2019
F1000Research | VOL. 8

A Regularized Multi-Task Learning Approach for Cell Type Detection in Single-Cell RNA Sequencing Data.
Piu Upadhyay ... Sumanta Ray
Frontiers in genetics | VOL. 13
Piu Upadhyay, et. al.Piu Upadhyay ... Sumanta Ray
13 Apr 2022
Frontiers in genetics | VOL. 13
