Abstract

Background: Identifying cell type subpopulations in complex cell mixtures from single-cell RNA-sequencing (scRNA-seq) data involves automated computational steps such as data normalization, dimensionality reduction and cell clustering. However, most researchers still assign cell type labels to cell clusters manually, resulting in limited documentation, low reproducibility and uncontrolled vocabularies. Two bottlenecks to automating this task are the scarcity of reference cell type gene expression signatures and the fact that some dedicated methods are available only as web servers with limited cell type gene expression signatures.

Methods: We benchmarked four methods (CIBERSORT, GSEA, GSVA, and ORA) for assigning cell type labels to cell clusters from scRNA-seq data. We used scRNA-seq datasets from liver, peripheral blood mononuclear cells and retinal neurons for which reference cell type gene expression signatures were available.

Results: In general, all four methods performed well as evaluated by receiver operating characteristic curve analysis (average area under the curve (AUC) = 0.94, sd = 0.036), whereas precision-recall curve analyses showed wide variation depending on the method and dataset (average AUC = 0.53, sd = 0.24).

Conclusions: CIBERSORT and GSVA were the top two performers. Additionally, GSVA was the fastest of the four methods and was more robust in cell type gene expression signature subsampling simulations. We provide an extensible framework to evaluate other methods and datasets at https://github.com/jdime/scRNAseq_cell_cluster_labeling.
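The abstract reports both ROC and precision-recall AUCs for the same predictions. The following is a minimal sketch of how the two quantities can be computed from a ranked list of (cluster, cell type) pairs. It is not the authors' evaluation code: the function names, the toy labels and the scores are assumptions made for illustration, and average precision is used here as one common estimate of the area under the precision-recall curve.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recovered positive
    return total / n_pos


# Hypothetical toy data: 1 correct (cluster, cell type) pair among 10,
# with the correct pair receiving the second-highest score.
toy_labels = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.1, 0.8, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.05]
print(round(roc_auc(toy_labels, toy_scores), 3))   # 0.889
print(average_precision(toy_labels, toy_scores))   # 0.5
```

In this toy ranking a single higher-scoring false positive barely changes the ROC AUC but halves the average precision. This shows that the two metrics can diverge on the same predictions when true pairs are rare; whether that accounts for the gap reported above is not stated in this text.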

Highlights

  • During the last five years a number of single-cell sequencing technologies have been developed to identify cell subpopulations from complex cell mixtures (Bakken et al, 2017)

  • A typical computational pipeline to process scRNA-seq data involves the following steps: i) quality control of sequencing reads, ii) mapping reads against a reference transcriptome, iii) normalization of mapped reads to correct batch effects and remove contaminants, iv) data dimensionality reduction with principal component analysis or alternative approaches, v) clustering of cells using principal component values, vi) detection of genes differentially expressed between clusters, vii) visualization of cell clusters in t-SNE or alternative plots, and viii) assignment of cell type labels to cell clusters

  • The peripheral blood mononuclear cell (PBMC) clusters we obtained with Seurat were mapped, using cell barcode identifiers, against the fluorescence-activated cell sorting (FACS) assignments, and cell type names were manually matched to the LM22 signature
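The last highlight describes mapping clusters to FACS assignments through shared cell barcodes. A minimal sketch of one way to do such a mapping follows. The majority-vote rule, the function name and the barcodes are assumptions for illustration; the text does not say how clusters with mixed FACS labels were resolved.

```python
from collections import Counter, defaultdict


def label_clusters_by_majority(cluster_of, reference_of):
    """Give each cluster the most common reference label among its cells.

    cluster_of:   dict barcode -> cluster id (e.g. exported from a clustering tool)
    reference_of: dict barcode -> reference cell type (e.g. a FACS assignment)
    Returns dict cluster id -> (label, fraction of matched cells with that label).
    """
    votes = defaultdict(Counter)
    for barcode, cluster in cluster_of.items():
        label = reference_of.get(barcode)
        if label is not None:  # skip cells that have no reference assignment
            votes[cluster][label] += 1
    result = {}
    for cluster, counts in votes.items():
        label, n = counts.most_common(1)[0]
        result[cluster] = (label, n / sum(counts.values()))
    return result


# Hypothetical barcodes and labels, for illustration only.
cluster_of = {"AAAC-1": 0, "AAAG-1": 0, "AACT-1": 0,
              "ACGT-1": 1, "AGGT-1": 1, "TTTT-1": 1}
reference_of = {"AAAC-1": "B cell", "AAAG-1": "B cell", "AACT-1": "NK cell",
                "ACGT-1": "CD14+ monocyte", "AGGT-1": "CD14+ monocyte"}
print(label_clusters_by_majority(cluster_of, reference_of))
```

Returning the agreement fraction alongside the label makes it easy to flag clusters whose cells come from several sorted populations before matching names to a signature such as LM22.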

Summary

Introduction

During the last five years, a number of single-cell sequencing technologies have been developed to identify cell subpopulations from complex cell mixtures (Bakken et al, 2017). Assigning cell type labels to the resulting cell clusters typically involves manual inspection of the genes expressed in a cluster, combined with a detailed literature search to determine whether any of those genes are known expression markers for cell types of interest. This manual approach has several drawbacks: limited documentation and low reproducibility of cell type gene marker selection, use of uncontrolled and non-ontological vocabularies for cell type labels, and the time it takes. For these reasons, computational tools that allow researchers to systematically, reproducibly and quickly assign cell type labels to cell clusters derived from scRNA-seq experiments are needed.
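To make the idea of automated labeling concrete, the sketch below illustrates over-representation analysis (ORA), one of the four method families named in the abstract, in its common form as a one-sided hypergeometric test. It is an illustration under stated assumptions, not the benchmarked implementation: the function names, the placeholder gene identifiers and the rule of picking the smallest p-value are all hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N genes from M, of which n belong to the signature."""
    upper = min(n, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / comb(M, N)


def ora_label(cluster_genes, signatures, universe):
    """Score each signature by overlap with the cluster's genes; return the best."""
    universe = set(universe)
    cluster = set(cluster_genes) & universe
    pvals = {}
    for cell_type, sig in signatures.items():
        sig = set(sig) & universe
        overlap = len(cluster & sig)
        pvals[cell_type] = hypergeom_sf(overlap, len(universe), len(sig), len(cluster))
    best = min(pvals, key=pvals.get)  # assumption: smallest p-value wins
    return best, pvals


# Placeholder gene identifiers G0..G19, not real markers.
universe = [f"G{i}" for i in range(20)]
signatures = {"T cell": universe[0:5], "B cell": universe[5:10]}
cluster_genes = ["G0", "G1", "G2", "G3", "G10"]
best, pvals = ora_label(cluster_genes, signatures, universe)
print(best, pvals)
```

In practice the p-values would need multiple-testing correction across signatures, and the choice of gene universe strongly affects the result; both are left out of this sketch.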
