Abstract

We develop CellSIUS (Cell Subtype Identification from Upregulated gene Sets) to fill a methodology gap for rare cell population identification for scRNA-seq data. CellSIUS outperforms existing algorithms for specificity and selectivity for rare cell types and their transcriptomic signature identification in synthetic and complex biological data. Characterization of a human pluripotent cell differentiation protocol recapitulating deep-layer corticogenesis using CellSIUS reveals unrecognized complexity in human stem cell-derived cellular populations. CellSIUS enables identification of novel rare cell populations and their signature genes providing the means to study those populations in vitro in light of their role in health and disease.

Highlights

  • Single-cell RNA sequencing enables genome-wide mRNA expression profiling with single-cell granularity

  • Using synthetic and biological datasets containing rare cell populations, we showed that CellSIUS outperforms existing algorithms in both specificity and selectivity for rare cell type and their transcriptomic signature identification

  • Investigation of feature selection and clustering approaches for scRNA-seq data reveals a methodology gap for the detection of rare cell populations To assess and compare the performance of some of the most recent and widely used feature selection and clustering methodologies for scRNA-seq data, we generated a scRNA-seq dataset with known cellular composition generated from mixtures of eight human cell lines

Read more

Summary

Introduction

Single-cell RNA sequencing (scRNA-seq) enables genome-wide mRNA expression profiling with single-cell granularity. Evolving from the first scRNA-seq dataset measuring gene expression from a single mouse blastomere in 2009 [5], scRNA-seq datasets typically include expression profiles of thousands [1,2,3] to more than one million cells [6, 7]. One of the main applications of scRNA-seq is uncovering and characterizing novel and/or rare cell types from complex tissue in health and disease [8,9,10,11,12,13]. From an analytical point of view, the high dimensionality and complexity of scRNA-seq data pose significant challenges. A multitude of computational approaches for the analysis of scRNA-seq

Methods
Results
Discussion
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.