Abstract

Advances in single-cell RNA sequencing (scRNA-Seq) have allowed for comprehensive analyses of single cell data. However, current analyses of scRNA-Seq data usually start from unsupervised clustering or visualization. These methods ignore prior knowledge of transcriptomes and the probable structures of the data. Moreover, cell identification heavily relies on subjective and possibly inaccurate human inspection afterwards. To address these analytical challenges, we developed SCINA (Semi-supervised Category Identification and Assignment), a semi-supervised model that exploits previously established gene signatures using an expectation–maximization (EM) algorithm. SCINA is applicable to scRNA-Seq and flow cytometry/CyTOF data, as well as other data of similar format. We applied SCINA to a wide range of datasets, and showed its accuracy, stability and efficiency, which exceeded most popular unsupervised approaches. SCINA discovered an intermediate stage of oligodendrocytes from mouse brain scRNA-Seq data. SCINA also detected immune cell population changes in cytometry data in a genetically-engineered mouse model. Furthermore, SCINA performed well with bulk gene expression data. Specifically, we identified a new kidney tumor clade with similarity to FH-deficient tumors (FHD), which we refer to as FHD-like tumors (FHDL). Overall, SCINA provides both methodological advances and biological insights from perspectives different from traditional analytical methods.

Highlights

  • Single cell profiling techniques such as single cell sequencing and cytometry are powerful tools for comprehensive and high-resolution characterization of cellular heterogeneities observed in tumors, brain, and other tissues

  • CD4 + T cells need to be identified by the expression of both CD3 and CD4. These signature genes need to be manually weighed when assigning cell types, leading to even more bias and obscurity. (iii) Thirdly, cell type clustering and assignment are split into two stages, where unsupervised cell clustering in the first stage of analysis ignores prior knowledge of major existing cell populations and their transcriptional features. This leads to suboptimal performance especially when new cell types and subtypes are present in the sequenced cell pool, as they cannot be readily differentiated in the results of unsupervised clustering. (iv) Lastly, in addition to studying one experimental condition, researchers often need to assess changes between conditions in terms of the population abundances of different types of cells

  • supervised Category Identification and Assignment (SCINA) is regarded as semi-supervised because the prior knowledge of signature genes is built is regarded as semi-supervised because the prior knowledge oflearning, signaturewhere genesthe is built into the unsupervised estimation process

Read more

Summary

Introduction

Single cell profiling techniques such as single cell sequencing and cytometry are powerful tools for comprehensive and high-resolution characterization of cellular heterogeneities observed in tumors, brain, and other tissues. CD4 + T cells need to be identified by the expression of both CD3 and CD4 These signature genes need to be manually weighed when assigning cell types, leading to even more bias and obscurity. (iii) Thirdly, cell type clustering and assignment are split into two stages, where unsupervised cell clustering in the first stage of analysis ignores prior knowledge of major existing cell populations and their transcriptional features This leads to suboptimal performance especially when new cell types and subtypes are present in the sequenced cell pool, as they cannot be readily differentiated in the results of unsupervised clustering. Such analyses are less amenable to unsupervised approaches, as the cell groups and cell types are defined ad hoc each time, without justification of the reproducibility of their definitions between conditions

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call