Abstract

Fast, robust and technology-independent computational methods are needed for supervised cell type annotation of single-cell RNA sequencing data. We present SciBet, a supervised cell type identifier that accurately predicts cell identity for newly sequenced cells with an order-of-magnitude speed advantage. We enable web client deployment of SciBet for rapid local computation without uploading local data to the server. Given the exponential growth in the size of single-cell RNA sequencing datasets, this user-friendly, cross-platform tool can be widely useful for cell type identification.

Highlights

  • Fast, robust and technology-independent computational methods are needed for supervised cell type annotation of single-cell RNA sequencing data.

  • Because not all genes are useful for this classification problem[6,8], we developed E-test to select cell type-specific genes from the training set in a supervised and parametric manner, both to remove noisy genes and to accelerate downstream classification by compressing the model.

  • We first applied statistical entropy from information theory to measure the dispersion of Poisson-Gamma-mixture-distributed gene expression; this entropy can be estimated directly from the logarithm of the mean gene expression (Methods).
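To make the two highlights above concrete, here is a minimal Python sketch of entropy-based gene selection. It assumes that each gene's entropy is estimated as the log of its mean expression. Under that assumption, an E-test-style score can be written as the gap between the entropy under a "no cell type difference" null (the log of the pooled mean across cell types) and the average per-type entropy (the mean of the per-type log means). That exact statistic, the pseudocount, and the function names `e_test_scores` and `select_genes` are illustrative assumptions, not SciBet's published implementation.

```python
import numpy as np


def e_test_scores(expr, labels, pseudo=1.0):
    """Score genes by how much their mean expression differs across cell types.

    expr:   cells x genes matrix of non-negative expression values.
    labels: one cell-type label per cell (the training annotations).
    pseudo: hypothetical pseudocount added to means so log() is defined.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    types = np.unique(labels)
    # Per-type mean expression, shape (n_types, n_genes).
    means = np.vstack([expr[labels == t].mean(axis=0) for t in types]) + pseudo
    # Assumed entropy estimate under the null: all types share one mean.
    h_null = np.log(means.mean(axis=0))
    # Assumed entropy estimate under the alternative: each type has its own mean.
    h_alt = np.log(means).mean(axis=0)
    # Non-negative by Jensen's inequality; 0 when a gene is flat across types.
    return h_null - h_alt


def select_genes(expr, labels, k):
    """Return indices of the k highest-scoring (most type-specific) genes."""
    scores = e_test_scores(expr, labels)
    return np.argsort(scores)[::-1][:k]


# Toy data: gene 0 is specific to type A, gene 1 is flat, gene 2 differs slightly.
expr = [[10, 5, 1], [10, 5, 1], [0, 5, 2], [0, 5, 2]]
labels = ["A", "A", "B", "B"]
print(select_genes(expr, labels, 2))
```

Keeping only the top-scoring genes discards uninformative ones (such as gene 1 above), which is the same motivation the highlights give for using E-test to compress the model before classification.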


Summary

Introduction

Robust and technology-independent computational methods are needed for supervised cell type annotation of single-cell RNA sequencing data. The Human Cell Atlas[3] (HCA) aims to build a single-cell map of all human cells, and its scale is expected to reach billions of cells. Facing such explosive data growth, one major challenge is reliable and rapid identification of the cell type of a newly sequenced cell. Supervised cell type annotation of newly generated data using existing annotated labels has become more desirable than unsupervised approaches, which tend to be far more laborious and computationally intensive. Traditional classification methods such as the random forest classifier[4] (RF) and the support vector machine[5] (SVM) are often time-consuming[6], tools designed specifically for this task trade accuracy for speed[6], and integration-oriented tools[7] rely on a computation-intensive search for anchor cells. We provide both local and web-based SciBet implementations that are compatible with either existing or custom datasets for ultra-fast and accurate cell type identification.
