Abstract

Single-cell RNA-seq (scRNASeq) has become a powerful technique for measuring the transcriptome of individual cells. Unlike the bulk measurements that average the gene expressions over the individual cells, gene measurements at individual cells can be used to study several different tissues and organs at different stages. Identifying the cell types present in the sample from the single cell transcriptome data is a common goal in many single-cell experiments. Several methods have been developed to do this. However, correctly identifying the true cell types remains a challenge. We present a framework that addresses this problem. Our hypothesis is that the meaningful characteristics of the data will remain despite small perturbations of data. We validate the performance of the proposed method on eight publicly available scRNA-seq datasets with known cell types as well as five simulation datasets with different degrees of the cluster separability. We compare the proposed method with five other existing methods: RaceID, SNN-Cliq, SINCERA, SEURAT, and SC3. The results show that the proposed method performs better than the existing methods.

Highlights

  • Single-cell RNA-seq has become a powerful technique for measuring the transcriptome of individual cells

  • One important analysis on scRNASeq is the identification of cell types that can be achieved by performing an unsupervised clustering method on transcriptome data[13,14,15,16,17,18,19]

  • To measure the level of similarity between the reference labels and the inferred labels that are obtained by each clustering method, we use three different metrics: adjusted rand index (ARI), adjusted mutual information (AMI), and V-measure as explained in the following

Read more

Summary

Introduction

Single-cell RNA-seq (scRNASeq) has become a powerful technique for measuring the transcriptome of individual cells. One important analysis on scRNASeq is the identification of cell types that can be achieved by performing an unsupervised clustering method on transcriptome data[13,14,15,16,17,18,19] Clustering algorithms such as k-means and density-based spatial clustering of applications with noise (DBSCAN)[20] can identify groups of cells given the single-cell gene expression data. The result of the algorithm depends on the number of clusters (in DBSCAN, the maximum distance between the two data points in the same neighborhood should be determined) and the number of runs Another challenge comes from the high dimensionality of data, known as “curse of dimensionality”. We provide a brief overview of the related methods that identify the cell types based on the combination of approaches described above

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call