Abstract

Background: Advances in single-cell RNA-seq technology have led to great opportunities for the quantitative characterization of cell types, and many clustering algorithms have been developed based on single-cell gene expression. However, we found that different data preprocessing methods affect clustering algorithms quite differently. Moreover, no single preprocessing method is applicable to all clustering algorithms, and even for the same clustering algorithm, the best preprocessing method depends on the input data.

Results: We designed a graph-based algorithm, SC3-e, specifically for identifying the best data preprocessing method for SC3, which is currently the most widely used algorithm for single-cell clustering. When tested on eight frequently used single-cell RNA-seq data sets, SC3-e always selected the best data preprocessing method for SC3 and therefore greatly enhanced the clustering performance of SC3.

Conclusion: The SC3-e algorithm is effective in practice at identifying the best data preprocessing method, and therefore substantially enhances the cell-type clustering performance of SC3. It is expected to play a crucial role in studies that rely on single-cell clustering, such as studies of complex human diseases and the discovery of new cell types.

Highlights

  • The results showed that different data preprocessing methods have quite different effects on different clustering algorithms for different types of gene expression data

  • Impact of different preprocessing methods on cell-type clustering: in this study, five commonly used clustering methods were applied to evaluate clustering performance under four of the most commonly used data preprocessing methods on eight frequently used data sets

Summary

Introduction

Advances in single-cell RNA-seq technology have led to great opportunities for the quantitative characterization of cell types, and many clustering algorithms have been developed based on single-cell gene expression. Single-cell RNA sequencing (scRNA-seq) has revolutionized traditional transcriptomic studies by extracting transcriptome information at the resolution of a single cell; this approach can detect heterogeneity that cannot be observed by sequencing mixed cells and can reveal the genetic structure and gene expression status of a single cell [1,2,3,4,5,6,7]. It helps to identify new cell types [8, 9], provides new research ideas, and opens up new directions for in-depth research on the occurrence, development mechanisms, diagnosis and treatment of complex diseases [10].

The SC3 algorithm [17] performs cell-type clustering by combining multiple clustering solutions into a consensus result. Clustering methods such as tSNE [18] followed by k-means (tSNE + kmeans, which was tested in [17]) and pcaReduce [19] perform dimensionality reduction before clustering to extract principal components and reduce computational complexity. Although great efforts have been made to develop clustering algorithms that cluster cell types effectively, the noise caused by artifacts of laboratory protocols during single-cell sequencing and the lack of universality of the clustering algorithms themselves mean that clustering accuracy is still far from sufficient for many practical applications, and there remains considerable room for improving clustering models.
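To make the sensitivity of clustering to data preprocessing concrete, the following is a minimal sketch rather than the pipeline evaluated in this study. It assumes a log2(x + 1) transform as one illustrative preprocessing step, a plain k-means as the clustering method, and the adjusted Rand index (ARI) as the accuracy score; the toy count matrix, the function names and the choice of seeding are all hypothetical and serve only to show the effect.

```python
import math
from collections import Counter


def log_transform(matrix):
    """Apply log2(x + 1) to every entry (rows are cells, columns are genes)."""
    return [[math.log2(x + 1.0) for x in row] for row in matrix]


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means, seeded with the first k points so it is deterministic."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each cell goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two labelings of the same cells."""
    sum_cells = sum(math.comb(n, 2) for n in Counter(zip(truth, pred)).values())
    sum_rows = sum(math.comb(n, 2) for n in Counter(truth).values())
    sum_cols = sum(math.comb(n, 2) for n in Counter(pred).values())
    expected = sum_rows * sum_cols / math.comb(len(truth), 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy counts: genes 0 and 1 are cell-type markers, gene 2 is a
# highly expressed gene whose variation is unrelated to cell type.
counts = [
    [0, 30, 1000],   # type 0
    [30, 0, 1100],   # type 1
    [1, 28, 5000],   # type 0
    [28, 1, 5200],   # type 1
    [0, 31, 900],    # type 0
    [31, 0, 4800],   # type 1
]
truth = [0, 1, 0, 1, 0, 1]

ari_raw = adjusted_rand_index(truth, kmeans(counts, 2))
ari_log = adjusted_rand_index(truth, kmeans(log_transform(counts), 2))
print(f"ARI on raw counts:      {ari_raw:.2f}")
print(f"ARI on log2(x+1) data:  {ari_log:.2f}")
```

On this toy matrix, the raw counts are dominated by the third gene and the clusters do not match the cell types, whereas the log-transformed data recover them exactly. A different matrix or a different clustering method could favor a different transform, which is the kind of data dependence described above.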

