Abstract

With recent advances in single-cell RNA sequencing (scRNA-seq), enormous transcriptome datasets have been generated. These datasets have furthered our understanding of cellular heterogeneity and its underlying mechanisms in homogeneous populations. Clustering of scRNA-seq data can group cells belonging to the same cell type based on patterns embedded in gene expression. However, scRNA-seq data are high-dimensional, noisy, and sparse, owing to the limitations of existing scRNA-seq technologies. Traditional clustering methods are neither effective nor efficient for high-dimensional and sparse matrix computations. Therefore, several dimension reduction methods have been introduced. To establish a reliable and standard research routine, we conducted a comprehensive review and evaluation of four classical dimension reduction methods and five clustering models. Four experiments were progressively performed on two large scRNA-seq datasets using the 20 resulting combinations. The results showed that feature selection contributed positively to the analysis of high-dimensional and sparse scRNA-seq data. Moreover, feature-extraction methods were able to improve clustering performance, although this improvement did not hold in every setting. Independent component analysis (ICA) performed well in small compressed feature spaces, whereas principal component analysis (PCA) was steadier than all the other feature-extraction methods. In addition, ICA was not ideal for fuzzy C-means clustering in scRNA-seq data analysis. K-means clustering achieved good results when combined with feature-extraction methods.

Highlights

  • Owing to the development of microfluidics, large numbers of cells can be isolated [1]

  • In contrast to the high fluctuation of the independent component analysis (ICA)-based combinations, the other eight combinations, namely hierarchical + principal component analysis (PCA), K-means + PCA, density-based spatial clustering of applications with noise (DBSCAN) + PCA, Louvain + PCA, hierarchical + non-negative matrix factorization (NMF), K-means + NMF, fuzzy C-means + NMF, and Louvain + NMF, all achieved a performance gain (shown in red in Figure 6)

  • With PCA feature extraction, these three types of cells could be categorized into four groups, as shown in the red ovals in Figure 8b

[Figure 7. Effectiveness of gene selection on mouse visual cortex data]


Summary

Introduction

Owing to the development of microfluidics, large numbers of cells can be isolated [1]. Advances in RNA isolation and amplification have resulted in the application of RNA-sequencing (RNA-seq) technology to analyze the transcriptomes of single cells [2,3,4]. Large-scale single-cell data provide new ways to address biological problems; however, they also pose specific analytical and technical challenges, such as high dimensionality, sparse matrix computation, and rare cell type detection [6,7]. The computational analysis of scRNA-seq data involves several steps, including quality control, mapping, quantification, dimensionality reduction, clustering, finding trajectories, and identifying differentially expressed genes [4]. Among these steps, dimensionality reduction and clustering are two of the most important, as they have substantial effects on downstream analysis.
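
To make the dimensionality reduction and clustering steps concrete, the following is a minimal sketch, not the authors' implementation. It assumes a small synthetic count matrix (cells in rows, genes in columns) drawn from Poisson distributions, and chains a simple variance-based gene selection, PCA computed via singular value decomposition, and K-means with a deterministic farthest-point initialization. All function names, the matrix sizes, and the Poisson rates are hypothetical choices made for illustration only.

```python
import numpy as np

def select_variable_genes(X, n_genes):
    """Feature selection: keep the n_genes columns with the highest variance."""
    variances = X.var(axis=0)
    keep = np.sort(np.argsort(variances)[-n_genes:])
    return X[:, keep], keep

def pca(X, n_components):
    """Feature extraction: project cells onto the top principal components."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm with farthest-point initialization (deterministic)."""
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance of every cell to its nearest already-chosen center.
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Synthetic, sparse count matrix (assumption): 60 cells, 200 genes, 3 cell types.
# Each cell type strongly expresses its own block of 10 marker genes.
rng = np.random.default_rng(0)
n_cells, n_genes = 60, 200
true_labels = np.repeat([0, 1, 2], n_cells // 3)
rates = np.full((3, n_genes), 0.2)   # low background rate, so most counts are zero
rates[0, :10] = 8.0
rates[1, 10:20] = 8.0
rates[2, 20:30] = 8.0
counts = rng.poisson(rates[true_labels])
log_counts = np.log1p(counts)        # common variance-stabilizing transform

selected, kept = select_variable_genes(log_counts, n_genes=30)
embedding = pca(selected, n_components=5)
labels = kmeans(embedding, k=3)

print("fraction of zero counts:", float((counts == 0).mean()))
print("embedding shape:", embedding.shape)
print("cluster sizes:", np.bincount(labels, minlength=3).tolist())
```

On real data, the number of selected genes, the number of components, and the number of clusters would all need tuning, and the feature-extraction or clustering step could be swapped for any of the other methods compared in the paper.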

