Abstract

Dimensionality reduction of high-dimensional data is crucial for single-cell RNA sequencing (scRNA-seq) visualization and clustering. One prominent challenge in scRNA-seq studies comes from dropout events, which lead to zero-inflated data. To address this issue, we propose an scRNA-seq data dimensionality reduction algorithm based on a hierarchical autoencoder, termed SCDRHA. SCDRHA consists of two core modules: the first is a deep count autoencoder (DCA) that denoises the data, and the second is a graph autoencoder that projects the denoised data into a low-dimensional space. Experimental results on five real scRNA-seq datasets demonstrate that SCDRHA outperforms existing state-of-the-art algorithms on dimension reduction and noise reduction. In addition, SCDRHA can dramatically improve the performance of data visualization and cell clustering.

Highlights

  • With the rapid development of single-cell RNA sequencing technology, transcriptomics research has changed dramatically (Tang et al, 2013; Xi et al, 2018, 2020)

  • To assess the performance of SCDRHA, we focus on relatively large datasets; five real scRNA-seq datasets with known cell types are selected

  • (i) The 10X PBMC (Zheng et al, 2017) dataset is provided by the 10X scRNA-seq platform and comes from a healthy human. (ii) The Mouse ES cell (Klein et al, 2015) dataset profiles the transcriptome of the heterogeneous onset of differentiation of mouse embryonic stem cells after Leukemia Inhibitory Factor (LIF) withdrawal (GSE65525). (iii) The Mouse bladder cell (Han et al, 2018) dataset is from the Mouse Cell Atlas project (GSE108097)


Summary

Introduction

With the rapid development of single-cell RNA sequencing (scRNA-seq) technology, transcriptomics research has changed dramatically (Tang et al, 2013; Xi et al, 2018, 2020). The scale of scRNA-seq data obtained by researchers keeps growing, which brings enormous challenges in analysis and computation (Kiselev et al, 2019; Yu et al, 2021). One of the most challenging sources of noise is dropout events, which cause zero inflation in scRNA-seq data (Zhang and Zhang, 2018). A low RNA capture rate can cause an expressed gene to go undetected, producing a “false” zero count; such an observation is defined as a dropout event. The zero counts in the data therefore consist of “true” zero counts, which reflect the lack of expression of a gene in a specific cell, and “false” zero counts, which are dropout events.
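The distinction between true and false zero counts can be illustrated with a small simulation. The sketch below is not part of SCDRHA; it assumes a toy generative model (gamma-distributed gene means, Poisson molecule counts, and independent binomial capture of each molecule), and the function name `simulate_dropout` and all parameter values are hypothetical choices made only for illustration.

```python
import numpy as np


def simulate_dropout(n_cells=200, n_genes=100, capture_rate=0.3,
                     expressed_frac=0.7, seed=0):
    """Toy model of dropout (illustrative assumption, not the paper's method).

    Returns the latent molecule counts, the observed counts after imperfect
    capture, and boolean masks for true zeros and false zeros (dropouts).
    """
    rng = np.random.default_rng(seed)

    # Assumed per-gene mean expression levels.
    gene_means = rng.gamma(shape=2.0, scale=2.0, size=n_genes)

    # A gene is switched off in some cells; those entries have no molecules.
    expressed = rng.random((n_cells, n_genes)) < expressed_frac
    latent = rng.poisson(gene_means, size=(n_cells, n_genes)) * expressed

    # Each molecule is captured independently with probability capture_rate.
    observed = rng.binomial(latent, capture_rate)

    # True zero: no molecules were present in the cell.
    true_zero = latent == 0
    # False zero (dropout): molecules were present but none was captured.
    false_zero = (latent > 0) & (observed == 0)
    return latent, observed, true_zero, false_zero


latent, observed, true_zero, false_zero = simulate_dropout()
zero_frac = (observed == 0).mean()
print(f"fraction of observed zeros: {zero_frac:.3f}")
print(f"  of which true zeros:  {true_zero.mean():.3f}")
print(f"  of which false zeros: {false_zero.mean():.3f}")
```

In this toy setting every observed zero is either a true zero or a false zero, and lowering `capture_rate` increases the share of false zeros. Real data does not come with the `latent` matrix, which is why a denoising step has to infer which zeros are dropouts.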

