Abstract

Background
Single-cell sequencing technology can process large amounts of single-cell library data simultaneously and reveal the heterogeneity among cells. However, analyzing single-cell data is computationally challenging. Because gene expression counts are low, there is a high chance that a truly non-zero expression value is recorded as zero; such events are called dropout events. At present, mainstream dropout imputation methods such as DCA, MAGIC, scVI, scImpute and SAVER cannot effectively recover the true expression of cells from dropout noise.

Results
In this paper, we propose GNNImpute, a network with an autoencoder structure. GNNImpute uses graph attention convolution to aggregate multi-level information from similar cells, implementing convolution operations in non-Euclidean space on scRNA-seq data. Unlike current imputation tools, GNNImpute can accurately and effectively impute dropouts and reduce dropout noise. We use mean square error (MSE), mean absolute error (MAE), Pearson correlation coefficient (PCC) and cosine similarity (CS) to compare the performance of GNNImpute with that of other methods. We analyze four real datasets, and our results show that GNNImpute achieves 3.0130 MSE, 0.6781 MAE, 0.9073 PCC and 0.9134 CS. Furthermore, we use the adjusted Rand index (ARI) and normalized mutual information (NMI) to measure clustering performance; GNNImpute achieves an ARI of 0.8199 and an NMI of 0.8368.

Conclusions
In this investigation, we propose a single-cell dropout imputation method, GNNImpute, which effectively uses information shared among cells to impute dropouts in scRNA-seq data. We test it on different real datasets and evaluate its effectiveness in terms of MSE, MAE, PCC and CS. The results show that graph attention convolution and autoencoder structures have great potential for single-cell dropout imputation.
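
The abstract describes graph attention convolution over similar cells but does not give the layer equations. The sketch below is a generic single-head graph attention layer in the style of the original Graph Attention Network (GAT), written in NumPy: each cell's new representation is an attention-weighted average of its own and its neighbours' projected features. The k-nearest-neighbour cell graph, the LeakyReLU slope of 0.2, the hypothetical helper names `knn_adjacency` and `gat_layer`, and the random weights are illustrative assumptions, not GNNImpute's actual graph construction, configuration or trained parameters.

```python
import numpy as np


def knn_adjacency(x, k):
    """Symmetric k-nearest-neighbour cell graph with self-loops (assumed construction)."""
    sq = np.sum(x ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * x @ x.T  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                      # a cell is not its own neighbour
    nbrs = np.argsort(dist, axis=1)[:, :k]              # k closest cells per row
    adj = np.zeros(dist.shape, dtype=bool)
    adj[np.arange(x.shape[0])[:, None], nbrs] = True
    adj = adj | adj.T                                   # make the graph undirected
    np.fill_diagonal(adj, True)                         # self-loops: cells attend to themselves
    return adj


def gat_layer(x, adj, w, a, slope=0.2):
    """One single-head, GAT-style graph attention layer.

    x: (n_cells, n_in) features, adj: (n_cells, n_cells) bool,
    w: (n_in, n_out) projection, a: (2 * n_out,) attention vector.
    """
    h = x @ w                                  # linear projection of every cell
    n_out = h.shape[1]
    src = h @ a[:n_out]                        # term for the receiving cell i
    dst = h @ a[n_out:]                        # term for the neighbouring cell j
    e = src[:, None] + dst[None, :]            # raw attention scores e_ij
    e = np.where(e > 0, e, slope * e)          # LeakyReLU
    e = np.where(adj, e, -np.inf)              # only attend along graph edges
    e = e - e.max(axis=1, keepdims=True)       # numerical stability (self-loop keeps max finite)
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)  # softmax over each cell's neighbours
    return alpha @ h, alpha                    # aggregated features and attention weights


# Toy usage: 6 cells x 4 genes of Poisson counts, projected to 3 features.
rng = np.random.default_rng(0)
x = rng.poisson(1.0, size=(6, 4)).astype(float)
adj = knn_adjacency(x, k=2)
w = rng.normal(size=(4, 3))
a = rng.normal(size=6)
h_new, alpha = gat_layer(x, adj, w, a)
print(h_new.shape)  # (6, 3)
```

In an autoencoder, layers like this could form the encoder, with a decoder mapping the representations back to gene space; how GNNImpute stacks, sizes and trains its layers is not specified in this summary.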
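
The evaluation metrics named in the abstract have standard definitions. A minimal NumPy sketch of them follows, assuming that imputed values are compared with true values entry by entry over the flattened cell-by-gene matrix, and that NMI is normalized by the arithmetic mean of the two label entropies. The function names are hypothetical; which entries the authors compared and which normalization they used are not stated here. For real analyses, scikit-learn provides `sklearn.metrics.adjusted_rand_score` and `sklearn.metrics.normalized_mutual_info_score`.

```python
import numpy as np


def regression_metrics(true, imputed):
    """MSE, MAE, Pearson correlation (PCC) and cosine similarity (CS).

    Both arrays are flattened, so every (cell, gene) entry is one observation
    (an assumption about the evaluation protocol).
    """
    t = np.asarray(true, dtype=float).ravel()
    p = np.asarray(imputed, dtype=float).ravel()
    mse = np.mean((t - p) ** 2)
    mae = np.mean(np.abs(t - p))
    pcc = np.corrcoef(t, p)[0, 1]
    cs = t @ p / (np.linalg.norm(t) * np.linalg.norm(p))
    return {"MSE": mse, "MAE": mae, "PCC": pcc, "CS": cs}


def _contingency(labels_a, labels_b):
    """Counts of cells shared by each (cluster in a, cluster in b) pair."""
    _, a = np.unique(labels_a, return_inverse=True)
    _, b = np.unique(labels_b, return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)
    return table


def _comb2(x):
    """Number of unordered pairs, x choose 2."""
    return x * (x - 1) / 2.0


def ari(labels_true, labels_pred):
    """Adjusted Rand index; undefined (0/0) if both labelings are trivial."""
    c = _contingency(labels_true, labels_pred)
    sum_ij = _comb2(c).sum()
    sum_a = _comb2(c.sum(axis=1)).sum()
    sum_b = _comb2(c.sum(axis=0)).sum()
    expected = sum_a * sum_b / _comb2(c.sum())
    max_index = (sum_a + sum_b) / 2.0
    return (sum_ij - expected) / (max_index - expected)


def nmi(labels_true, labels_pred):
    """Mutual information divided by the arithmetic mean of the two entropies."""
    p = _contingency(labels_true, labels_pred)
    p /= p.sum()
    pa, pb = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0
    mi = np.sum(p[nz] * np.log(p[nz] / np.outer(pa, pb)[nz]))
    ha = -np.sum(pa * np.log(pa))
    hb = -np.sum(pb * np.log(pb))
    return mi / ((ha + hb) / 2.0)


truth = np.array([[0.0, 2.0], [1.0, 3.0]])
pred = np.array([[0.5, 2.0], [1.0, 2.5]])
m = regression_metrics(truth, pred)  # m["MSE"] == 0.125, m["MAE"] == 0.25
# Cluster labels that agree up to renaming give ARI 1.0 and NMI of about 1.0.
score_ari = ari([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])
score_nmi = nmi([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])
```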

Highlights

  • Single-cell sequencing technology can process large amounts of single-cell library data simultaneously and reveal the heterogeneity among cells

  • When a dropout event occurs in any cell, the missing values can be recovered from the gene expression profiles of similar cells

  • Robustness analysis under different dropout rates: we evaluate the ability of the imputation method to recover scRNA-seq data at different dropout rates
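
The two highlights above can be illustrated together: hide a known fraction of the observed counts, recover them from similar cells, and measure the error at several dropout rates. This is a minimal sketch under stated assumptions. The dropout protocol (zeroing a random fraction of non-zero entries) is assumed, the imputer is plain k-nearest-neighbour averaging used as a stand-in (it is not GNNImpute), the data are synthetic Poisson counts, and `simulate_dropout` and `knn_impute` are hypothetical helpers.

```python
import numpy as np


def simulate_dropout(counts, rate, rng):
    """Assumed protocol: set a random fraction `rate` of the non-zero entries to zero.

    Returns the corrupted matrix and a boolean mask of the zeroed entries.
    """
    nz_rows, nz_cols = np.nonzero(counts)
    n_drop = int(round(rate * nz_rows.size))
    pick = rng.choice(nz_rows.size, size=n_drop, replace=False)
    mask = np.zeros(counts.shape, dtype=bool)
    mask[nz_rows[pick], nz_cols[pick]] = True
    corrupted = counts.copy()
    corrupted[mask] = 0
    return corrupted, mask


def knn_impute(corrupted, k):
    """Stand-in imputer: replace zeros with the mean of the k most similar cells.

    Similarity is Euclidean distance on log1p profiles (an assumption). Every zero
    is filled, since true zeros and dropouts cannot be told apart here; observed
    non-zero values are kept as they are.
    """
    x = np.log1p(corrupted)
    sq = np.sum(x ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    np.fill_diagonal(dist, np.inf)                 # a cell is not its own neighbour
    nbrs = np.argsort(dist, axis=1)[:, :k]         # indices of the k most similar cells
    neighbour_mean = corrupted[nbrs].mean(axis=1)  # (n_cells, n_genes)
    return np.where(corrupted == 0, neighbour_mean, corrupted)


# Synthetic data: two groups of 20 cells with different mean expression over 30 genes.
rng = np.random.default_rng(0)
means = np.vstack([np.full(30, 2.0), np.full(30, 8.0)])
true = rng.poisson(np.repeat(means, 20, axis=0)).astype(float)

for rate in (0.1, 0.3, 0.5):
    corrupted, mask = simulate_dropout(true, rate, rng)
    imputed = knn_impute(corrupted, k=5)
    mse = np.mean((imputed[mask] - true[mask]) ** 2)
    print(f"dropout rate {rate:.1f}: MSE on dropped entries = {mse:.3f}")
```

Scoring only the masked entries isolates how well each method recovers values known to have been dropped; a learned model would replace `knn_impute` in the same loop.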

Introduction

Single-cell sequencing technology can process large amounts of single-cell library data simultaneously and reveal the heterogeneity among cells. The development of single-cell RNA sequencing (scRNA-seq) technology makes it easy to process tens of thousands of single cells in parallel while providing gene expression data at single-cell resolution [1, 2, 3]. Traditional RNA-seq technology cannot resolve complex tissues or organs at the cellular level because it measures the average expression across thousands of cells. In contrast, scRNA-seq is widely used in cell-level analyses, including studies of cell heterogeneity [4], clustering of cell subpopulations [5, 6] and cell development trajectories [7]. Although scRNA-seq produces data at single-cell resolution, technical limitations such as low capture rate and low sequencing depth leave the sequencing library data with substantial noise [9, 10].

Xu et al. BMC Bioinformatics (2021) 22:582