ScGCL: an imputation method for scRNA-seq data based on graph contrastive learning.

Zehao Xiong, Bo Wang, Jiawei Luo, Wanwan Shi, Ying Liu, Zhongyuan Xu, Peter Robinson

doi:10.1093/bioinformatics/btad098


Open Access



Journal: Bioinformatics	Publication Date: Feb 24, 2023
Citations: 9	License type: CC BY 4.0

Affiliation: Hunan University

Abstract

Single-cell RNA-sequencing (scRNA-seq) is widely used to reveal cellular heterogeneity, complex disease mechanisms and cell differentiation processes. Due to high sparsity and complex gene expression patterns, scRNA-seq data contain a large number of dropout events, which affect downstream tasks such as cell clustering and pseudo-time analysis. Restoring the expression levels of genes is essential for reducing technical noise and facilitating downstream analysis. However, existing scRNA-seq data imputation methods ignore the topological structure of scRNA-seq data and cannot comprehensively utilize the relationships between cells. Here, we propose a single-cell Graph Contrastive Learning method for scRNA-seq data imputation, named scGCL, which integrates graph contrastive learning and the Zero-inflated Negative Binomial (ZINB) distribution to estimate dropout values. scGCL summarizes global and local semantic information through contrastive learning and selects positive samples to enhance the representation of target nodes. To capture the global probability distribution, scGCL introduces an autoencoder based on the ZINB distribution, which reconstructs the scRNA-seq data based on the prior distribution. Through extensive experiments on 14 scRNA-seq datasets, we verify that scGCL outperforms existing state-of-the-art imputation methods in clustering performance and gene imputation. Further, we find that scGCL can enhance the expression patterns of specific genes in Alzheimer's disease datasets. The code and data of scGCL are available on GitHub: https://github.com/zehaoxiong123/scGCL. Supplementary data are available at Bioinformatics online.
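
For readers unfamiliar with the ZINB distribution mentioned in the abstract, the following is a minimal sketch of its probability mass function and negative log-likelihood, written with the Python standard library only. It illustrates the standard textbook ZINB form (mean mu, dispersion theta, dropout probability pi) and is not taken from the scGCL code; the function names `nb_log_pmf`, `zinb_pmf` and `zinb_nll` are hypothetical, and how scGCL parameterizes or weights this loss is not stated in the abstract.

```python
import math

def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu > 0 and dispersion theta > 0 (standard form)."""
    return (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
            + theta * math.log(theta / (theta + mu))
            + x * math.log(mu / (theta + mu)))

def zinb_pmf(x, mu, theta, pi):
    """Probability of count x under a zero-inflated negative binomial.
    pi is the probability that an observation is an excess (dropout) zero."""
    nb = math.exp(nb_log_pmf(x, mu, theta))
    if x == 0:
        # A zero is either a dropout or a genuine NB zero.
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb

def zinb_nll(counts, mu, theta, pi):
    """Negative log-likelihood of a list of counts sharing one (mu, theta, pi).
    A ZINB-based autoencoder would typically minimize a loss of this kind,
    with per-gene, per-cell parameters produced by the decoder (assumption)."""
    return -sum(math.log(zinb_pmf(x, mu, theta, pi)) for x in counts)

if __name__ == "__main__":
    counts = [0, 0, 3, 1, 0, 7]  # toy counts, for illustration only
    print(zinb_nll(counts, mu=2.0, theta=1.5, pi=0.3))
```

Setting pi to 0 recovers the plain negative binomial, so the extra parameter only models the surplus of zeros that dropout events introduce.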
