I-Impute: a self-consistent method to impute single cell RNA sequencing data

Xikang Feng,Shuai Cheng Li,Lingxi Chen,Zishuai Wang

doi:10.1186/s12864-020-07007-w

Open Access

Abstract

Background: Single-cell RNA-sequencing (scRNA-seq) is becoming indispensable in the study of cell-specific transcriptomes. However, in scRNA-seq techniques, only a small fraction of the genes are captured due to “dropout” events. These dropout events require intensive treatment when analyzing scRNA-seq data. For example, imputation tools have been proposed to estimate dropout events and de-noise data. The performance of these imputation tools is often evaluated, or fine-tuned, using various clustering criteria based on ground-truth cell subgroup labels. This limits their effectiveness in cases where cell subgroup knowledge is lacking. We consider an alternative strategy that requires the imputation to follow a “self-consistency” principle; that is, the imputation process refines its results until no internal inconsistency or dropouts remain in the data.

Results: We propose the use of “self-consistency” as a main criterion in performing imputation. To demonstrate this principle, we devised I-Impute, a “self-consistent” method to impute scRNA-seq data. I-Impute optimizes continuous similarities and dropout probabilities through iterative refinement until a self-consistent imputation is reached. On the in silico data sets, I-Impute consistently exhibited the highest Pearson correlations across different dropout rates compared with the state-of-the-art methods SAVER and scImpute. Furthermore, we collected three wet-lab datasets (mouse bladder cells, embryonic stem cells, and aortic leukocyte cells) to evaluate the tools. I-Impute exhibited feasible cell subpopulation discovery efficacy on all three datasets and achieved the highest clustering accuracy compared with SAVER and scImpute.

Conclusions: A strategy based on “self-consistency”, captured through our method I-Impute, gave imputation results better than the state-of-the-art tools. The source code of I-Impute can be accessed at https://github.com/xikanfeng2/I-Impute.

Highlights

Single-cell RNA-sequencing is becoming indispensable in the study of cell-specific transcriptomes
We first validated whether the existing imputation tools are self-consistent
We found that SAVER and scImpute are not self-consistent. scImpute has root mean square error (RMSE) values of 7.346 at 88.45% dropout data, 0.2392 at 63.29% dropout data, and 0.2677 at 45.16% dropout data
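
One way to read the self-consistency check reported above is sketched below. The text here does not spell out how the RMSE is computed, so this sketch assumes it compares the output of one imputation pass with the output of a second pass applied to that result; a self-consistent method would score 0. The two imputers (`fill_zeros_with_gene_mean`, `shrink_toward_gene_mean`) are hypothetical toys for illustration and are not I-Impute, SAVER, or scImpute.

```python
import numpy as np

def rmse(a, b):
    """Root mean square error between two equally shaped matrices."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))

def self_consistency_rmse(impute, counts):
    """Assumed check: impute once, re-impute the result, compare both outputs."""
    first = impute(counts)
    second = impute(first)
    return rmse(first, second)

def fill_zeros_with_gene_mean(counts):
    """Hypothetical toy imputer: replace zeros with the gene's mean non-zero value."""
    counts = np.asarray(counts, dtype=float)
    out = counts.copy()
    for g in range(counts.shape[0]):  # rows are genes, columns are cells
        row = counts[g]
        nonzero = row[row > 0]
        if nonzero.size:
            out[g, row == 0] = nonzero.mean()
    return out

def shrink_toward_gene_mean(counts):
    """Hypothetical toy imputer that keeps altering its own output."""
    counts = np.asarray(counts, dtype=float)
    return 0.5 * counts + 0.5 * counts.mean(axis=1, keepdims=True)

toy = np.array([[4.0, 0.0, 6.0],
                [0.0, 0.0, 3.0]])

print(self_consistency_rmse(fill_zeros_with_gene_mean, toy))  # second pass changes nothing
print(self_consistency_rmse(shrink_toward_gene_mean, toy))    # second pass still moves values
```

The first toy imputer leaves no zeros to fill on the second pass, so re-imputing is a no-op; the second keeps shrinking values toward the gene mean, which is the kind of drift a non-zero RMSE would flag.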

Summary

Introduction

Single-cell RNA-sequencing (scRNA-seq) is becoming indispensable in studying the landscapes of cell-specific transcriptomes [1]. It demonstrates robust efficacy in capturing transcriptome-wide cell-to-cell heterogeneity with high resolution [2,3,4,5]. However, scRNA-seq captures only a small fraction of the genes due to “dropout” events; that is, it produces a zero-inflated count matrix in which only about 10% of the entries are non-zero [17]. The correctness of all these analyses is contingent on the correctness of the expression profile. These dropout events therefore require intensive treatment when analyzing scRNA-seq data. Imputation tools have been proposed to estimate dropout events and de-noise data. The performance of these imputation tools is often evaluated, or fine-tuned, using various clustering criteria based on ground-truth cell subgroup labels.
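
To make the zero-inflation figure concrete, the following minimal sketch masks a synthetic expression matrix so that roughly 10% of entries stay non-zero. It assumes dropout hits every entry independently with the same probability, which is a simplification made only for illustration (real dropout typically depends on expression level), and the helper names `apply_dropout` and `nonzero_fraction` are hypothetical.

```python
import numpy as np

def apply_dropout(expr, rate, seed=0):
    """Zero out each entry independently with probability `rate` (simplifying assumption)."""
    rng = np.random.default_rng(seed)
    keep = rng.random(expr.shape) >= rate
    return expr * keep

def nonzero_fraction(counts):
    """Share of matrix entries that are non-zero."""
    return float(np.count_nonzero(counts)) / counts.size

# Synthetic "true" expression: 200 genes x 50 cells, strictly positive,
# so every zero in the observed matrix comes from the dropout mask.
rng = np.random.default_rng(1)
truth = rng.poisson(lam=5.0, size=(200, 50)).astype(float) + 1.0

observed = apply_dropout(truth, rate=0.9)
print(f"non-zero fraction: {nonzero_fraction(observed):.3f}")
```

An imputation method then has to recover `truth` from `observed` without knowing which zeros the mask introduced.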

Methods

Results

Discussion

Conclusion