Abstract

Emerging single-cell RNA sequencing (scRNA-seq) technologies enable the investigation of transcriptomic landscapes at single-cell resolution. Analysis of scRNA-seq data is complicated by excess zero counts, the so-called dropouts, which arise from the low amounts of mRNA sequenced within individual cells. We introduce scImpute, a statistical method to accurately and robustly impute the dropouts in scRNA-seq data. scImpute automatically identifies likely dropouts and performs imputation only on these values, without introducing new biases into the rest of the data. scImpute also detects outlier cells and excludes them from imputation. Evaluation on both simulated and real human and mouse scRNA-seq data suggests that scImpute is an effective tool for recovering transcriptome dynamics masked by dropouts. scImpute is shown to identify likely dropouts, enhance the clustering of cell subpopulations, improve the accuracy of differential expression analysis, and aid the study of gene expression dynamics.

Highlights

  • Emerging single-cell RNA sequencing technologies enable the investigation of transcriptomic landscapes at single-cell resolution

  • We propose scImpute, a statistical method that addresses the dropout events prevalent in scRNA-seq data. scImpute focuses on imputing the missing expression values of dropout genes, while retaining the expression levels of genes that are largely unaffected by dropout events

  • Although computational methods that directly model zero-inflation in the data are available [7, 32], scImpute takes an imputation perspective to improve data quality, so its applicability is not restricted to a specific task. scImpute takes the raw read count matrix as input and outputs an imputed count matrix of the same dimensions, so that it can be combined with other computational tools without data reformatting or transformation
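
The matrix-in, matrix-out interface described in the last highlight can be illustrated with a small sketch. This is not scImpute's statistical model, which is not specified in this summary. It is a deliberately simplified stand-in, assuming that a zero counts as a likely dropout only when the gene is detected in at least a chosen fraction of cells, and that such zeros are filled with the mean of the gene's nonzero counts. The function name `impute_likely_dropouts` and the parameter `min_detect_frac` are hypothetical.

```python
from statistics import mean


def impute_likely_dropouts(counts, min_detect_frac=0.5):
    """Toy imputation on a genes x cells count matrix (list of lists).

    NOT scImpute's actual method. Assumption for illustration only: a zero
    is a likely dropout when the gene is detected in at least
    `min_detect_frac` of cells; it is then filled with the mean of that
    gene's nonzero counts. All other values are returned unchanged.
    """
    imputed = []
    for gene_counts in counts:
        total = len(gene_counts)
        nonzero = [c for c in gene_counts if c > 0]
        detect_frac = len(nonzero) / total if total else 0.0
        has_zeros = len(nonzero) < total
        if nonzero and has_zeros and detect_frac >= min_detect_frac:
            fill = mean(nonzero)
            imputed.append([c if c > 0 else fill for c in gene_counts])
        else:
            # Rarely detected or fully detected gene: keep the row as it is.
            imputed.append(list(gene_counts))
    return imputed


raw = [
    [12, 0, 9, 15],  # widely detected gene with one zero: treated as dropout
    [0, 0, 0, 3],    # rarely detected gene: zeros are kept
    [5, 7, 6, 4],    # no zeros: unchanged
]
imputed = impute_likely_dropouts(raw)
```

The output has the same shape as the input, and only the values flagged as likely dropouts differ, which is the property that lets an imputed matrix be passed to downstream tools without reformatting.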

Summary

Introduction

Emerging single-cell RNA sequencing (scRNA-seq) technologies enable the investigation of transcriptomic landscapes at single-cell resolution. One important characteristic of scRNA-seq data is the “dropout” phenomenon, in which a gene is observed at a moderate expression level in one cell but is undetected in another cell [7]. These events occur because of the low amounts of mRNA in individual cells, so a truly expressed transcript may not be detected during sequencing in some cells. This characteristic of scRNA-seq has been shown to be protocol-dependent. Statistical and computational methods developed for scRNA-seq need to take the dropout issue into consideration; otherwise, their efficacy may vary when they are applied to data generated by different protocols.

