Abstract

Recent technological breakthroughs have made it possible to measure RNA expression at the single-cell level, thus paving the way for exploring expression heterogeneity among individual cells. Current single-cell RNA sequencing (scRNA-seq) protocols are complex and introduce technical biases that vary across cells, which can bias downstream analysis without proper adjustment. To account for cell-to-cell technical differences, we propose a statistical framework, TASC (Toolkit for Analysis of Single Cell RNA-seq), an empirical Bayes approach to reliably model the cell-specific dropout rates and amplification bias by use of external RNA spike-ins. TASC incorporates the technical parameters, which reflect cell-to-cell batch effects, into a hierarchical mixture model to estimate the biological variance of a gene and detect differentially expressed genes. More importantly, TASC is able to adjust for covariates to further eliminate confounding that may originate from cell size and cell cycle differences. In simulation and real scRNA-seq data, TASC achieves accurate Type I error control and displays competitive sensitivity and improved robustness to batch effects in differential expression analysis, compared to existing methods. TASC is programmed to be computationally efficient, taking advantage of multi-threaded parallelization. We believe that TASC will provide a robust platform for researchers to leverage the power of scRNA-seq.

Highlights

  • We evaluate the performance of TASC on simulated data and two real scRNA-seq data sets and compare it with existing methods, including SCDE [10], MAST [11], DESeq2 [24], Census [20], and SCRAN [25]

Summary

Introduction

Recent technological breakthroughs have made it possible to measure RNA expression at the single-cell level, paving the way for exploring gene expression heterogeneity among individual cells [1,2,3,4]. With scRNA-seq data, one can better characterize the phenotypic state of a cell and more accurately describe its lineage and type. Current scRNA-seq protocols are complex and often introduce technical biases that vary across cells [6] (http://biorxiv.org/content/early/2015/08/25/025528); if not properly adjusted for, these biases can lead to severe type I error inflation in differential expression analysis. Compared to bulk RNA sequencing, the reverse transcription and preamplification steps in scRNA-seq lead to dropout events and amplification bias. A dropout event is the scenario in which a transcript expressed in the cell is lost during library preparation and is therefore undetectable at any sequencing depth. Because dropout events are highly prevalent in scRNA-seq, it is crucial to account for them in data analysis, especially when drawing conclusions about low to moderately expressed genes [7]. Existing studies handle dropout events in different ways: some ignore dropouts by focusing only on highly expressed genes [8,9], some model dropouts in a cell-specific manner [10,11,12,13], and others use a global zero-inflation parameter to account for dropouts [7].
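The toy simulation below illustrates why dropout rates that differ between cells can masquerade as differential expression. It is only a sketch: the logistic detection curve, the parameter values, and all function names are illustrative assumptions and are not taken from TASC or any of the cited methods.

```python
import math
import random
import statistics


def detection_probability(true_count, midpoint, slope=1.5):
    """Hypothetical logistic capture curve (an assumption, not TASC's model).

    Returns the chance that a transcript with the given true count is
    detected; a larger midpoint means a less efficient library, so more
    dropouts at the same expression level.
    """
    return 1.0 / (1.0 + math.exp(-slope * (math.log1p(true_count) - midpoint)))


def simulate_cell(true_count, midpoint, rng):
    """Observed count for one gene in one cell; 0 represents a dropout."""
    if rng.random() < detection_probability(true_count, midpoint):
        return true_count
    return 0


def simulate_group(n_cells, true_count, midpoint, rng):
    """Observed counts for a group of cells sharing one capture efficiency."""
    return [simulate_cell(true_count, midpoint, rng) for _ in range(n_cells)]


rng = random.Random(0)
TRUE_COUNT = 5  # identical true expression in both groups

# The two groups differ only in technical capture efficiency.
group_a = simulate_group(2000, TRUE_COUNT, midpoint=1.0, rng=rng)
group_b = simulate_group(2000, TRUE_COUNT, midpoint=2.5, rng=rng)

mean_a = statistics.mean(group_a)
mean_b = statistics.mean(group_b)
print(f"observed mean, group A: {mean_a:.2f}")
print(f"observed mean, group B: {mean_b:.2f}")
```

Both groups have the same true expression, yet the observed means differ because group B loses more transcripts to dropout. A naive comparison of the observed counts would attribute this technical difference to biology, which is the kind of false positive that cell-specific modeling of dropout is meant to prevent.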
