Abstract
Background: Single-cell RNA sequencing (scRNA-seq) is a powerful profiling technique at the single-cell resolution. Appropriate analysis of scRNA-seq data can characterize molecular heterogeneity and shed light on the underlying cellular processes to better understand development and disease mechanisms. The unique analytic challenge is to appropriately model highly over-dispersed scRNA-seq count data with prevalent dropouts (zero counts), which has made zero-inflated dimensionality reduction techniques popular for scRNA-seq data analyses. Employing zero-inflated distributions, however, may place extra emphasis on zero counts, leading to potential bias when identifying the latent structure of the data.
Results: In this paper, we propose a fully generative hierarchical gamma-negative binomial (hGNB) model of scRNA-seq data, obviating the need for explicitly modeling zero inflation. At the same time, hGNB can naturally account for covariate effects at both the gene and cell levels to identify complex latent representations of scRNA-seq data, without the need for commonly adopted pre-processing steps such as normalization. Efficient Bayesian model inference is derived by exploiting conditional conjugacy via novel data augmentation techniques.
Conclusion: Experimental results on both simulated data and several real-world scRNA-seq datasets suggest that hGNB is a powerful tool for cell cluster discovery as well as cell lineage inference.
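The idea that a negative binomial model can absorb many zero counts without a separate zero-inflation component can be illustrated with a small simulation. The sketch below is not the hGNB model itself: it only draws counts from a plain gamma-Poisson mixture (which is marginally negative binomial) and compares the observed fraction of zeros with the analytic value (1 - p)^r and with what a Poisson model of the same mean would predict. The function name and the values r = 0.5 and p = 0.8 are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def simulate_gamma_poisson(r, p, n=200_000, seed=0):
    """Draw n counts from a gamma-Poisson mixture (illustrative helper).

    rate  ~ Gamma(shape=r, scale=p / (1 - p))
    count ~ Poisson(rate)
    Marginally, count follows a negative binomial with dispersion r and
    probability parameter p, so P(count = 0) = (1 - p) ** r.
    """
    rng = np.random.default_rng(seed)
    rates = rng.gamma(shape=r, scale=p / (1.0 - p), size=n)
    return rng.poisson(rates)


# Assumed, illustrative parameters: a small r gives strong over-dispersion.
r, p = 0.5, 0.8
counts = simulate_gamma_poisson(r, p)

empirical_zero_fraction = float(np.mean(counts == 0))
analytic_zero_fraction = (1.0 - p) ** r
# Zero probability of a Poisson distribution with the same sample mean.
poisson_zero_fraction = float(np.exp(-counts.mean()))

print("sample mean:", counts.mean())
print("sample variance:", counts.var())
print("empirical zero fraction:", empirical_zero_fraction)
print("negative binomial zero fraction:", analytic_zero_fraction)
print("Poisson zero fraction at the same mean:", poisson_zero_fraction)
```

With a small dispersion parameter, the simulated counts have a variance well above their mean and far more zeros than a Poisson model with the same mean would produce, even though no zero-inflation term is present.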
Highlights
Single-cell RNA sequencing is a powerful profiling technique at the single-cell resolution
We evaluate our hierarchical gamma-negative binomial (hGNB) model on four different sets of real-world scRNA-seq data from different platforms, and compare its performance with that of principal component analysis (PCA), zero-inflated factor analysis (ZIFA) [7], and ZINB-WaVE [9]
A subset of three Cre lines, including Ntsr1-Cre, Rbp4-Cre, and Scnn1a-Tg3-Cre, that respectively label layer 4, layer 5,
We have examined the goodness-of-fit of the hGNB model on the V1, S1/CA1, and mouse embryonic stem cells (mESCs) datasets, using mean-difference (MD) plots
Summary
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for unbiased identification of previously uncharacterized molecular heterogeneity at the cellular level [1]. This is in contrast to standard bulk RNA-seq techniques [2], which measure average gene expression levels within a cell population and ignore tissue heterogeneity. A large body of statistical tools developed for scRNA-seq data analysis includes a dimensionality reduction step. This leads to more tractable data, from both statistical and computational points of view. In addition to high gene expression variability due to cell heterogeneity, the excessive amount of zeros in scRNA-seq hinders