Abstract

Single-cell RNA-seq data allow insight into normal cellular function and various disease states through molecular characterization of gene expression at the single-cell level. Dimensionality reduction of such high-dimensional data sets is essential for visualization and analysis, but single-cell RNA-seq data are challenging for classical dimensionality-reduction methods because of the prevalence of dropout events, which lead to zero-inflated data. Here, we develop a dimensionality-reduction method, (Z)ero (I)nflated (F)actor (A)nalysis (ZIFA), which explicitly models the dropout characteristics, and show that it improves modeling accuracy on simulated and biological data sets.

Electronic supplementary material: The online version of this article (doi:10.1186/s13059-015-0805-z) contains supplementary material, which is available to authorized users.

Highlights

  • Single-cell RNA expression analysis is revolutionizing whole-organism science [1, 2], allowing the unbiased identification of previously uncharacterized molecular heterogeneity at the cellular level.

  • We focus on the impact of dropout events on the output of dimensionality-reduction algorithms and propose a novel extension of the framework of probabilistic principal components analysis (PPCA) [9] or factor analysis (FA) to account for these events.

  • Simulation study: we tested the relative performance of zero-inflated factor analysis (ZIFA) against PCA, PPCA [9], FA and, for reference, non-linear techniques including stochastic neighbor embedding (t-SNE) [11], Isomap [12] and multidimensional scaling [13].



Introduction

Single-cell RNA expression analysis (scRNA-seq) is revolutionizing whole-organism science [1, 2], allowing the unbiased identification of previously uncharacterized molecular heterogeneity at the cellular level. Statistical analysis of single-cell gene expression profiles can highlight putative cellular subtypes, delineating subgroups of T cells [3], lung cells [4] and myoblasts [5]. These subgroups can be clinically relevant: for example, individual brain tumors contain cells from multiple types of brain cancers, and greater tumor heterogeneity is associated with worse prognosis [6]. Single-cell gene expression data contain an abundance of dropout events that lead to zero expression measurements. It has not been possible to ascertain fully the ramifications of applying dimensionality-reduction techniques, such as principal components analysis (PCA), to zero-inflated data.
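To make the notion of zero-inflated data concrete, the following is a minimal sketch of how dropout events could be simulated on top of a standard factor analysis model. It is an illustration under stated assumptions rather than the method's implementation: the function name `simulate_zero_inflated`, the parameter values, and in particular the choice of a dropout probability that decays with the squared expression level are assumptions made for this example and are not specified in the text above.

```python
import numpy as np


def simulate_zero_inflated(n_cells=200, n_genes=50, n_factors=2,
                           decay=0.1, seed=0):
    """Simulate factor-analysis data, then zero out entries as dropouts.

    All names and default values are hypothetical, chosen for illustration.
    """
    rng = np.random.default_rng(seed)

    # Low-dimensional latent position of each cell.
    Z = rng.normal(size=(n_cells, n_factors))
    # Factor loadings mapping latent space to gene space.
    A = rng.normal(size=(n_factors, n_genes))
    # Per-gene mean expression level.
    mu = rng.uniform(2.0, 5.0, size=n_genes)

    # Standard factor analysis model: linear map plus Gaussian noise.
    X = Z @ A + mu + rng.normal(scale=0.5, size=(n_cells, n_genes))

    # ASSUMPTION: dropout is more likely for weakly expressed entries.
    # Here the probability decays with the squared expression level.
    p_drop = np.exp(-decay * X ** 2)
    dropped = rng.random(X.shape) < p_drop

    # Observed data: dropped entries are recorded as exact zeros.
    Y = np.where(dropped, 0.0, X)
    return X, Y, dropped


X, Y, dropped = simulate_zero_inflated()
print(f"fraction of entries lost to dropout: {dropped.mean():.2f}")
```

Applying a classical method such as PCA to `Y` rather than `X` treats the dropout zeros as genuine measurements, which is the situation the paragraph above describes.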

