Abstract

Single-cell RNA sequencing (scRNA-seq) is a powerful technique for analyzing transcriptomic heterogeneity at the single-cell level. An effective low-dimensional representation and visualization of the original scRNA-seq data is an important step in studying cell sub-populations and lineages. At the single-cell level, transcriptional fluctuations are much larger than in the average of a cell population, and the low amount of RNA transcripts increases the rate of technical dropout events. scRNA-seq data are therefore much noisier than traditional bulk RNA-seq data. In this study, we propose the deep variational autoencoder for scRNA-seq data (VASC), a deep multi-layer generative model for unsupervised dimension reduction and visualization of scRNA-seq data. VASC explicitly models dropout events and finds nonlinear hierarchical feature representations of the original data. Tested on over 20 datasets, VASC shows superior performance in most cases and broader dataset compatibility than four state-of-the-art dimension reduction and visualization methods. In addition, VASC provides better representations of very rare cell populations in the 2D visualization. As a case study, VASC successfully re-establishes the cell dynamics in pre-implantation embryos and identifies several candidate marker genes associated with early embryo development. Moreover, VASC also performs well on a 10× Genomics dataset with more cells and a higher dropout rate.

Highlights

  • Characterizing cellular states at the single-cell level is crucial for understanding cell-cell heterogeneity and the biological mechanisms that cannot be observed from the average behavior of a bulk of cells.

  • VASC, a generative model based on a deep variational autoencoder [9,10,11], is designed to find an effective low-dimensional representation and to facilitate the visualization of scRNA-seq datasets. It models the distribution of the high-dimensional original data, P(X), through a set of latent variables z.

  • The four compared methods and VASC can be broadly divided into two categories: PCA, zero-inflated factor analysis (ZIFA) and VASC aim to find the representation that best explains the variation in the original data, while t-SNE and SIMLR try to find an embedded space that preserves the neighborhood relationships of the samples in the original space.
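
The second highlight describes modeling P(X) through latent variables z. In the standard variational autoencoder formulation, this is usually done by maximizing a lower bound on the log-likelihood rather than P(X) itself. The bound below is the generic textbook form; the symbols q_φ (encoder), p_θ (decoder) and p(z) (prior) are notation assumed here for illustration, and the exact objective used by VASC may include additional terms that this summary does not specify.

```latex
\log P(X) \;\ge\;
\mathbb{E}_{q_\phi(z \mid X)}\!\left[\log p_\theta(X \mid z)\right]
\;-\;
\mathrm{KL}\!\left(q_\phi(z \mid X)\,\|\,p(z)\right)
```

The first term rewards latent codes z from which X can be reconstructed well, and the second keeps the encoder's distribution over z close to the prior.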


Summary

Introduction

Characterizing cellular states at the single-cell level is crucial for understanding cell-cell heterogeneity and the biological mechanisms that cannot be observed from the average behavior of a bulk of cells. Thousands of genes are expressed in a single cell at the same time. Their expression levels are usually tightly regulated with respect to a limited number of cellular states. scRNA-seq data contain many unexpected dropout events (many data points are zero or near-zero) [5]. This noise makes traditional methods work inefficiently. ZIFA can only model linear patterns with a single hidden layer, which limits its performance on datasets with complex cellular states in the original data space. Another strategy is to embed the cells into a low-dimensional space while preserving the cell-cell similarity (or distance) of the original data space. Methods of this kind, such as SIMLR [7], frequently change the basic topological information in the embedded space.
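
To make the notion of dropout events concrete, the following is a minimal simulation sketch, not the VASC implementation. It assumes that an entry is zeroed with probability exp(-λx²), a form used in zero-inflated models; the function name `simulate_dropout`, the parameter `lam`, and the gamma-distributed toy expression matrix are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_dropout(expr, lam=0.5, rng=None):
    """Zero out each entry with probability exp(-lam * expr**2).

    Assumption: this exponential dropout form is illustrative only;
    the summary above does not specify the function VASC uses.
    """
    rng = np.random.default_rng() if rng is None else rng
    p_drop = np.exp(-lam * expr ** 2)          # low expression -> high dropout
    keep = rng.random(expr.shape) >= p_drop    # True where the value survives
    return expr * keep, p_drop

rng = np.random.default_rng(0)

# Hypothetical log-scale expression matrix: 200 cells x 50 genes.
expr = rng.gamma(shape=2.0, scale=1.0, size=(200, 50))
observed, p_drop = simulate_dropout(expr, lam=0.5, rng=rng)

# Compare the observed zero rate for weakly and strongly expressed entries.
low, high = expr < 1.0, expr > 3.0
zero_fraction = float((observed == 0).mean())
drop_low = float((observed[low] == 0).mean())
drop_high = float((observed[high] == 0).mean())

print(f"overall zero fraction: {zero_fraction:.2f}")
print(f"zero rate, low expression:  {drop_low:.2f}")
print(f"zero rate, high expression: {drop_high:.2f}")
```

Under this assumed model, weakly expressed entries are zeroed far more often than strongly expressed ones, which is the kind of expression-dependent noise that a purely linear method has to absorb into its low-dimensional representation.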

