Abstract

Single-cell RNA-Seq (scRNA-seq) is invaluable for studying biological systems. Dimensionality reduction is a crucial step in interpreting the relations between cells in scRNA-seq data. However, current dimensionality reduction methods are often confounded by multiple simultaneous sources of technical and biological variability, result in “crowding” of cells in the center of the latent space, or inadequately capture temporal relationships. Here, we introduce scPhere, a scalable deep generative model that embeds cells into low-dimensional hyperspherical or hyperbolic spaces to accurately represent scRNA-seq data. scPhere addresses multi-level, complex batch factors, facilitates the interactive visualization of large datasets, resolves cell crowding, and uncovers temporal trajectories. We demonstrate scPhere on nine large datasets from complex tissues of human patients or from animal development. Our results show how scPhere facilitates the interpretation of scRNA-seq data: it generates batch-invariant embeddings to map data from new individuals, identifies cell types affected by biological variables, infers cells’ spatial positions in predefined biological specimens, and highlights complex cellular relations.
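
The geometric intuition behind hyperspherical and hyperbolic latent spaces can be illustrated with plain distance functions. The sketch below is an illustration under stated assumptions, not scPhere's implementation: the helper names are hypothetical, and it only contrasts the two geometries without modeling any distribution. On the unit hypersphere, the geodesic (great-circle) distance depends only on the cosine similarity of two vectors, so profiles that differ only in overall scale map to the same point. In the Poincaré ball model of hyperbolic space, the same Euclidean gap corresponds to a much larger distance near the boundary than near the origin, which leaves room to spread out many cells.

```python
import math


def normalize(v):
    """Project a nonzero vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot project the zero vector onto the sphere")
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine similarity: depends only on direction, not on magnitude."""
    ua, ub = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


def sphere_geodesic(a, b):
    """Great-circle distance between the projections of a and b on the unit sphere."""
    c = max(-1.0, min(1.0, cosine_similarity(a, b)))  # clamp floating-point rounding
    return math.acos(c)


def poincare_distance(u, v):
    """Geodesic distance in the Poincare ball model (points need norm < 1)."""
    nu = sum(x * x for x in u)
    nv = sum(x * x for x in v)
    if nu >= 1 or nv >= 1:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sum((x - y) ** 2 for x, y in zip(u, v))
    return math.acosh(1 + 2 * diff / ((1 - nu) * (1 - nv)))
```

For example, `sphere_geodesic([3, 0, 0, 1], [6, 0, 0, 2])` is zero up to rounding, while `poincare_distance([0.85, 0.0], [0.95, 0.0])` is several times larger than `poincare_distance([0.0, 0.0], [0.1, 0.0])`, even though both pairs are 0.1 apart in Euclidean terms.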

Highlights

  • Single-cell RNA-Seq is invaluable for studying biological systems

  • Standard variational autoencoders (VAEs) have several shortcomings when modeling and analyzing scRNA-seq data. They assume a multidimensional normal prior for the low-dimensional latent variables, which encourages the low-dimensional representations of all cells to group in the center of the latent space, even for data consisting of distinct cell types. This is especially true if the model is trained long enough that the posterior distributions gradually approach the prior distribution. (Cell crowding also afflicts general-purpose data visualization tools such as t-distributed stochastic neighbor embedding (t-SNE)[16] once large datasets consist of hundreds of thousands of cells[17,18].) A second challenge arises from using cosine distance to measure the distance between two cells[19,20,21] for very sparse droplet-based scRNA-seq data (>90% of genes with zero counts in a typical cell profile).

  • In practice, current applications of VAEs to scRNA-seq data can handle only a single batch vector, whereas biologically relevant datasets typically have multiple such factors, both technical and biological. Such complex multilevel factors are not handled well by current batch-correction methods in single-cell genomics, whether VAEs or other approaches[5,12,27,28,29,30,31], yet addressing them is critical for integration across studies, for interpreting the impact of various factors on cells in complex tissues, and for the ultimate assembly of large tissue atlases.
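
One common way to give a conditional VAE several batch factors at once, assumed here for illustration rather than taken from scPhere, is to encode each factor as its own one-hot block and concatenate the blocks into a single covariate vector that is fed to the model alongside the latent code. The factor names and levels below (`patient`, `channel`, `condition`) and all helper names are hypothetical.

```python
from typing import Dict, List, Sequence


def one_hot(value: str, levels: Sequence[str]) -> List[float]:
    """Encode one categorical value as a one-hot vector over the given levels."""
    if value not in levels:
        raise ValueError(f"unknown level {value!r}; expected one of {list(levels)}")
    return [1.0 if level == value else 0.0 for level in levels]


def batch_covariates(cell: Dict[str, str], factors: Dict[str, Sequence[str]]) -> List[float]:
    """Concatenate one one-hot block per batch factor, in the order the factors are listed."""
    vec: List[float] = []
    for name, levels in factors.items():
        vec.extend(one_hot(cell[name], levels))
    return vec


def decoder_input(z: Sequence[float], covariates: Sequence[float]) -> List[float]:
    """Hypothetical conditional-decoder input: latent code followed by the covariates."""
    return list(z) + list(covariates)


# Hypothetical factors: one technical (sequencing channel), two biological.
FACTORS = {
    "patient": ["P1", "P2", "P3"],
    "channel": ["c1", "c2"],
    "condition": ["healthy", "inflamed"],
}
```

For a cell `{"patient": "P2", "channel": "c1", "condition": "inflamed"}`, `batch_covariates` returns `[0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0]`. Because each factor keeps its own block, adding another technical or biological factor only appends one more block instead of forcing every combination of levels into a single batch vector.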

Summary

Introduction

Single-cell RNA-Seq (scRNA-seq) is invaluable for studying biological systems. Dimensionality reduction is a crucial step in interpreting the relations between cells in scRNA-seq data. Deep-learning models[6], especially (variational) autoencoders[7,8,9], have been used for dimensionality reduction prior to visualization or downstream analyses such as clustering[10,11,12,13,14,15]. This leverages their ability to model large-scale, high-dimensional data and their flexibility in incorporating different factors, especially batch effects, into the modeling framework. Standard variational autoencoders (VAEs), however, have several shortcomings when modeling and analyzing scRNA-seq data. They assume a multidimensional normal prior for the low-dimensional latent variables, which encourages the low-dimensional representations of all cells to group in the center of the latent space, even for data consisting of distinct cell types. Our model provides enhanced representation, complex batch correction, reference generation, and visualization, serving as an interpretation tool for single-cell genomics research.
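
The pull toward the center can be seen directly in the standard Gaussian VAE objective. For a diagonal Gaussian posterior N(μ, diag(σ²)) and a standard normal prior, the KL term is ½ Σ_j (μ_j² + σ_j² − 1 − log σ_j²), and its gradient with respect to μ_j is simply μ_j. The minimal sketch below uses this textbook formula (it is not code from scPhere, and the function names are hypothetical) and deliberately drops the reconstruction term to isolate the effect: descending on the KL term alone shrinks every posterior mean geometrically toward the origin.

```python
import math
from typing import List, Sequence


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), the regularizer in a standard VAE loss."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def kl_grad_mu(mu: Sequence[float]) -> List[float]:
    """Gradient of the KL term with respect to the posterior mean: d KL / d mu_j = mu_j."""
    return list(mu)


def shrink_toward_prior(mu: Sequence[float], lr: float = 0.1, steps: int = 50) -> List[float]:
    """Gradient descent on the KL term alone; each step multiplies mu by (1 - lr)."""
    current = list(mu)
    for _ in range(steps):
        grad = kl_grad_mu(current)
        current = [m - lr * g for m, g in zip(current, grad)]
    return current
```

Starting from μ = (3, −2) with a step size of 0.1, fifty steps multiply each coordinate by 0.9^50 ≈ 0.005, so the mean ends up very close to the origin. In a full VAE the reconstruction term counteracts this pull, but the prior keeps favoring small-norm codes, which is consistent with the crowding of distinct cell types near the center described above.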
