The performance of deep generative models for learning joint embeddings of single-cell multi-omics data.

Eva Brombacher, Clemens Kreutz, Martin Treppner, Harald Binder, Maren Hackenberg

doi:10.3389/fmolb.2022.962644

Open Access

https://doi.org/10.3389/fmolb.2022.962644

Journal: Frontiers in Molecular Biosciences
Publication Date: Oct 26, 2022
Citations: 11
License type: CC BY 4.0

Affiliation: University of Freiburg

Abstract

Recent extensions of single-cell studies to multiple data modalities raise new questions regarding experimental design. For example, the challenge of sparsity in single-omics data might be partly resolved by compensating for missing information across modalities. In particular, deep learning approaches, such as deep generative models (DGMs), can potentially uncover complex patterns via a joint embedding. Yet this also raises the question of sample size requirements for identifying such patterns from single-cell multi-omics data. Here, we empirically examine the quality of DGM-based integrations for varying sample sizes. We first review the existing literature and give a short overview of deep learning methods for multi-omics integration. Next, we consider eight popular tools in more detail and examine their robustness to different cell numbers, covering two of the most common multi-omics data types. Specifically, we use data featuring simultaneous gene expression measurements at the RNA level and protein abundance measurements for cell surface proteins (CITE-seq), as well as data where chromatin accessibility and RNA expression are measured in thousands of cells (10x Multiome). We examine the ability of the methods to learn joint embeddings based on biological and technical metrics. Finally, we provide recommendations for the design of multi-omics experiments and discuss potential future developments.

Full Text