A scalable approach to topic modelling in single-cell data by approximate pseudobulk projection.

Sishir Subedi, Tomokazu S Sumida, Yongjin P Park

doi:10.26508/lsa.202402713

Abstract

Probabilistic topic modelling has become essential in many types of single-cell data analysis. Based on the probabilistic topic assignments in each cell, we identify a latent representation of cellular states. A dictionary matrix, consisting of topic-specific gene frequency vectors, provides interpretable bases that can be compared with known cell type-specific marker genes and other pathway annotations. However, fitting a topic model on a large number of cells requires heavy computational resources: specialized computing units, computing time, and memory. Here, we present a scalable approximation method customized for single-cell RNA-seq data analysis, termed ASAP, short for Annotating a Single-cell data matrix by Approximate Pseudobulk estimation. Our approach is more accurate than existing methods yet requires orders of magnitude less computing time and much less memory. We also show that our approach is widely applicable to atlas-scale data analysis; it seamlessly integrates single-cell and bulk data in a joint analysis, without requiring additional preprocessing or feature selection steps.
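
The abstract does not spell out the algorithm, so the following is only a toy sketch of the general idea it names: collapse many cells into a few pseudobulk profiles, fit a small non-negative factorization on those profiles, and read each normalized factor as a topic-specific gene frequency vector. It is not the ASAP implementation. The function names `pseudobulk` and `nmf`, the grouping of cells by the sign pattern of random projections, the squared-error multiplicative updates, and the toy count matrix are all assumptions made for illustration.

```python
import math
import random


def pseudobulk(counts, n_proj=3, seed=0):
    """Hash cells by the sign pattern of random projections; sum counts per bucket."""
    rng = random.Random(seed)
    n_cells, n_genes = len(counts), len(counts[0])
    logged = [[math.log1p(x) for x in cell] for cell in counts]
    # Centre each gene so the projection signs split the cells.
    means = [sum(cell[g] for cell in logged) / n_cells for g in range(n_genes)]
    proj = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_proj)]
    buckets = {}
    for i, cell in enumerate(logged):
        code = tuple(
            sum(w * (x - m) for w, x, m in zip(row, cell, means)) > 0 for row in proj
        )
        buckets.setdefault(code, []).append(i)
    groups = list(buckets.values())
    bulk = [[sum(counts[i][g] for i in members) for g in range(n_genes)]
            for members in groups]
    return bulk, groups


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(x, k, n_iter=100, seed=0, eps=1e-9):
    """Factorize x (samples x genes) as theta (samples x k) @ beta (k x genes)."""
    rng = random.Random(seed)
    theta = [[rng.random() + eps for _ in range(k)] for _ in x]
    beta = [[rng.random() + eps for _ in x[0]] for _ in range(k)]
    for _ in range(n_iter):
        # Multiplicative updates for squared error (Lee and Seung style).
        num = matmul(x, transpose(beta))
        den = matmul(theta, matmul(beta, transpose(beta)))
        theta = [[t * a / (b + eps) for t, a, b in zip(tr, nr, dr)]
                 for tr, nr, dr in zip(theta, num, den)]
        num = matmul(transpose(theta), x)
        den = matmul(matmul(transpose(theta), theta), beta)
        beta = [[v * a / (b + eps) for v, a, b in zip(br, nr, dr)]
                for br, nr, dr in zip(beta, num, den)]
    return theta, beta


# Toy data (hypothetical): 8 cells x 4 genes, two made-up cell types.
counts = [
    [9, 8, 1, 0], [7, 9, 0, 1], [8, 7, 1, 1], [10, 6, 0, 0],
    [0, 1, 9, 8], [1, 0, 7, 9], [1, 1, 8, 7], [0, 0, 6, 10],
]
bulk, groups = pseudobulk(counts)
theta, beta = nmf(bulk, k=2)
# Normalize each topic to a gene frequency vector (each row sums to 1).
dictionary = [[v / sum(row) for v in row] for row in beta]
for topic in dictionary:
    print([round(v, 3) for v in topic])
```

The factorization runs on the handful of pseudobulk rows rather than on every cell, which is where the savings in time and memory would come from in a design of this kind; per-cell topic assignments would then need a separate projection step that this sketch omits.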

Journal: Life Science Alliance
Publication Date: Aug 6, 2024
License type: CC BY 4.0

