Assessing transcriptomic heterogeneity of single-cell RNASeq data by bulk-level gene expression data

Khong-Loon Tiong,Dmytro Luzhbin,Chen-Hsiang Yeang

doi:10.1186/s12859-024-05825-3

Abstract

Background

Single-cell RNA sequencing (sc-RNASeq) data illuminate transcriptomic heterogeneity but also possess a high level of noise, abundant missing entries and sometimes inadequate or no cell type annotations at all. Bulk-level gene expression data lack direct information of cell population composition but are more robust and complete and often better annotated. We propose a modeling framework to integrate bulk-level and single-cell RNASeq data to address the deficiencies and leverage the mutual strengths of each type of data and enable a more comprehensive inference of their transcriptomic heterogeneity. Contrary to the standard approaches of factorizing the bulk-level data with one algorithm and (for some methods) treating single-cell RNASeq data as references to decompose bulk-level data, we employed multiple deconvolution algorithms to factorize the bulk-level data, constructed the probabilistic graphical models of cell-level gene expressions from the decomposition outcomes, and compared the log-likelihood scores of these models in single-cell data. We term this framework backward deconvolution as inference operates from coarse-grained bulk-level data to fine-grained single-cell data. As the abundant missing entries in sc-RNASeq data have a significant effect on log-likelihood scores, we also developed a criterion for inclusion or exclusion of zero entries in log-likelihood score computation.

Results

We selected nine deconvolution algorithms and validated backward deconvolution in five datasets. In the in-silico mixtures of mouse sc-RNASeq data, the log-likelihood scores of the deconvolution algorithms were strongly anticorrelated with their errors of mixture coefficients and cell type specific gene expression signatures. In the true bulk-level mouse data, the sample mixture coefficients were unknown but the log-likelihood scores were strongly correlated with accuracy rates of inferred cell types. In the data of autism spectrum disorder (ASD) and normal controls, we found that ASD brains possessed higher fractions of astrocytes and lower fractions of NRGN-expressing neurons than normal controls. In datasets of breast cancer and low-grade gliomas (LGG), we compared the log-likelihood scores of three simple hypotheses about the gene expression patterns of the cell types underlying the tumor subtypes. The model that tumors of each subtype were dominated by one cell type persistently outperformed an alternative model that each cell type had elevated expression in one gene group and tumors were mixtures of those cell types. Superiority of the former model is also supported by comparing the real breast cancer sc-RNASeq clusters with those generated by simulated sc-RNASeq data.

Conclusions

The results indicate that backward deconvolution serves as a sensible model selection tool for deconvolution algorithms and facilitates discerning hypotheses about cell type compositions underlying heterogeneous specimens such as tumors.
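
The model-comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the probabilistic graphical model or the zero-entry criterion, so the code below assumes a uniform mixture of isotropic Gaussians centred on each candidate cell type signature, and it treats zero handling as a simple on/off switch. The function name `mixture_log_likelihood`, the parameters `sigma` and `skip_zeros`, and all simulated data are hypothetical.

```python
import numpy as np


def mixture_log_likelihood(cells, signatures, sigma=1.0, skip_zeros=False):
    """Mean per-cell log-likelihood under a uniform Gaussian mixture.

    ASSUMPTION: this Gaussian mixture is a stand-in for the paper's
    graphical model, which the abstract does not describe.

    cells:      (n_cells, n_genes) single-cell expression matrix
    signatures: (n_types, n_genes) signatures from one deconvolution run
    skip_zeros: if True, zero entries are treated as missing and ignored
    """
    cells = np.asarray(cells, dtype=float)
    signatures = np.asarray(signatures, dtype=float)
    # Entries that contribute to the score
    observed = (cells != 0) if skip_zeros else np.ones_like(cells, dtype=bool)
    # Residual of every cell against every signature: (n_cells, n_types, n_genes)
    resid = cells[:, None, :] - signatures[None, :, :]
    log_pdf = -0.5 * (resid / sigma) ** 2 - 0.5 * np.log(2 * np.pi * sigma ** 2)
    # Sum the per-gene terms over observed entries only
    per_type = np.where(observed[:, None, :], log_pdf, 0.0).sum(axis=2)
    # Log-sum-exp over cell types with a uniform prior on the type
    peak = per_type.max(axis=1, keepdims=True)
    per_cell = peak[:, 0] + np.log(np.exp(per_type - peak).sum(axis=1))
    per_cell -= np.log(signatures.shape[0])
    return per_cell.mean()


# Simulated data (hypothetical): 3 cell types, 50 genes, 200 cells, 30% dropout
rng = np.random.default_rng(0)
true_sig = rng.gamma(shape=2.0, scale=2.0, size=(3, 50))
labels = rng.integers(0, 3, size=200)
cells = true_sig[labels] + rng.normal(0.0, 0.5, size=(200, 50))
cells[rng.random(cells.shape) < 0.3] = 0.0

# Two candidate signature sets, standing in for two deconvolution outputs
good = true_sig + rng.normal(0.0, 0.1, size=true_sig.shape)
bad = rng.gamma(shape=2.0, scale=2.0, size=true_sig.shape)

score_good = mixture_log_likelihood(cells, good, skip_zeros=True)
score_bad = mixture_log_likelihood(cells, bad, skip_zeros=True)
print("good candidate:", score_good)
print("bad candidate:", score_bad)
```

In this toy setting the candidate whose signatures are closer to the simulated cell types receives the higher score, which mirrors the role the abstract assigns to log-likelihood as a model selection criterion. Toggling `skip_zeros` shows how strongly zero entries can shift the score, which is why the abstract mentions a dedicated inclusion criterion.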

Journal: BMC Bioinformatics
Publication Date: Jun 12, 2024
Citations: 1
License type: CC BY 4.0

Similar Papers

Single-cell co-expression analysis reveals that transcriptional modules are shared across cell types in the brain.
Benjamin D Harris ... Jesse Gillis
Cell Systems | VOL. 12
10 May 2021

Characterization of gene cluster heterogeneity in single-cell transcriptomic data within and across cancer types.
Khong-Loon Tiong ... Yu-Wei Lin
Biology Open | VOL. 11
15 Jun 2022

A UNIFIED STATISTICAL FRAMEWORK FOR SINGLE CELL AND BULK RNA SEQUENCING DATA.
Lingxue Zhu ... Bernie Devlin
The Annals of Applied Statistics | VOL. 12
30 Mar 2017

Reconstruction of Cell-type-Specific Interactomes at Single-Cell Resolution.
Shahin Mohammadi ... Jose Davila-Velderrain
Cell Systems | VOL. 9
27 Nov 2019
