DEAPLOG: Differential Expression Analysis and Pseudo-Temporal Locating and Ordering of Genes in Single-Cell Transcriptomic Data.

Abstract

Differential expression analysis constitutes a crucial step in the analysis of single-cell transcriptomic data. Numerous statistical methods have been developed to conduct differential expression analysis by addressing the sparsity or heterogeneity of gene expression. Nevertheless, these approaches often overlook other critical characteristics of single-cell transcriptomic data, such as the high dimensionality of gene expression at the cellular level, which may consequently lead to suboptimal performance. Furthermore, to date, there remains a significant gap in methodologies capable of locating and ordering genes along cell trajectories. Here, we integrate polynomial fitting with hypergeometric testing to develop a new tool, DEAPLOG (Differential Expression Analysis and Pseudo-temporal Locating and Ordering of Genes), leveraging the high-dimensional gene expression characteristics at the cellular level in single-cell transcriptomic data. Benchmarking analyses on synthetic single-cell datasets demonstrate that while DEAPLOG exhibits performance comparable to existing methods on datasets comprising only two cell clusters, it demonstrates superior performance in differential expression analysis when applied to datasets with multiple cell clusters. Furthermore, applications of DEAPLOG to real single-cell and spatial transcriptomic datasets validate its superior performance in terms of both accuracy and computational efficiency. Notably, when applied to single-cell transcriptomic data from the developmental hematopoietic system, DEAPLOG demonstrates precise gene localization and accurate ordering along developmental trajectories. Collectively, these findings establish DEAPLOG as a robust and highly effective tool for single-cell transcriptomic data analysis.
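The abstract names hypergeometric testing as one ingredient of DEAPLOG but does not say how it is applied. As a rough illustration only, the sketch below shows a generic hypergeometric enrichment test asking whether a gene is detected in a cell cluster more often than chance would predict. The function name and all counts are hypothetical, and this should not be read as DEAPLOG's actual procedure.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """Upper-tail probability P(X >= k) for a hypergeometric variable.

    population: total number of cells
    successes:  cells in which the gene is detected
    draws:      cells in the cluster of interest
    k:          cells in the cluster in which the gene is detected
    """
    lower = max(k, 0)
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(lower, upper + 1)
    )
    return favourable / total


# Hypothetical toy counts (not taken from the paper): 1000 cells in total,
# the gene is detected in 100 of them, and 30 of the 50 cells in one
# cluster express it, versus about 5 expected by chance.
p_value = hypergeom_sf(30, 1000, 100, 50)
print(f"enrichment p-value: {p_value:.3e}")
```

A small p-value here only says that detection of the gene is concentrated in the cluster; in practice such p-values would still need multiple-testing correction across genes and clusters.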

Similar Papers
  • Research Article
  • 10.1158/1557-3265.sabcs24-p2-06-17
Abstract P2-06-17: An Integrated Analysis of Metastatic Gene Co-expression in Triple-Negative Breast Cancer via Bulk and Single-Cell Transcriptome Data
  • Jun 13, 2025
  • Clinical Cancer Research
  • Melody Hong + 1 more

Background: Triple-negative breast cancer (TNBC) is clinically significant for its high invasiveness (metastasis) and poor prognosis. The molecular classification of breast cancers and characterization of TNBC have been aided by the analysis of both bulk and single-cell RNA-sequencing data. Specifically, they have facilitated the study of the roles of various regulator and effector genes associated with pro-metastatic behaviors such as invasion, and drug resistance-related epithelial-to-mesenchymal transition (EMTI). Yet, it remains unknown how informative and consistent bulk and scRNA-seq TNBC data are with regard to co-expressions of EMTI genes. Methods: Bulk TNBC RNA-seq data was retrieved from the TCGA-BRCA cohort of the TCGA-TARGET-GTEx Recompute Project, cleaned, and normalized. Single-nucleus TNBC tumor cell data curated by Pal et al. (2021) was separated into six datasets based on patient(s) of origin/cell cycling status and preprocessed. A list of 363 EMTI genes from the EMTome was defined for all gene-gene correlation analyses. Significantly correlated gene pairs with co-expressions concordant among TNBC dataset pairs and larger groupings were found. To identify possibly TNBC-specific correlations, co-expression analysis was repeated for both bulk normal breast from the GTEx cohort of the Recompute project and snRNA-seq normal breast epithelial data from Pal et al. (2021). EMTI regulatory networks were reconstructed using co-expression data between 26 key EMT genes for both TNBC and normal. Co-expression comparisons and differential co-expression analysis were performed between the TNBC and normal samples. Differential expression analysis was also performed between the bulk TNBC and normal to investigate the relationship between differential EMTI gene expression and co-expression. Results: EMTI gene co-expression patterns exhibited greater similarity between TNBC single-cell datasets than between any single-cell and bulk TNBC.
Moreover, the datasets most concordant with each other were single-cell datasets originating from the same patient(s), indicating that EMTI gene co-expressions do not significantly differ with cell cycling status. The SPINT1 and LSR genes were significantly positively correlated in all TNBC samples but not in normal, revealing a previously undocumented link. Additionally, we found 11 TNBC single-cell-specific gene-gene correlations (7 positive, 4 negative), relationships that would not have been identified in heterogeneous bulk TNBC. We observed known interactions between EMTI genes, including an EpCAM-keratin-claudin link well preserved across both TNBC and normal breast. Out of the three breast epithelial lineages (luminal progenitor (LP), mature luminal (ML), basal), EMTI gene co-expressions in TNBC single-cell were most similar to those in LP; however, those in the bulk TNBC were most similar to basal. With the exception of 2 gene pairs common to the bulk datasets, there were no EMTI gene pairs with consistent co-expression patterns across the different TNBC/normal groupings. Interestingly, differential EMTI gene co-expression was weakly associated with differential gene expression; this applied to genes that were non-differentially expressed (KRT5), upregulated in TNBC (EPCAM), or downregulated (TGFB1I1) (Spearman’s ρ: 0.139, 0.145, 0.0839; all p < 10e-07). Conclusions: Our integrated analysis reveals that EMTI gene pairs are moderately concordant among TNBC or normal datasets, but not between the two. Moreover, we show both known and new interactions in reconstructed networks. We identify a TNBC single-cell-specific 11-gene-pair signature that may be promising for further study. Finally, although contrary to previous work we find that differential EMTI gene co-expression can be partially explained by differential gene expression, the association is weak enough that the two remain complementary approaches. Citation Format: Melody Hong, Gábor Balázsi.
An Integrated Analysis of Metastatic Gene Co-expression in Triple-Negative Breast Cancer via Bulk and Single-Cell Transcriptome Data [abstract]. In: Proceedings of the San Antonio Breast Cancer Symposium 2024; 2024 Dec 10-13; San Antonio, TX. Philadelphia (PA): AACR; Clin Cancer Res 2025;31(12 Suppl):Abstract nr P2-06-17.
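The abstract above reports gene-gene co-expression as Spearman's ρ. As a minimal sketch of what that statistic measures, the code below computes a rank correlation between two expression vectors using average ranks for ties. The expression values and the names `gene_x` and `gene_y` are hypothetical, and the authors' actual pipeline is not described in the abstract.

```python
from math import sqrt


def rank(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


# Hypothetical expression of two genes across five cells.
gene_x = [0.0, 1.2, 3.4, 2.2, 5.1]
gene_y = [0.1, 0.9, 2.8, 3.0, 4.7]
rho = spearman(gene_x, gene_y)
print(f"Spearman rho: {rho:.3f}")
```

Because only ranks enter the calculation, any monotone relationship between the two genes gives ρ = 1, which is one reason rank correlation is a common choice for skewed expression data.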

  • Research Article
  • Cited by 75
  • 10.1016/j.celrep.2022.111697
Systematic single-cell pathway analysis to characterize early T cell activation.
  • Nov 1, 2022
  • Cell reports
  • Jack A Bibby + 12 more

  • Research Article
  • 10.1016/j.freeradbiomed.2025.08.061
Single-cell, spatial and bulk transcriptome data analysis revealed LINC00467-mediated Sertoli cell ferroptosis is a potential therapeutic target and biomarker for azoospermia.
  • Dec 1, 2025
  • Free radical biology & medicine
  • Hao Bo + 8 more

  • Research Article
  • 10.1212/wnl.0000000000203799
Biological roles of myeloid cell subsets during CNS inflammation (P2-3.005)
  • Apr 25, 2023
  • Neurology
  • Navid Manouchehri + 3 more

Objective: We designed an experimental platform utilizing single-cell transcriptomic and epigenomic profiling. We employed this system to interrogate single-cell data for subset-specific signatures. Background: Myeloid cells (MC) of the central nervous system (CNS) constitute a heterogeneous population that includes A) yolk sac-derived CNS microglia, B) bone marrow-derived resident MC and C) infiltrating bone marrow-derived MC. While bone marrow and yolk sac progenies differ in origin, their protein expression exhibits significant overlap. Current canonical myeloid markers lack the specificity required for use as disease biomarkers or therapeutic targets. Design/Methods: We created chimeras in congenic animals by transplantation of bone marrow from CD45.1+ donors into sub-lethally irradiated CD45.2+ recipients. Following active EAE, CD11b+ MC were collected from CNS at the peak of the disease. Fluorescence-activated cell sorting (FACS) separated CD45.1+ MC from CD45.2+ MC. Each sample was analyzed simultaneously for single-cell gene expression (scRNA-seq) and epigenetic signatures (scATAC-seq) via the 10X Genomics Multiomic assay. We employed bioinformatics software (Seurat v4.0) for analysis and visualization of data profiling over 20,000 cells. Results: Bone marrow-derived (CD45.1+) or microglial and CNS-resident (CD45.2+) MC were both present in the CNS tissue during acute EAE. Analyses of single-cell transcriptomic and ATAC-seq data independently distinguished common clusters of MC derived entirely from the CD45.2+ cells. Unsupervised annotation recognized these cells as microglia. CD45.1+ MC and microglia exhibited extensive gene expression overlap; however, differential gene expression analysis and logistic regression modeling yielded a signature set of genes within the microglia that accurately distinguished microglia from bone-marrow-derived MC in independent datasets.
Conclusions: Single-cell data recapitulates the subtle differences in the otherwise comparable gene expression patterns among microglia and bone marrow-derived MC. These patterns shed light on the subset-specific biological roles and may present as superior substitutes for the suboptimal canonical myeloid markers. Disclosure: Navid Manouchehri has nothing to disclose. The institution of an immediate family member of Dr. Salinas has received research support from Veterans Health Administration. Rehana Hussain has nothing to disclose. Dr. Stuve has received personal compensation in the range of $0-$499 for serving on a Scientific Advisory or Data Safety Monitoring board for EMD Serono. Dr. Stuve has received personal compensation in the range of $0-$499 for serving on a Scientific Advisory or Data Safety Monitoring board for Novartis. Dr. Stuve has received personal compensation in the range of $0-$499 for serving on a Scientific Advisory or Data Safety Monitoring board for Roche Genentech. Dr. Stuve has received personal compensation in the range of $0-$499 for serving on a Scientific Advisory or Data Safety Monitoring board for TG Therapeutics. Dr. Stuve has received personal compensation in the range of $500-$4,999 for serving on a Scientific Advisory or Data Safety Monitoring board for TG Therapeutics. Dr. Stuve has received personal compensation in the range of $500-$4,999 for serving on a Scientific Advisory or Data Safety Monitoring board for EMD Serono. Dr. Stuve has received personal compensation in the range of $500-$4,999 for serving on a Scientific Advisory or Data Safety Monitoring board for VYNE. Dr. Stuve has received personal compensation in the range of $5,000-$9,999 for serving as an Editor, Associate Editor, or Editorial Advisory Board Member for Therapeutic Advances in Neurological Diseases. Dr. Stuve has received research support from US Department of Veterans Affairs. Dr. Stuve has received research support from National Multiple Sclerosis Society (US). Dr. Stuve has received research support from Exalys. Dr. Stuve has received research support from Merck KGaA.

  • Research Article
  • Cited by 22
  • 10.1210/endocr/bqab148
High Coexpression of the Ghrelin and LEAP2 Receptor GHSR With Pancreatic Polypeptide in Mouse and Human Islets.
  • Jul 21, 2021
  • Endocrinology
  • Deepali Gupta + 8 more

Islets represent an important site of direct action of the hormone ghrelin, with expression of the ghrelin receptor (growth hormone secretagogue receptor; GHSR) having been localized variably to alpha cells, beta cells, and/or somatostatin (SST)-secreting delta cells. To our knowledge, GHSR expression by pancreatic polypeptide (PP)-expressing gamma cells has not been specifically investigated. Here, histochemical analyses of Ghsr-IRES-Cre × Cre-dependent ROSA26-yellow fluorescent protein (YFP) reporter mice showed 85% of GHSR-expressing islet cells coexpress PP, 50% coexpress SST, and 47% coexpress PP + SST. Analysis of single-cell transcriptomic data from mouse pancreas revealed 95% of Ghsr-expressing cells coexpress Ppy, 100% coexpress Sst, and 95% coexpress Ppy + Sst. This expression was restricted to gamma-cell and delta-cell clusters. Analysis of several single-cell human pancreatic transcriptome data sets revealed 59% of GHSR-expressing cells coexpress PPY, 95% coexpress SST, and 57% coexpress PPY + SST. This expression was prominent in delta-cell and beta-cell clusters, also occurring in other clusters including gamma cells and alpha cells. GHSR expression levels were upregulated by type 2 diabetes mellitus in beta cells. In mice, plasma PP positively correlated with fat mass and with plasma levels of the endogenous GHSR antagonist/inverse agonist LEAP2. Plasma PP also elevated on LEAP2 and synthetic GHSR antagonist administration. These data suggest that in addition to delta cells, beta cells, and alpha cells, PP-expressing pancreatic cells likely represent important direct targets for LEAP2 and/or ghrelin both in mice and humans.

  • Research Article
  • Cited by 17
  • 10.1093/bib/bbaa347
Coupled co-clustering-based unsupervised transfer learning for the integrative analysis of single-cell genomic data.
  • Dec 7, 2020
  • Briefings in Bioinformatics
  • Pengcheng Zeng + 2 more

Unsupervised methods, such as clustering methods, are essential to the analysis of single-cell genomic data. Most current clustering methods are designed for one data type only, such as single-cell RNA sequencing (scRNA-seq), single-cell ATAC sequencing (scATAC-seq) or sc-methylation data alone, and only a few have been developed for the integrative analysis of multiple data types. The integrative analysis of multimodal single-cell genomic data sets leverages the power in multiple data sets and can deepen the biological insight. In this paper, we propose a coupled co-clustering-based unsupervised transfer learning algorithm (coupleCoC) for the integrative analysis of multimodal single-cell data. Our proposed coupleCoC builds upon the information theoretic co-clustering framework. In co-clustering, both the cells and the genomic features are simultaneously clustered. Clustering similar genomic features reduces the noise in single-cell data and facilitates transfer of knowledge across single-cell datasets. We applied coupleCoC for the integrative analysis of scATAC-seq and scRNA-seq data, sc-methylation and scRNA-seq data and scRNA-seq data from mouse and human. We demonstrate that coupleCoC improves the overall clustering performance and matches the cell subpopulations across multimodal single-cell genomic datasets. Our method coupleCoC is also computationally efficient and can scale up to large datasets. Availability: The software and datasets are available at https://github.com/cuhklinlab/coupleCoC.

  • Research Article
  • Cited by 4
  • 10.1093/bib/bbae188
ScBOL: a universal cell type identification framework for single-cell and spatial transcriptomics data.
  • Mar 27, 2024
  • Briefings in Bioinformatics
  • Yuyao Zhai + 2 more

Over the past decade, single-cell transcriptomic technologies have experienced remarkable advancements, enabling the simultaneous profiling of gene expressions across thousands of individual cells. Cell type identification plays an essential role in exploring tissue heterogeneity and characterizing cell state differences. With more and more well-annotated reference data becoming available, massive automatic identification methods have sprung up to simplify the annotation process on unlabeled target data by transferring the cell type knowledge. However, in practice, the target data often include some novel cell types that are not in the reference data. Most existing works usually classify these private cells as one generic 'unassigned' group and learn the features of known and novel cell types in a coupled way. They are susceptible to the potential batch effects and fail to explore the fine-grained semantic knowledge of novel cell types, thus hurting the model's discrimination ability. Additionally, emerging spatial transcriptomic technologies, such as in situ hybridization, sequencing and multiplexed imaging, present a novel challenge to current cell type identification strategies that predominantly neglect spatial organization. Consequently, it is imperative to develop a versatile method that can proficiently annotate single-cell transcriptomics data, encompassing both spatial and non-spatial dimensions. To address these issues, we propose a new, challenging yet realistic task called universal cell type identification for single-cell and spatial transcriptomics data. In this task, we aim to give semantic labels to target cells from known cell types and cluster labels to those from novel ones. To tackle this problem, instead of designing a suboptimal two-stage approach, we propose an end-to-end algorithm called scBOL from the perspective of Bipartite prototype alignment. 
Firstly, we identify the mutual nearest clusters in reference and target data as their potential common cell types. On this basis, we mine the cycle-consistent semantic anchor cells to build the intrinsic structure association between the two datasets. Secondly, we design a neighbor-aware prototypical learning paradigm to strengthen the inter-cluster separability and intra-cluster compactness within each dataset, thereby inspiring the discriminative feature representations. Thirdly, driven by the semantic-aware prototypical learning framework, we can align the known cell types and separate the private cell types from them among reference and target data. Such an algorithm can be seamlessly applied to various data types modeled by different foundation models that can generate the embedding features for cells. Specifically, for non-spatial single-cell transcriptomics data, we use the autoencoder neural network to learn latent low-dimensional cell representations, and for spatial single-cell transcriptomics data, we apply the graph convolution network to capture molecular and spatial similarities of cells jointly. Extensive results on our carefully designed evaluation benchmarks demonstrate the superiority of scBOL over various state-of-the-art cell type identification methods. To our knowledge, we are the pioneers in presenting this pragmatic annotation task, as well as in devising a comprehensive algorithmic framework aimed at resolving this challenge across varied types of single-cell data. Finally, scBOL is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/aimeeyaoyao/scBOL.

  • Research Article
  • Cited by 24
  • 10.3389/fonc.2021.832315
CA9-Related Acidic Microenvironment Mediates CD8+ T Cell Related Immunosuppression in Pancreatic Cancer
  • Jan 27, 2022
  • Frontiers in Oncology
  • Lingdi Yin + 16 more

Purpose: This study aims to integrate pancreatic cancer TCGA, GEO, and single-cell RNA-sequencing (scRNA-seq) datasets, and explore the potential prognostic markers and underlying mechanisms of the immune microenvironment of pancreatic cancer through bioinformatics methods, in vitro and in vivo assays. Methods: Expression data and clinicopathological data from the pancreatic cancer TCGA, GEO (GSE131050), and single-cell sequencing (PAAD_CRA001160) datasets were downloaded. We used R/Bioconductor edgeR for differential expression analysis. ClusterProfiler was utilized to perform GO enrichment analysis on differentially expressed genes. The online software CIBERSORT was used to reanalyze the mRNA expression data of pancreatic cancer. CellRanger, RunPCA, FindNeighbors, FindClusters, RunTSNE and RunUMAP were used to perform preprocessing, cell clustering and expression profile analysis on single-cell sequencing data sets. We analyzed intracellular pH with or without CA9 inhibitor SLC-0111. An indirect co-culture model of human pancreatic cancer cell lines and healthy individual-derived PBMCs was used to determine the effect of the CA9-related acidic microenvironment on CD8+ T cells. Results: The CIBERSORT analysis of TCGA pancreatic cancer transcriptome sequencing data showed that among the 22 immune microenvironment components, CD8+ T cell infiltration was significantly correlated with the prognosis of pancreatic cancer patients. The differential expression analysis of the TCGA data grouped by the level of CD8+ T cell infiltration indicates that the expression of carbonic anhydrase 9 (CA9) is the most significant, and the survival analysis suggests that CA9 is associated with the overall survival of pancreatic cancer. TCGA data and GEO data set GSE131050 expression correlation analysis suggests that CA9 and CD8 expression are closely related.
Pancreatic cancer single-cell sequencing data set PAAD_CRA001160 analysis results show that CA9 is mainly expressed in pancreatic cancer cell clusters, and the expression of the cancer cell subgroup CA9 in the single-cell data set is correlated with CD8+ T cell infiltration. Conclusion: Pancreatic cancer cells may inhibit the infiltration of CD8+ T cells through CA9. Further exploration of its related mechanisms can be used to explore the immune escape pathway of pancreatic cancer and provide new perspectives on immune targeted therapy.

  • Research Article
  • Cited by 57
  • 10.3389/fonc.2019.00434
Integrating Clinical and Genetic Analysis of Perineural Invasion in Head and Neck Squamous Cell Carcinoma.
  • May 31, 2019
  • Frontiers in Oncology
  • Ze Zhang + 8 more

Introduction: Perineural invasion (PNI), a key pathological feature of head and neck squamous cell carcinoma (HNSCC), predicts poor survival. However, the associated clinical characteristics remain uncertain, and the molecular mechanisms are largely unknown. Materials and methods: HNSCC gene expression and corresponding clinical data were downloaded from The Cancer Genome Atlas (TCGA). Prognostic subgroup analysis was performed, and potential PNI risk factors were assessed with logistic regression. PNI-associated gene coexpression modules were identified with weighted gene coexpression network analysis (WGCNA), and key module gene functions and the roles of non-malignant cells in PNI were evaluated with a single-cell transcriptomic dataset (GSE103322). Results: PNI was significantly inversely associated with overall survival (HR, 2.08; 95% CI, 1.27 to 3.40; P = 0.004), especially in advanced patients (HR, 2.62; 95% CI, 1.48 to 4.64; P < 0.001). Age, gender, smoking history, and alcohol history were not risk factors. HPV-positive cases were less likely than HPV-negative cases to develop PNI (OR, 0.28; 95% CI, 0.09 to 0.76; P = 0.017). WGCNA identified a unique significantly PNI-associated coexpression module containing 357 genes, with 12 hub genes (TIMP2, MIR198, LAMA4, FAM198B, MIR4649, COL5A1, COL1A2, OLFML2B, MMP2, FBN1, ADAM12, and PDGFRB). Single-cell transcriptomic data analysis revealed that the genes in the PNI-associated module correlated with the signatures “EMT,” “metastasis,” and “invasion.” Among non-malignant cells, fibroblasts had relatively high expression of the key genes. Conclusion: At the molecular and omic levels, we verified that PNI in HNSCC is a process of invasion rather than simple diffusion.
Fibroblasts probably play an important role in PNI. Novelty & Impact Statements: The study is a thorough analysis of PNI in HNSCC from the clinical level to the molecular level and presents the first description of cancer-related PNI from the omics perspective to date as far as we know. We verified that PNI in HNSCC is a process of invasion rather than simple diffusion, at the molecular and omic levels. Fibroblasts were found to probably play an important role in PNI by analyzing single-cell transcriptomic data.

  • Research Article
  • Cited by 78
  • 10.1038/s41467-020-17900-3
A clustering-independent method for finding differentially expressed genes in single-cell transcriptome data
  • Aug 28, 2020
  • Nature Communications
  • Alexis Vandenbon + 1 more

A common analysis of single-cell sequencing data includes clustering of cells and identifying differentially expressed genes (DEGs). How cell clusters are defined has important consequences for downstream analyses and the interpretation of results, but is often not straightforward. To address this difficulty, we present singleCellHaystack, a method that enables the prediction of DEGs without relying on explicit clustering of cells. Our method uses Kullback–Leibler divergence to find genes that are expressed in subsets of cells that are non-randomly positioned in a multidimensional space. Comparisons with existing DEG prediction approaches on artificial datasets show that singleCellHaystack has higher accuracy. We illustrate the usage of singleCellHaystack through applications on 136 real transcriptome datasets and a spatial transcriptomics dataset. We demonstrate that our method is a fast and accurate approach for DEG prediction in single-cell data. singleCellHaystack is implemented as an R package and is available from CRAN and GitHub.
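The abstract above says singleCellHaystack scores genes with the Kullback–Leibler divergence. The sketch below illustrates the general idea under strong simplifying assumptions: cells are counted in a few hypothetical bins of an embedding, and the distribution of cells expressing a gene is compared with the distribution of all cells. The bin counts and function names are invented for illustration, and this is not the package's implementation.

```python
from math import log


def normalize(counts):
    """Turn non-negative counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two distributions over the same bins.

    Bins where p is zero contribute nothing; q must be positive
    wherever p is positive.
    """
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical counts of cells in four regions of an embedding.
all_cells = [100, 100, 100, 100]
# Cells expressing a gene that is spread evenly across the regions.
gene_even = [10, 10, 10, 10]
# Cells expressing a gene that is concentrated in the first region.
gene_local = [38, 1, 1, 0]

reference = normalize(all_cells)
score_even = kl_divergence(normalize(gene_even), reference)
score_local = kl_divergence(normalize(gene_local), reference)
print(f"evenly spread gene: {score_even:.3f}")
print(f"localized gene:     {score_local:.3f}")
```

A gene whose expressing cells mirror the overall cell distribution scores near zero, while a gene restricted to one region scores high, which is the intuition behind ranking genes without first defining clusters.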

  • Research Article
  • Cited by 9
  • 10.1093/bib/bbad391
ScMHNN: a novel hypergraph neural network for integrative analysis of single-cell epigenomic, transcriptomic and proteomic data
  • Sep 22, 2023
  • Briefings in Bioinformatics
  • Wei Li + 6 more

Technological advances have now made it possible to simultaneously profile epigenomic, transcriptomic and proteomic changes at the single-cell level, allowing a more unified view of cellular phenotypes and heterogeneities. However, current computational tools for single-cell multi-omics data integration are mainly tailored for bi-modality data, so new tools are urgently needed to integrate tri-modality data with complex associations. To this end, we develop scMHNN to integrate single-cell multi-omics data based on a hypergraph neural network. After modeling the complex data associations among various modalities, scMHNN performs a message-passing process on the multi-omics hypergraph, which can capture the high-order data relationships and integrate the multiple heterogeneous features. Subsequently, scMHNN learns discriminative cell representations via a dual-contrastive loss in a self-supervised manner. Based on the pretrained hypergraph encoder, we further introduce the pre-training and fine-tuning paradigm, which allows more accurate cell-type annotation with only a small number of labeled cells as reference. Benchmarking results on real and simulated single-cell tri-modality datasets indicate that scMHNN outperforms other competing methods on both cell clustering and cell-type annotation tasks. In addition, we also demonstrate that scMHNN facilitates various downstream tasks, such as cell marker detection and enrichment analysis.

  • Research Article
  • 10.3390/ijms26115297
ScDown: A Pipeline for Single-Cell RNA-Seq Downstream Analysis.
  • May 30, 2025
  • International journal of molecular sciences
  • Liang Sun + 7 more

Single-cell transcriptomics data are analyzed using two popular tools, Seurat and Scanpy. Multiple separate tools are used downstream of Seurat and Scanpy cell annotation to study cell differentiation and communication, including cell proportion difference analysis between conditions, pseudotime and trajectory analyses to study cell transition, and cell-cell communication analysis. To automate the integrative cell differentiation and communication analyses of single-cell RNA-seq data, we developed a single-cell RNA-seq downstream analysis pipeline called "scDown". This R package includes cell proportion difference analysis, cell-cell communication analysis, pseudotime analysis, and RNA velocity analysis. Both Seurat and Scanpy annotated single-cell RNA-seq data are accepted in this pipeline. We applied scDown to a published dataset and identified a unique, previously undiscovered signature of neuronal inflammatory signaling associated with a rare genetic neurodevelopmental disorder. These findings were not identified with a simple implementation of Seurat differential gene expression analysis, illustrating the value of our pipeline in biological discovery. scDown can be broadly utilized in downstream analyses of scRNA-seq data, particularly in rare diseases.

  • Research Article
  • 10.3389/fimmu.2025.1638156
Portfolio analysis of single-cell RNA-sequencing and transcriptomic data unravels immune cells and telomere-related biomarkers in sepsis
  • Oct 30, 2025
  • Frontiers in Immunology
  • Dan Chen + 4 more

Background: Early diagnosis of sepsis is essential to reducing mortality. Immune cells and telomeres play important roles in sepsis, but their mechanisms remain unclear. This study aimed to explore the value of immune cells and telomere-related genes in sepsis. Methods: In this study, the transcriptomic data with sepsis and control samples were obtained from a public database. Multiple methods including differential expression analysis, immune infiltration analysis, weighted gene co-expression network analysis (WGCNA), and 101 machine-learning algorithm combinations were used to identify biomarkers related to immune cells and telomeres. Afterwards, a nomogram was constructed to assess the clinical predictive value of biomarkers. In addition, gene set enrichment analysis (GSEA), regulatory network construction and drug prediction analysis were adopted to demonstrate the role of biomarkers in sepsis. The key cells were also identified using a single-cell dataset. Finally, the expression of biomarkers was further validated in clinical samples by reverse transcription quantitative polymerase chain reaction (RT-qPCR). Results: This study obtained a total of 4 biomarkers (MYO10, SULT1B1, MKI67, and CREB5), and the analysis of the nomogram showed that the biomarkers had good clinical predictive value for sepsis. The enrichment analysis results revealed that the four biomarkers were enriched in the ribosome pathway. Besides, a lncRNAs-miRNAs-biomarkers network was constructed for the four biomarkers. Finally, we obtained a candidate drug (MS-275) and a key cell (CD16+ and CD14+ monocytes) respectively based on drug prediction and cell identification analysis. In addition, we found that the expression levels of CREB5 and SULT1B1 had significant changes during the process of key cell differentiation.
The RT-qPCR results showed biomarkers were upregulated in the sepsis group, consistent with the bioinformatics analysis results. Conclusion: This study identified 4 biomarkers, namely MYO10, SULT1B1, MKI67, and CREB5, and explored the pathogenesis of sepsis, providing new insights for potential treatment strategies by integrating transcriptomic data and single-cell analysis.

  • Research Article
  • 10.1158/1538-7445.am2021-lb019
Abstract LB019: Trisicell: Scalable Tumor Phylogeny Reconstruction and Validation Reveals Developmental Origin and Therapeutic Impact of Intratumoral Heterogeneity
  • Jul 1, 2021
  • Cancer Research
  • Farid Rashidi Mehrabadi + 15 more

Emerging sets of single-cell sequencing data make it appealing to apply existing tumor phylogeny reconstruction methods to analyze associated intratumor heterogeneity. Unfortunately, tumor phylogeny inference is an NP-hard problem and existing principled methods typically fail to scale up to handle thousands of cells and mutations observed in emerging single-cell data sets. Even though there are greedy heuristics to build hierarchical clustering of cells and mutations, they suffer from well-documented issues in accuracy. Additionally, even when “optimal” solutions are feasible, existing approaches only provide a single “most likely” tree to depict the evolutionary processes that may result in an observed collection of cells and mutations. To make matters worse, the vast majority of single-cell sequencing data sets are transcriptomic and as a result, suffer from considerable variation in coverage across mutational loci. In this paper, we introduce Trisicell, a computational toolkit for scalable tumor phylogeny reconstruction and validation from single-cell genomic, exomic or transcriptomic sequencing data. Trisicell has three components: (i) Trisicell-DnC, a new tumor phylogeny reconstruction method from genotype matrices derived from single-cell data, (ii) Trisicell-ConT, a new algorithm for constructing the consensus for two or more tumor phylogenies - which may be built through the use of different data types on the same set of cells, or built through the use of different methods on the same data, and (iii) Trisicell-PF, a new partition function method for assessing the likelihood of any user-defined subtree/set of cells to be seeded by a given set of mutations in the phylogeny. Collectively, these tools provide means of identifying and validating robust portions of a tumor phylogeny, offering the ability to focus on the most important (sub)clones and the genomic alterations that seed the associated clonal expansion.
We applied Trisicell to a panel of clonal sublines derived from single cells of a parental mouse melanoma model on which we performed both whole exome and whole transcriptome sequencing. The tumor phylogenies of the clonal sublines built on exomic and transcriptomic mutations by Trisicell-DnC were shown by Trisicell-ConT to be highly similar, and the subtrees comprising phenotypically similar clonal sublines were shown by Trisicell-PF to be strongly associated with their seeding mutations. In addition, we applied Trisicell to single-cell whole transcriptome sequencing data from a tumor derived from the same parental melanoma cell line, which was subjected to anti-CTLA-4 immunotherapy. The phylogenies generated from both studies featured distinct subtrees, strongly associated with phenotypes including cell differentiation status, tumor growth and therapeutic response. These results suggest that Trisicell can be used for scalable tumor phylogeny reconstruction and validation through both single-cell and clonal-subline sequencing data, which may reveal strong phenotypic associations. In particular, they suggest that the developmental status and phenotypic intratumoral heterogeneity of melanoma originate from observable subclonal variation.
Citation Format: Farid Rashidi Mehrabadi, Salem Malikić, Kerrie L. Marie, Eva Pérez-Guijarro, Erfan Sadeqi Azer, Howard H. Yang, Can Kızılkale, Charli Gruen, Huaitian Liu, Christina Marcelus, Aydın Buluç, Funda Ergün, Maxwell P. Lee, Glenn Merlino, Chi-Ping Day, S. Cenk Sahinalp. Trisicell: Scalable Tumor Phylogeny Reconstruction and Validation Reveals Developmental Origin and Therapeutic Impact of Intratumoral Heterogeneity [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr LB019.

  • Research Article
  • 10.3389/fimmu.2023.1307588
Integrating single-cell and bulk transcriptomic analyses to develop a cancer-associated fibroblast-derived biomarker for predicting prognosis and therapeutic response in breast cancer.
  • Jan 3, 2024
  • Frontiers in Immunology
  • Chunzhen Li + 8 more

Cancer-associated fibroblasts (CAFs) contribute to the progression and treatment of breast cancer (BRCA); however, risk signatures and molecular targets based on CAFs are limited. This study aims to identify novel CAF-related biomarkers to develop a risk signature for predicting the prognosis and therapeutic response of patients with BRCA. CAF-related genes (CAFRGs) and a risk signature based on these genes were comprehensively analyzed using publicly available bulk and single-cell transcriptomic datasets. Modular genes identified from bulk sequencing data were intersected with CAF marker genes identified from single-cell analysis to obtain reliable CAFRGs. Signature CAFRGs were screened via Cox regression and least absolute shrinkage and selection operator (LASSO) analyses. Multiple patient cohorts were used to validate the prognosis and therapeutic responsiveness of high-risk patients stratified based on the CAFRG-based signature. In addition, the relationship between the CAFRG-based signature and clinicopathological factors, tumor immune landscape, functional pathways, chemotherapy sensitivity and immunotherapy sensitivity was examined. External datasets were used and sample experiments were performed to examine the expression pattern of MFAP4, a key CAFRG, in BRCA. Integrated analyses of single-cell and bulk transcriptomic data as well as prognostic screening revealed a total of 43 prognostic CAFRGs, of which 14 genes (TLN2, SGCE, SDC1, SAV1, RUNX1, PDLIM4, OSMR, NT5E, MFAP4, IGFBP6, CTSO, COL12A1, CCDC8 and C1S) were identified as signature CAFRGs. The CAFRG-based risk signature exhibited favorable efficiency and accuracy in predicting survival outcomes and clinicopathological progression in multiple BRCA cohorts. Functional enrichment analysis suggested the involvement of the immune system, and the immune infiltration landscape significantly differed between the risk groups.
Patients with high CAF-related risk scores (CAFRSs) exhibited tumor immunosuppression, enhanced cancer hallmarks and hyposensitivity to chemotherapy and immunotherapy. Five compounds were identified as promising therapeutic agents for high-CAFRS BRCA. External datasets and sample experiments validated the downregulation of MFAP4 and its strong correlation with CAFs in BRCA. A novel CAF-derived gene signature with favorable predictive performance was developed in this study. This signature may be used to assess prognosis and guide individualized treatment for patients with BRCA.
