Pregnancy-specific glycoproteins as potential drug targets for female lung adenocarcinoma patients.

Abstract

Recently, the presence of pregnancy-specific glycoprotein (PSG) mRNA in cancer biopsies has been shown to be associated with poor survival. Given the pregnancy-related function of PSGs, we hypothesized that PSGs might act in a sex-dependent manner in cancer patients. A differential sex effect of PSG genes with respect to tumor immune landscape and cancer outcomes was investigated using statistical, bioinformatic, and machine learning analyses in The Cancer Genome Atlas (TCGA) data. The resulting findings were then validated in the Clinical Proteomic Tumor Analysis Consortium (CPTAC) data. In a pan-cancer TCGA data analysis, the strongest PSG-related sex difference for the prognostic association was found in lung adenocarcinoma (LUAD). Kaplan-Meier analysis revealed that expression of PSG genes is strongly associated with overall survival in the female group in TCGA, but not in the male group. This sex-specific association was validated in an independent dataset from the CPTAC study. A combination of PSG3, PSG7, and PSG8 expression was most significantly linked to poor prognosis in females (P = 8.67E-06 in TCGA and P = .0382 in CPTAC). Pathway analysis revealed enrichment of the 'KRAS Signaling Down' pathway in the high-risk female group. A predictive model showed good predictive performance for the female group (validated C-index = 0.78 in CPTAC), but poor predictive performance for the male group. These findings suggest that PSGs may have a sex-specific negative impact on survival in female LUAD patients, and the mechanism may be related to KRAS signaling pathway modulation.
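
For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. It is an illustration only, not the authors' pipeline: the function name `concordance_index` and the handling of ties (pairs with equal times are skipped, equal risk scores count as half) are assumptions made here. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times       -- follow-up time per patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # a is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk died earlier: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Invented toy values, not data from TCGA or CPTAC.
    toy_times = [5, 10, 15, 20]
    toy_events = [1, 1, 0, 1]
    toy_risk = [0.9, 0.1, 0.4, 0.7]
    print(concordance_index(toy_times, toy_events, toy_risk))
```

In practice an established implementation from a survival-analysis library would normally be preferred over a hand-written loop; the quadratic pair enumeration here is chosen for readability.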

Similar Papers
  • Research Article
  • 10.1158/1538-7445.am2025-2398
Abstract 2398: Pregnancy-specific glycoproteins in tumors are strong predictors of outcome in female lung adenocarcinoma patients
  • Apr 21, 2025
  • Cancer Research
  • Jung Hun Oh + 6 more

Pregnancy-specific glycoproteins (PSGs) are normally critical regulators of maternal immune tolerance during pregnancy, preventing fetal rejection by modulating innate and adaptive immune responses. Recent studies have identified that mRNA expression of PSG genes in tumors is associated with poor prognosis in lung, mesothelioma, breast, and ovarian cancers. Given their pregnancy-related function, we hypothesized that PSGs could exert sex-specific effects in some cancers. An analysis of sex-differential effects of PSGs as prognostic markers was conducted using two datasets: The Cancer Genome Atlas (TCGA), including 235 male and 271 female patients, and the Clinical Proteomic Tumor Analysis Consortium (CPTAC), including 70 male and 36 female patients. To assess the predictive power of PSGs, we evaluated whether incorporating PSG expression levels improved machine learning models for survival prediction. Lung adenocarcinoma (LUAD) was identified as showing a significant sex-specific prognostic difference. Female LUAD patients expressing PSG genes had notably worse survival outcomes compared to those without PSG expression. Male patients did not show similar effects. Combinatorial analysis showed that a combination of PSG3, PSG7, and PSG8 expression resulted in particularly poor prognosis in the female group (30% of the female patients) with p = 8.67E-06. This finding was independently validated in the smaller CPTAC dataset with p = 0.0382. Further analysis of correlated pathway changes revealed that female LUAD patients with PSG expression exhibited enrichment of the "KRAS Signaling Down" pathway, suggesting a potential mechanism underlying the observed sex-specific effects. Incorporating KRAS pathway activity into the PSG model yielded robust performance in predicting survival outcomes for female patients (validated C-index = 0.78 using the CPTAC dataset) but did not improve prediction for male patients.
These findings highlight a unique role for PSGs in influencing LUAD progression and survival in a sex-dependent manner. Targeting PSG-related pathways may offer a tailored treatment strategy for female LUAD patients, potentially improving outcomes. Further research is needed to unravel the relationship between PSG expression and KRAS pathway activation. We also plan to examine the sex-specific prognostic difference, looking into pregnancy history and the influence of hormone-related genes. Since PSGs are not typically expressed in the absence of pregnancy, they could be a promising drug target. Citation Format: Jung Hun Oh, Gabrielle Rizzuto, Rena Elkin, Corey Weistuch, Larry Norton, Gabriela Dveksler, Joseph O. Deasy. Pregnancy-specific glycoproteins in tumors are strong predictors of outcome in female lung adenocarcinoma patients [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 2398.

  • Research Article
  • 10.1158/1538-7445.am2016-3890
Abstract 3890: NCI's Clinical Proteomic Tumor Analysis Consortium
  • Jul 15, 2016
  • Cancer Research
  • Mehdi Mesri + 5 more

The NCI Clinical Proteomic Tumor Analysis Consortium (CPTAC) constitutes a network of Proteome Characterization Centers (PCCs), which coordinate and conduct research and data sharing activities to comprehensively interrogate genomically characterized cancer biospecimens. Such integrated proteogenomics approach brings key insights into genomic abnormalities manifested at the protein network and pathway level. Established in late 2011, CPTAC works closely with genomics initiatives such as The Cancer Genome Atlas (TCGA) to systematically identify proteins that derive from alterations in cancer genomes and related biological processes, and provide this data with accompanying assays and protocols to the public. CPTAC investigators conducted the first in-depth comprehensive proteome (global, phosphoproteome and N-glycoproteome) study on understanding the impact of cold ischemia that has resulted in the development of an ischemic signature database and best practice procedures for biospecimen (tissue) sample collection/processing. The PCCs have deeply characterized three cancer types (colorectal, ovarian and breast tumors) previously analyzed by TCGA and additional tissue collections, all of which are accompanied by genomic datasets. Scientific findings examining 95 TCGA colon and rectal tumors using deep proteomic analysis include: a) five proteomic subtypes were identified, with the TCGA MSI/CIMP transcriptomic subtype separated into distinct proteomic subtypes including one subtype (subtype C) that displayed protein network features characteristic of epithelial-mesenchymal transition (EMT); b) copy number alterations (CNAs) on mRNA abundance analysis showed strong cis- and trans-effects; but few extending to the protein level; and c) chromosome 20q amplicon was shown to contain the largest global changes in both mRNA and protein levels, allowing for prioritizing driver genes. 
In addition to the publicly available proteomic datasets (https://cptac-data-portal.georgetown.edu/cptacPublic), proteomic targeted assays developed by the consortium are available at CPTAC's Assay Portal (http://assays.cancer.gov). This is a public resource of mass spectrometry-based quantitative targeted proteomic assays for measuring proteins of interest from the program. Lastly, monoclonal antibodies targeting cancer specific proteins and peptides are made available at CPTAC's Antibody Portal (http://antibodies.cancer.gov). Citation Format: Mehdi Mesri, Emily Boja, Tara Hiltke, Chris Kinsinger, Jerry Lee, Henry Rodriguez. NCI's Clinical Proteomic Tumor Analysis Consortium. [abstract]. In: Proceedings of the 107th Annual Meeting of the American Association for Cancer Research; 2016 Apr 16-20; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2016;76(14 Suppl):Abstract nr 3890.

  • Research Article
  • 10.1158/1538-7445.am2012-1282
Abstract 1282: NCI's Clinical Proteomic Tumor Analysis Consortium
  • Apr 15, 2012
  • Cancer Research
  • Robert C Rivers + 5 more

Mission: The Clinical Proteomic Tumor Analysis Consortium (CPTAC) is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of robust, quantitative, proteomic technologies and workflows. Goals: The overarching goal of CPTAC is to improve our ability to diagnose, treat and prevent cancer. To achieve this goal in a scientifically rigorous manner, the National Cancer Institute (NCI) launched CPTAC to systematically identify proteins that derive from alterations in cancer genomes and related biological processes, and provide this data with accompanying assays and protocols to the public. Genomics initiatives such as The Cancer Genome Atlas (TCGA) have characterized and sequenced the genomic alterations from several types of cancer. CPTAC will leverage its analytical outputs by producing a unique continuum that defines the proteins translated from cancer genomes in order to link genotype to proteotype and ultimately to phenotype. Scientific Approach: The structure of CPTAC is designed to facilitate the efficient implementation of a two-step developmental pipeline, Discovery and Verification. The pipeline originates with the CPTAC Resource Center, which will provide a set of high-quality, genomically characterized clinical biospecimens from patients with a single type of cancer. The Discovery Unit of each Proteome Characterization Center (PCC) will use a set of proteomic technologies to comprehensively analyze the protein composition of this set of biospecimens including mapping proteome to genome and detecting and quantifying protein products that correspond to genomic aberrations. All data from the Discovery Units will be shared within the network via the CPTAC Data Center. Based on this aggregate data set, each Discovery Unit will select and prioritize cancer-related proteins for potential use in the verification step. 
The selected lists of prioritized proteins by the Discovery Units will be consolidated and collectively prioritized. The Verification Unit of each PCC will develop targeted assays against corresponding protein targets as assigned by the verification plan. The Verification Units will then verify the biomarker candidates by performing these assays in clinically relevant cancer biospecimens (provided by the CPTAC Resource Center). Data from these verification studies will be shared via the CPTAC Data Center. These verified proteins will serve as high-value targets to other initiatives involved in clinical qualification studies. Citation Format: {Authors}. {Abstract title} [abstract]. In: Proceedings of the 103rd Annual Meeting of the American Association for Cancer Research; 2012 Mar 31-Apr 4; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2012;72(8 Suppl):Abstract nr 1282. doi:1538-7445.AM2012-1282

  • Abstract
  • 10.1182/blood-2024-202013
Inflammatory Reprogramming of the Tumor Microenvironment By Clonal Hematopoiesis Is Associated with Clinical Outcomes in Solid Cancer
  • Nov 5, 2024
  • Blood
  • Marco M Buttigieg + 3 more


  • Research Article
  • Cited by 28
  • 10.1007/s00432-020-03147-4
Elevated TOP2A and UBE2C expressions correlate with poor prognosis in patients with surgically resected lung adenocarcinoma: a study based on immunohistochemical analysis and bioinformatics.
  • Feb 26, 2020
  • Journal of cancer research and clinical oncology
  • Wei Guo + 13 more

Lung cancer has the highest morbidity and mortality among all cancer types. Reliable prognostic biomarkers are needed to identify high-risk patients apart from TNM system for precision medicine. The present study is designed to identify robust prognostic biomarkers in lung adenocarcinoma (LUAD) based on integration of multiple GEO datasets, The Cancer Genome Atlas (TCGA) database and Clinical Proteomic Tumor Analysis Consortium (CPTAC) database. Four LUAD GEO datasets (GSE10072, GSE2514, GSE43458, and GSE32863) and TCGA database were implemented to analyze the differently expressed genes (DEGs). Gene ontology, KEGG pathway, and protein-protein interaction network (PPI) were conducted based on the above DEGs. Hub genes were selected based on connectivity degree in the PPI network. Expression analysis and Kaplan-Meier survival analysis were conducted in CPTAC lung adenocarcinomas cohort. Kaplan-Meier survival analysis and Cox proportional hazards regression were performed on these hub genes using TCGA and our own cohort. A total of 430 shared genes in all five datasets were identified as DEGs. Based on their PPI network, nine hub genes were selected and all of them were significantly associated with overall survival using GEPIA analysis. Two hub genes, TOP2A and UBE2C, were further combined and showed poorer prognosis in both TCGA dataset and our validated cohort. Analysis in CPTAC revealed that TOP2A and UBE2C were significantly highly expressed in tumor sample. Multivariable analysis suggested TOP2A and UBE2C as independent prognostic factors in LUAD. Using data mining approach, we identified TOP2A and UBE2C as two robust prognostic factors in LUAD. We also demonstrated the TOP2A/UBE2C co-expression status in LUAD, and TOP2A/UBE2C co-expression correlated with poorer prognosis. More in-depth research is needed for transforming this result into clinical setting.

  • Research Article
  • Cited by 139
  • 10.1074/mcp.tir119.001673
Integration and Analysis of CPTAC Proteomics Data in the Context of Cancer Genomics in the cBioPortal
  • Sep 1, 2019
  • Molecular & Cellular Proteomics
  • Pamela Wu + 8 more

The Clinical Proteomic Tumor Analysis Consortium (CPTAC) has produced extensive mass spectrometry-based proteomics data for selected breast, colon, and ovarian tumors from The Cancer Genome Atlas (TCGA). We have incorporated the CPTAC proteomics data into the cBioPortal to support easy exploration and integrative analysis of these proteomic datasets in the context of the clinical and genomics data from the same tumors. cBioPortal is an open source platform for exploring, visualizing, and analyzing multidimensional cancer genomics and clinical data. The public instance of the cBioPortal (http://cbioportal.org/) hosts more than 200 cancer genomics studies, including all of the data from TCGA. Its biologist-friendly interface provides many rich analysis features, including a graphical summary of gene-level data across multiple platforms, correlation analysis between genes or other data types, survival analysis, and per-patient data visualization. Here, we present the integration of the CPTAC mass spectrometry-based proteomics data into the cBioPortal, consisting of 77 breast, 95 colorectal, and 174 ovarian tumors that already have been profiled by TCGA for mutations, copy number alterations, gene expression, and DNA methylation. As a result, the CPTAC data can now be easily explored and analyzed in the cBioPortal in the context of clinical and genomics data. By integrating CPTAC data into cBioPortal, limitations of TCGA proteomics array data can be overcome while also providing a user-friendly web interface, a web API, and an R client to query the mass spectrometry data together with genomic, epigenomic, and clinical data.

  • Research Article
  • 10.1007/s00210-025-04668-w
Comprehensive big data analysis reveals SLC10A3 as a potential biomarker in head and neck cancer.
  • Nov 17, 2025
  • Naunyn-Schmiedeberg's archives of pharmacology
  • Bintee Bintee + 9 more

Head and neck cancer (HNC) continues to pose a significant global health challenge, highlighting the urgent need for discovering new therapeutic targets. Recent studies highlight the role of solute carrier (SLC) proteins in cancer progression. This study investigates the expression and potential role of SLC10A3 in HNC, aiming to determine its clinical significance and therapeutic relevance. Publicly available datasets, including The Cancer Genome Atlas (TCGA), Clinical Proteomics Tumor Analysis Consortium (CPTAC), and Gene Expression Omnibus (GEO), were analyzed to assess SLC10A3 expression in head and neck squamous cell carcinoma (HNSCC). The prognostic relevance of SLC10A3 was assessed using Kaplan-Meier (KM) survival and Receiver Operating Characteristic (ROC) curve analysis. Correlation analysis within TCGA, CPTAC, and GEO datasets identified genes associated with SLC10A3 expression. Protein-protein docking studies were performed to predict potential interactions between SLC10A3 and identified protein coding genes. SLC10A3 was found to be significantly upregulated in HNSCC tumor samples compared to normal tissues across TCGA and CPTAC datasets. Increased SLC10A3 expression correlated with poor survival outcomes in TCGA-HNSCC patients. Correlation analysis identified 26 genes positively associated with SLC10A3, where BCAP31, IRAK1, and UBL4A showed consistent correlation across TCGA, CPTAC, and GEO datasets. Computational protein interaction modeling using docking and AI/ML-based Evolutionary Scale Modelling (ESM) framework revealed significant binding affinities between SLC10A3 and identified protein-coding genes, suggesting potential functional interactions. These findings establish SLC10A3 as a promising therapeutic target in HNC. Its consistent upregulation, association with poor prognosis, and potential interactions with key regulatory proteins highlight its relevance for future therapeutic strategies.

  • Research Article
  • Cited by 4
  • 10.1158/1538-7445.am2019-5112
Abstract 5112: LinkedOmics: Analyzing multi-omics data within and across 32 cancer types
  • Jul 1, 2019
  • Cancer Research
  • Suhas Vasaikar + 3 more

Background: The Cancer Genome Atlas (TCGA) project has performed molecular profiling of human tumors using genomic, epigenomic, transcriptomic, and proteomic platforms, and each tumor is comprehensively characterized by around 100,000 molecular attributes in addition to typical clinical attributes. To make these data directly available to the entire cancer research community, several data portals have been developed. However, none of the existing data portals allow systematic exploration and interpretation of the complex relationships between the vast amount of clinical and molecular attributes. Methods: We developed LinkedOmics (http://www.linkedomics.org), a web platform that focuses on the discovery and interpretation of associations between clinical and molecular attributes. LinkedOmics includes three data analysis modules. The LinkFinder module allows flexible exploration of associations between a molecular or clinical attribute of interest and all other attributes, providing the opportunity to analyze and visualize associations between billions of attribute pairs for each cancer cohort. The LinkCompare module enables easy comparison of the associations identified by LinkFinder, which is particularly useful in multi-omics and pan-cancer analyses. The LinkInterpreter module transforms identified associations into biological understanding through pathway and network analysis. All modules provide user-friendly data visualization. Results: The current version of LinkedOmics contains multi-omics data and clinical data for 32 cancer types and a total of 11,158 patients from the TCGA project. Currently we have ~50 PDX (patient derived xenograft) samples data. It is also the first multi-omics database that integrates mass spectrometry (MS)-based global proteomics data generated by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) on selected TCGA tumor samples, as well as CPTAC Colon, Endometrial, Kidney and Breast cancer. 
In total, LinkedOmics has more than a billion data points. We used several case studies to demonstrate the utility of LinkedOmics in revealing functional impact of somatic mutation or copy number alteration on mRNA or protein expression, in deriving multi-omics based protein signature for poor prognosis, in performing pan-cancer analysis to identify survival-associated gene expression signature, and in connecting novel pan-cancer poor prognosis markers to tumor invasiveness and aggressiveness. Conclusions: LinkedOmics provides a unique platform for biologists and clinicians to access, analyze and compare cancer multi-omics data within and across tumor types. With 5 case studies we demonstrated the power of LinkedOmics in cancer research. Although the current version of LinkedOmics includes only TCGA and CPTAC data, it can be easily extended to support other cohort-based or user provided multi-omics studies. Citation Format: Suhas Vasaikar, Peter Straub, Jing Wang, Bing Zhang. LinkedOmics: Analyzing multi-omics data within and across 32 cancer types [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 5112.

  • Research Article
  • 10.1158/1538-7445.am2018-4359
Abstract 4359: Several cancer types are associated with increased expression and activity of the ECT2 RhoGEF and decreased expression and activity of the DLC1 RhoGAP, leading to increased RhoA activity
  • Jul 1, 2018
  • Cancer Research
  • Dunrui Wang + 4 more

RhoA, which is a member of the Ras family of small GTPases, whose proteins cycle between an active GTP-bound form and an inactive GDP-bound form, frequently has high activity in advanced cancer. The RhoA protein is activated by RhoGEFs, which catalyze replacement of bound-GDP with bound-GTP, and inactivated by RhoGAPs, which hydrolyze bound-GTP to bound-GDP. In this study, we have used The Cancer Genome Atlas (TCGA) and the NCI Clinical Proteomic Tumor Analysis Consortium (CPTAC) databases to examine the relationship between a specific RhoGEF, ECT2, and a specific RhoGAP, DLC1, and experimentally tested the biological relevance of some key findings. Using the TCGA database, we have found that increased expression of the ECT2 RhoGEF frequently occurs in synchrony with decreased expression of the DLC1 RhoGAP in several tumor types, including lung adenocarcinoma, lung squamous cell carcinoma, and hepatocellular cancer. In lung adenocarcinoma, the combination of high ECT2 expression and low DLC1 expression level is more common in poorly differentiated tumors than in well differentiated ones. Using other publicly available datasets from NCBI Gene Expression Omnibus (GEO), the expression of DLC1 and ECT2 was found to be cell cycle related, with higher DLC1 being observed exclusively in the G0 phase, while ECT2 was higher during the cell cycle, relative to G0, and peaked in the G2/M phase. In CPTAC, phosphorylation of a specific ECT2 amino acid (S861) is correlated with a poor prognosis in breast cancer. Consistent with this observation, experimental mutation of ECT2 S861 to A861 results in an ECT2 gene with loss-of-function. Point mutations of ECT2 and DLC1 were found at overall frequencies of 1.7% and 4.7%, respectively, in the tumors of the TCGA database.
These cancer-associated mutations occur along the entire length of the ECT2 and DLC1 proteins, rather than being located primarily in the Rho-GEF domain of ECT2 or the Rho-GAP domain of DLC1, and experimental analysis of a subset of these mutants indicates that most of the ECT2 mutants are gain-of-function, while most of the DLC1 mutants are loss-of-function. Our results imply that ECT2 and DLC1 frequently act in concert, but in opposite directions, in cancer to increase RhoA activity. * Authors contributed equally to this abstract Citation Format: Dunrui Wang, Xiaolan Qian, Marian E. Durkin, Alexander G. Papageorge, Douglas R. Lowy. Several cancer types are associated with increased expression and activity of the ECT2 RhoGEF and decreased expression and activity of the DLC1 RhoGAP, leading to increased RhoA activity [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 4359.

  • Research Article
  • Cited by 1
  • 10.1158/1557-3265.adi21-po-003
Abstract PO-003: Deep learning identifies conserved pan-cancer tumor features
  • Mar 1, 2021
  • Clinical Cancer Research
  • Javad Noorbakhsh + 8 more

Histopathological images are an integral data type for studying cancer. We show pre-trained convolutional neural networks (CNNs) can be systematically applied across cancer types, enabling comparisons to reveal shared spatial behaviors. We develop CNNs with a common architecture trained on 19 cancer types of The Cancer Genome Atlas (TCGA), analyzing 14459 hematoxylin and eosin scanned frozen tissue images. Our CNNs are based on the Inception-V3 network and classify TCGA pathologist-annotated tumor/normal status of whole slide images in all 19 cancer types with consistently high AUCs (0.995±0.008). Remarkably, CNNs trained on one tissue are effective in others (AUC 0.88±0.11), with classifier relationships recapitulating known adenocarcinoma, carcinoma, and developmental biology. Moreover, classifier comparisons reveal intra-slide spatial similarities, with an average tile-level correlation of 0.45±0.16 between classifier pairs on the TCGA test sets. In particular, the TCGA-trained classifiers had average tile-level correlation of 0.52±0.09 and 0.58±0.08 on hold-out TCGA lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) test sets, respectively. These relations are reflected on two external datasets, i.e., LUAD and LUSC whole slide images of Clinical Proteomic Tumor Analysis Consortium. The CNNs trained on TCGA achieved cross-classification AUCs of 0.75±0.12 and 0.73±0.13 on LUAD and LUSC external validation sets, respectively. These CNNs had average tile-level correlations of 0.38±0.09 and 0.39±0.08 on LUAD and LUSC validation sets, respectively. Breast cancers, bladder cancers, and uterine cancers have spatial patterns that are particularly easy to detect, suggesting these cancers can be canonical types for image analysis. This study illustrates pre-trained CNNs can detect tumor features across a wide range of cancers, suggesting presence of pan-cancer tumor features. 
These shared features allow combining datasets when analyzing small samples to narrow down the parameter search space of CNN models. Citation Format: Javad Noorbakhsh, Saman Farahmand, Ali Foroughi pour, Sandeep Namburi, Dennis Caruana, David Rimm, Mohammad Soltanieh-ha, Kourosh Zarringhalam, Jeffrey H. Chuang. Deep learning identifies conserved pan-cancer tumor features [abstract]. In: Proceedings of the AACR Virtual Special Conference on Artificial Intelligence, Diagnosis, and Imaging; 2021 Jan 13-14. Philadelphia (PA): AACR; Clin Cancer Res 2021;27(5_Suppl):Abstract nr PO-003.

  • Research Article
  • 10.1016/j.labinv.2025.104253
Toward the Best Generalizable Performance of Machine Learning in Modeling Omic and Clinical Data.
  • Dec 1, 2025
  • Laboratory investigation; a journal of technical methods and pathology
  • Fei Deng + 2 more


  • Research Article
  • 10.1158/1538-7445.am2023-5433
Abstract 5433: Representation learning for histological profiling of lung squamous premalignant lesions and tumors
  • Apr 4, 2023
  • Cancer Research
  • Rushin Gindra + 9 more

Lung squamous cell carcinoma (LSCC) is preceded by the development of bronchial premalignant lesions (PMLs). PMLs progress through a series of histologic grades characterized by molecular and morphologic alterations. To enhance our understanding of PML progression to lung cancer, we are developing deep learning methods to quantitate PML histologic features and localize PML severity along the histologic disease spectrum. We leveraged whole slide images (WSIs) of lung tumors (lung adenocarcinoma, LUAD and LSCC) and non-involved adjacent (referred to as ‘normal’) tissue from the Clinical Proteomic Tumor Analysis Consortium (CPTAC), the National Lung Screening Trial (NLST) and The Cancer Genome Atlas (TCGA). The NLST WSIs were used for training a self-supervised contrastive learning (simCLR) model. A graph isomorphism network (GIN) trained on CPTAC WSIs was developed using the features from simCLR to perform WSI-level predictions of tissue subtype (LUAD, LSCC, normal). Model performance was assessed using the area under the receiver operating characteristic curve (AUC). The model was used to learn WSI-level features of endobronchial biopsies of PMLs from two datasets (PCGA, n=365 samples spanning all histologic grades, and CIS, n=112, only high-grade samples). The features from the final layer of the GIN were used to compute principal components (PCs) and visualize the relationships between samples using t-SNE and UMAP. To interpret how the GIN processes WSI data, we performed gradient-based class activation mapping (Grad-CAMs) on the graphs. High model performance was observed on the CPTAC test dataset (Accuracy = 87%, AUC >= 0.89) but dropped slightly on the TCGA dataset (Accuracy = 72%, AUC >= 0.75). The GIN-CAMs identified WSI regions that were highly associated with the output label based on expert pathologist annotation on TCGA cases. Clustering revealed distinct groupings of normal, LUAD and LSCC subtypes (PC1 & PC2, p<0.01).
Most WSIs from the PCGA data clustered with normal tissues (PC1 & PC2: p<0.01 for PCGA v LUAD, PCGA v LSCC). However, a portion of the CIS samples were closely grouped with LSCC tumor cases (74% of CIS WSIs were classified as LSCC tumors. PC1 & PC2: p<0.01 for CIS v LUAD; PC1: p<0.01 for CIS v LSCC). Additionally, the feature distribution of CIS samples that progressed towards LUSC tumors were significantly different than those that regressed (PC1 & PC3: p < 0.05 for Progressive v Regressive). The GIN recognizes and generates robust features that are distinctly and consistently observed in the different histologic groups across multiple cohorts. These features are also able to stratify progressive and regressive high-grade lesions. The stratification of PMLs is an important step in designing novel interception strategies to prevent the development of lung cancer, and our results suggest that pathology data may be efficacious to include in future biomarkers. Citation Format: Rushin Gindra, Yi Zheng, Regan Conrad, Emily Green, Sarah Mazzilli, Ehab Billatos, Mary Reid, Eric Burks, Vijaya Kolachalama, Jennifer E. Beane. Representation learning for histological profiling of lung squamous premalignant lesions and tumors. [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 5433.

  • Research Article
  • 10.1158/1538-7445.am2019-2129
Abstract 2129: CPTAC: Biospecimen accrual for proteogenomics
  • Jul 1, 2019
  • Cancer Research
  • Mathangi Thiagarajan

The Clinical Proteomics Tumor Analysis Consortium (CPTAC) is a National Cancer Institute initiative that seeks to uncover the molecular basis of cancer using a proteogenomic approach to study prospective cancer specimens from treatment-naïve patients. Leidos Biomed provides an infrastructure for supporting the collection of high quality biospecimens specific to proteogenomics and the corresponding clinical data, in addition to project and subcontract management for the program. Biospecimens from racial and ethnic minority patients are so far underrepresented, which precludes equitable research across all patient groups for cancer treatment. The CPTAC program seeks diversity in the accrual cohort and an effort is ongoing to encourage the participation of tissue source sites serving racial and ethnic minorities that can contribute high quality biospecimens towards the collection. CPTAC applies the understanding of the molecular basis of cancer to identify biomarker candidates. CPTAC Phase II was completed in 2016, collecting over 350 cases from breast, colon and ovarian patients. CPTAC Phase III, which is in progress, began to collect and analyze 200 cases from each of ten additional cancers. Analysis is close to completion for uterine corpus endometrial carcinoma, kidney clear cell renal cell carcinoma and lung adenocarcinoma while accruals are ongoing for 7 other cancer types. The study entails collection and pathology evaluation of biospecimens, high-quality clinical data and images from clinical sites from around the world. A biorepository evaluates and processes the biospecimens, sending nucleic acids to a sequencing center and tissues to proteomics groups. Data are combined and analyzed by analysis and translational centers. Genomic data are made available to the research community through the NCI Genomic Data Commons (GDC). Proteomic data are made available through CPTAC’s Data Coordinating Center (DCC).
Imaging data are made available through The Cancer Imaging Archive (TCIA). The program is expanding and initiating biospecimen collection for proteogenomic analysis of ten additional rare tumor types. Citation Format: Mathangi Thiagarajan. CPTAC: Biospecimen accrual for proteogenomics [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 2129.

  • Research Article
  • Citations: 1
  • 10.1158/1538-7445.am2018-2295
Abstract 2295: LinkedOmics: Analyzing multi-omics data within and across 32 cancer types
  • Jul 1, 2018
  • Cancer Research
  • Suhas Vasaikar + 3 more

Background: The Cancer Genome Atlas (TCGA) project has performed molecular profiling of human tumors using genomic, epigenomic, transcriptomic, and proteomic platforms, and each tumor is comprehensively characterized by around 100,000 molecular attributes in addition to typical clinical attributes. To make these data directly available to the entire cancer research community, several data portals have been developed. However, none of the existing data portals allows systematic exploration and interpretation of the complex relationships between the vast number of clinical and molecular attributes. Methods: We developed LinkedOmics (http://www.linkedomics.org), a web platform that focuses on the discovery and interpretation of associations between clinical and molecular attributes. LinkedOmics includes three data analysis modules. The LinkFinder module allows flexible exploration of associations between a molecular or clinical attribute of interest and all other attributes, providing the opportunity to analyze and visualize associations between billions of attribute pairs for each cancer cohort. The LinkCompare module enables easy comparison of the associations identified by LinkFinder, which is particularly useful in multi-omics and pan-cancer analyses. The LinkInterpreter module transforms identified associations into biological understanding through pathway and network analysis. All modules provide user-friendly data visualization. Results: The current version of LinkedOmics contains multi-omics data and clinical data for 32 cancer types and a total of 11,158 patients from the TCGA project. It is also the first multi-omics database that integrates mass spectrometry (MS)-based global proteomics data generated by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) on selected TCGA tumor samples. In total, LinkedOmics has more than a billion data points. We used several case studies to demonstrate the utility of LinkedOmics in revealing the functional impact of somatic mutation or copy number alteration on mRNA or protein expression, in deriving a multi-omics-based protein signature for poor prognosis, in performing pan-cancer analysis to identify a survival-associated gene expression signature, and in connecting novel pan-cancer poor prognosis markers to tumor invasiveness and aggressiveness. Conclusions: LinkedOmics provides a unique platform for biologists and clinicians to access, analyze and compare cancer multi-omics data within and across tumor types. With 5 case studies we demonstrated the power of LinkedOmics in cancer research. Although the current version of LinkedOmics includes only TCGA and CPTAC data, it can be easily extended to support other cohort-based or user-provided multi-omics studies. Citation Format: Suhas Vasaikar, Peter Straub, Jing Wang, Bing Zhang. LinkedOmics: Analyzing multi-omics data within and across 32 cancer types [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 2295.

  • Research Article
  • Citations: 13
  • 10.3892/ol.2020.12118
Mammaglobin B may be a prognostic biomarker of uterine corpus endometrial cancer.
  • Sep 18, 2020
  • Oncology letters
  • Jie Li + 2 more

Mammaglobin B, also referred to as secretoglobin family 2A member 1 (SCGB2A1), has been reported to be highly expressed in uterine corpus endometrial cancer (UCEC) compared with the normal endometrium. However, the prognostic value of SCGB2A1 in UCEC remains unclear. The Oncomine, The Cancer Genome Atlas (TCGA) and Clinical Proteomic Tumor Analysis Consortium databases were used to explore the differential expression of SCGB2A1. Furthermore, data of patients with UCEC were downloaded from TCGA, and logistic regression analysis, survival analysis, univariate and multivariate analyses, and nomogram construction were performed to identify its prognostic value in UCEC. Additionally, gene set enrichment analysis (GSEA) was used to investigate the mechanisms of SCGB2A1 in UCEC. Finally, the association between SCGB2A1 and immune infiltration in UCEC was analyzed using the Tumor Immune Estimation Resource. Decreased mRNA and protein expression levels of SCGB2A1 were significantly associated with poor prognostic clinicopathological characteristics (all P<0.05). Additionally, low expression levels of SCGB2A1 were associated with decreased survival of patients with UCEC compared with high expression levels of SCGB2A1. Furthermore, the independent prognostic value of SCGB2A1 in UCEC was identified by univariate and multivariate analyses. A nomogram based on 6 variables, including SCGB2A1 expression, was developed for the estimation of the 1-, 3-, and 5-year survival probability in UCEC. Additionally, GSEA suggested that the vascular endothelial growth factor, PTEN, platelet-derived growth factor, DNA repair, KRAS signaling, and PI3K-AKT-mTOR signaling pathways were differentially enriched in the low SCGB2A1 expression phenotype. Finally, high infiltration levels of CD8+ T cells were associated with SCGB2A1 in UCEC and this was associated with prognosis. The present results indicated that SCGB2A1 may be a promising independent prognostic factor in UCEC. These signaling pathways may be crucial for the regulation of UCEC via SCGB2A1.
