Abstract

Background: Molecularly and clinically well-annotated patient datasets are ideal for studying tumor biology and developing robust machine learning (ML) models for predicting outcome and treatment response. However, such data rarely exist in real-world settings, or in sufficient quantities within research contexts. Large publicly available datasets such as The Cancer Genome Atlas (TCGA), which provide multi-omic profiles for diverse cancer types, have profoundly advanced cancer research and facilitated the development of novel therapies and personalized medicines. However, the absence of patient outcome data tied to treatment limits the applicability of these data for understanding and modeling treatment response. Real-world clinicogenomic cohorts, such as AACR Project GENIE, are by contrast typically very rich in clinical annotations, including treatment regimens and outcome measures, but are sparsely annotated for tumor molecular profiles, rarely exceeding hundreds of genes profiled. We hypothesized that it would be possible to reconstruct latent tumor mRNA representations from the limited genomic and clinical data available in real-world clinicogenomic cohorts, and that these reconstructed expression profiles would be useful for a variety of clinically meaningful downstream applications.

Methods: We developed an ML model, called Mut2Ex, that reconstructs tumor gene expression profiles from the genetic information available on commercial next-generation sequencing panels. It uses a Principal Label Space Transformation (PLST) that we adapted to regression problems, along with embeddings of clinical information (OncoTree code, sex, and stage) generated by a language model. Mut2Ex was trained on ~1200 cell lines from DepMap representing 26 cancer types to generate ~2000 reconstructed mRNA gene profiles, which were applied to a variety of clinical tasks. We used Mut2Ex to reconstruct mRNA profiles for ~10,000 tumors from TCGA and ~180,000 tumors from AACR Project GENIE.

Results: mRNA expression reconstructed by Mut2Ex was highly correlated with true expression in cell lines (rho = 0.926; 95% CI, 0.924-0.928; N = 1184). Compared with true expression, reconstructed profiles recapitulated sub-clusters within cancer types, PAM50 subtyping in breast tumors, survival signatures in colorectal tumors, and multiple oncogenic signatures in a pan-cancer manner. Analysis of reconstructed expression for AACR Project GENIE tumors revealed the expected enrichment of known driver genes within expression subtypes, and enrichment of oncogenic signatures associated with distinct clinical outcomes in a cancer type-specific manner.

Conclusions: Our flexible analytic framework for reconstructing gene expression profiles from clinicogenomics data substantially augments the clinical utility and value of data acquired in real-world settings.

Citation Format: Maayan Baron, Sunil Kumar, Felicia Kuperwaser, Dillon Tracy, Emily Vucic, Jeff Sherman. Reconstructing a latent representation of gene expression from genomic alterations to improve clinical utility of real-world clinicogenomics data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 3519.
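
Illustrative sketch (not the authors' implementation): the abstract does not describe how the label space transformation was adapted to regression, so the code below shows only the general idea under stated assumptions. The expression matrix is projected onto its top principal directions, a ridge regression (an assumption) maps binary alteration features to those latent coordinates, and predictions are decoded back to gene space. The function names, the ridge penalty, the dimensions, and the synthetic data are all hypothetical, and the clinical-information embeddings are omitted.

```python
import numpy as np


def fit_plst_regression(X, Y, n_components, ridge=1.0):
    """Fit a PLST-style regressor (hypothetical helper, not Mut2Ex itself).

    X: (n_samples, n_features) alteration features, e.g. binary mutation calls.
    Y: (n_samples, n_genes) expression targets.
    """
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    Xc = X - x_mean
    Yc = Y - y_mean

    # Principal directions of the centered target matrix (rows of Vt).
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    V = Vt[:n_components].T          # (n_genes, n_components)

    # Encode targets into the reduced "label space".
    Z = Yc @ V                       # (n_samples, n_components)

    # Ridge regression from features to latent codes (assumed regressor).
    A = Xc.T @ Xc + ridge * np.eye(X.shape[1])
    W = np.linalg.solve(A, Xc.T @ Z)  # (n_features, n_components)

    return {"x_mean": x_mean, "y_mean": y_mean, "V": V, "W": W}


def predict_expression(model, X):
    """Predict latent codes, then decode them back to gene space."""
    Z_hat = (X - model["x_mean"]) @ model["W"]
    return Z_hat @ model["V"].T + model["y_mean"]


# Synthetic demonstration data (fabricated for illustration only).
rng = np.random.default_rng(0)
n_samples, n_features, n_latent, n_genes = 200, 30, 5, 50
X = (rng.random((n_samples, n_features)) < 0.3).astype(float)
B = rng.normal(size=(n_features, n_latent))
L = rng.normal(size=(n_latent, n_genes))
Y = X @ B @ L + 0.1 * rng.normal(size=(n_samples, n_genes))

model = fit_plst_regression(X, Y, n_components=n_latent)
Y_hat = predict_expression(model, X)

# In-sample Pearson correlation between reconstructed and true values.
r = np.corrcoef(Y_hat.ravel(), Y.ravel())[0, 1]
print(f"in-sample correlation: {r:.3f}")
```

The correlation here is computed in-sample on toy data and says nothing about the reported rho; a realistic evaluation would use held-out samples and the correlation measure used in the study.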
