Abstract

Background: Understanding the composition of cancer-associated stroma (CAS) is vital, as the number and location of immune cells and fibroblasts, as well as the degree of extracellular matrix deposition, have implications for cancer progression and response to treatment, including in non-small cell lung cancer (NSCLC). Manual analysis of CAS is highly subjective and does not fully describe the stromal milieu, especially from a spatial perspective. To address this, we developed an unsupervised machine learning (ML) model to characterize the CAS in NSCLC from hematoxylin and eosin (H&E)-stained whole slide images (WSIs) at scale. Methods: PathExplore™ models were deployed to predict stromal tissue and cell types, while another ML model was used to detect collagen fibers in H&E-stained WSIs from the TCGA LUAD (N=536) and LUSC (N=464) datasets. Stroma was divided into small regions (median area = 0.02 mm²), and 88 features characterizing cell distribution, tissue composition and fiber density were extracted from each region. Graphs were generated connecting neighboring regions (nodes), and an unsupervised variational graph auto-encoder (VGAE) model was trained to learn 8 latent features through dimensionality reduction. Stromal phenotypes were then derived from the latent features using k-means clustering. The fraction of each phenotype in the stroma was correlated with immune- and stroma-related gene expression signatures (GES) and overall survival (OS). Results: Deployment of the VGAE on LUAD and LUSC WSIs revealed three distinct stromal phenotypes: P0, P1 and P2. Fibroblast density was elevated in P0 and P1 regions (p<0.001), immune cell density was elevated in P2 regions (p<0.001), and collagen fiber intensity was highest in P1 regions (p<0.001).
P2 enrichment was correlated with elevated expression of the T cell-inflamed gene expression profile (TGEP; Spearman ρ = 0.43 in LUAD; ρ = 0.27 in LUSC) and with improved OS (HR = 0.696; 95% CI: 0.571-0.847 in LUSC). Conversely, P1 enrichment was positively associated with a transforming growth factor-β-induced cancer-associated fibroblast GES (TGFβ-CAF; ρ = 0.19 in LUAD and ρ = 0.12 in LUSC) and with poor OS (HR = 1.358; 95% CI: 1.149-1.603 in LUSC). These phenotypes are consistent with fibroblast-enriched, collagen-depleted stroma (P0), collagen-rich, fibroblast-enriched tumor-promoting stroma (P1), and immune cell-enriched, tumor-suppressive stroma (P2). Conclusions: We describe an unsupervised, data-driven method of predicting stromal regions with discrete patterns of cell composition and collagen deposition in NSCLC. This approach identified three phenotypes of NSCLC stroma. These results highlight the ability of ML models to characterize and find meaningful patterns within the cell, tissue, and matrix components of a tumor. This work provides further evidence of the potential of ML to discover novel precision medicine biomarkers in NSCLC.

Citation Format: Neel Patel, Nhat Le, Tan Nguyen, Fedaa Najdawi, Sandhya Srinivasan, Adam Stanford-Moore, Deeksha Kartik, Jun Zhang, Jacqueline Brosnan-Cashman, Robert Egger, Justin Lee, Matthew Bronnimann. Unsupervised detection of stromal phenotypes with distinct fibrogenic and inflamed properties in NSCLC [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 4912.
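The last steps of the Methods (clustering latent features into phenotypes, computing the fraction of each phenotype, and rank-correlating that fraction with a gene expression signature) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the VGAE is omitted, the latent vectors are assumed to be already available, and every function name and toy value below is hypothetical. The k-means shown is plain Lloyd's algorithm and the correlation is Spearman's ρ computed as the Pearson correlation of average ranks; in practice a library implementation would likely be used instead.

```python
import math
import random
from collections import Counter


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def phenotype_fractions(labels, k):
    """Fraction of regions assigned to each of the k phenotypes."""
    counts = Counter(labels)
    return [counts.get(c, 0) / len(labels) for c in range(k)]


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one input is constant
    return cov / math.sqrt(var_x * var_y)


# Toy example (made-up numbers): 2-D stand-ins for the 8 latent features.
toy_regions = [[0.1, 0.2], [0.0, 0.1], [5.0, 5.1], [5.2, 4.9], [0.2, 0.0], [5.1, 5.0]]
toy_labels = kmeans(toy_regions, k=2)
toy_fractions = phenotype_fractions(toy_labels, k=2)
```

Cluster indices returned by k-means are arbitrary, so mapping a cluster to a named phenotype such as P0, P1 or P2 would require inspecting its feature profile. In a per-slide analysis, `phenotype_fractions` would be called once per slide and the resulting fractions passed to `spearman` alongside the per-slide signature scores.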