Abstract Hematoxylin and eosin (H&E) stained histologic sections contain invaluable information that remains largely untapped because of its complexity. AI applications employing deep learning (DL) can help translate image data into a human-interpretable form and yield novel oncological insights that would otherwise have remained imperceptible. DL-based methodologies can be multimodal, integrating imaging with clinicogenomic data to furnish a more holistic perspective and afford more accurate predictions in oncology. Here, we developed an unsupervised DL workflow to analyze 1,799 H&E images of lung cancer (NSCLC n = 951; SCLC n = 50; others n = 798), incorporating comprehensive patient-level clinical data (electronic health records [ConcertAI]) integrated with genomics (WES and RNA-seq [Caris Labs]). Our approach has three steps: (1) image preprocessing and filtering, yielding > 30 million image patches; (2) extracting a 512-dimensional feature vector for each patch using SimCLR models pretrained on 57 public oncology histopathology datasets, with ResNet-18 as the backbone; (3) clustering the patches with three unsupervised methods (k-means, DBSCAN, and Leiden clustering), from which Leiden clustering was selected. We identified 635 primary imaging clusters using an elbow method and generated an image feature matrix by calculating correlations between each patch and the cluster centroids; these features were aggregated and mapped back to the source slides. In this proof-of-concept, distinct image feature patterns characterized SCLC and NSCLC samples. For SCLC, one of the salient features was the presence of hemorrhage, which may be associated with a higher rate of fine-needle aspiration biopsy procedures in SCLC than in NSCLC, a difference that was confirmed in the EHR data (p = 0.032).
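The image feature matrix described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `centroid_correlation_features` and `aggregate_by_slide` are hypothetical, Pearson correlation is assumed as the correlation measure, and a per-slide mean is assumed as the aggregation rule, since the abstract specifies neither. The embeddings and cluster labels here are synthetic stand-ins for the SimCLR features and Leiden assignments.

```python
import numpy as np

def centroid_correlation_features(embeddings, labels):
    """Pearson correlation of each patch embedding with each cluster centroid.

    Assumption: the abstract says only "correlations"; Pearson is one choice.
    Returns an array of shape (n_patches, n_clusters).
    """
    cluster_ids = np.unique(labels)
    # Centroid = mean embedding of the patches assigned to each cluster.
    centroids = np.stack(
        [embeddings[labels == c].mean(axis=0) for c in cluster_ids]
    )
    # Center each vector across its feature dimensions, then scale to unit
    # length, so that a dot product equals the Pearson correlation.
    e = embeddings - embeddings.mean(axis=1, keepdims=True)
    c = centroids - centroids.mean(axis=1, keepdims=True)
    e = e / np.maximum(np.linalg.norm(e, axis=1, keepdims=True), 1e-12)
    c = c / np.maximum(np.linalg.norm(c, axis=1, keepdims=True), 1e-12)
    return e @ c.T

def aggregate_by_slide(patch_features, slide_ids):
    """Average patch-level features per slide (assumed aggregation rule)."""
    slides = np.unique(slide_ids)
    agg = np.stack(
        [patch_features[slide_ids == s].mean(axis=0) for s in slides]
    )
    return slides, agg

# Synthetic example: 200 patches, 512-dimensional embeddings,
# 5 clusters and 4 slides assigned in round-robin fashion.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 512))
labels = np.arange(200) % 5
slide_ids = np.arange(200) % 4

patch_features = centroid_correlation_features(embeddings, labels)
slides, slide_features = aggregate_by_slide(patch_features, slide_ids)
print(patch_features.shape, slide_features.shape)
```

In this sketch each slide ends up with one row of cluster-affinity features, which is the kind of slide-level matrix that could then be compared between SCLC and NSCLC samples.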
The derived morphological clusters were correlated with tumor-immune genomic features (Tumor Mutational Burden [TMB], Immunologic Constant of Rejection [ICR], and Miracle scores¹), which serve as predictors of response to immune-checkpoint inhibitor therapy. By applying linear models, we detected 11, 96, and 249 imaging clusters significantly associated with TMB, ICR, and Miracle scores, respectively. These clusters were highly enriched with immune cells (e.g., plasma cells, macrophages, and lymphocytes), supporting an infiltrated and inflamed tumor-immune microenvironment. In summary, a multimodal, unsupervised deep learning workflow combining H&E imaging with clinicogenomic data was developed to identify histologic feature clusters associated with well-established tumor-immune genomic signatures of NSCLC immune infiltration and molecular phenotypes. These studies demonstrate enormous potential to yield histopathological and translational insights in NSCLC and SCLC that can empower clinicians to make better therapeutic response predictions.
Citation Format: Si Wu, Yujie Zhao, Hugo Luo, Kevin Kolahi, Thanh Bui, Xu Shi, Aditee Shrotre, Alexander Liede, Xi Zhao, Josue Samayoa, Weilong Zhao. Integrating real-world histopathological and clinicogenomic data from 1799 lung cancer patients by applying unsupervised deep learning [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 2310.