Abstract

Purpose: Pathologists routinely examine H&E-stained, formalin-fixed paraffin-embedded (FFPE) tissue slides under the microscope to reach a diagnosis. Pathomics, the quantitative analysis of images of such samples, aims to improve diagnostic accuracy by capturing detailed tissue and cellular characteristics. This study evaluates whether pathomic features extracted from tissue images can predict gene expression measured by RNA sequencing, with the goal of advancing molecular profiling and targeted therapy for invasive breast carcinoma (IBC).

Method: We analyzed 90 regions of interest (ROIs) on FFPE tissue slide images of patients with IBC from The Cancer Genome Atlas (TCGA). Using the HistomicsTK software, we extracted nearly 300 pathomic features describing cell morphometry, intensity, and gradient from these ROIs. We then assessed the intra-correlation of the pathomic features among ROIs from the same tissue slide and selected the most heterogeneous and highly correlated features (n=20) across the morphometry, intensity, and gradient categories, using a Pearson's correlation coefficient (r) greater than 0.9 and a false discovery rate (FDR)-adjusted p-value less than 0.05. Gene expression data for the corresponding tissues were obtained from TCGA as FPKM values. Using the ImaGene software, we trained a Multitask Elastic Net (MTEN) model on these pathomic features with an 80:20 training/testing split and three-fold cross-validation on the training set. Model performance was evaluated by the AUC and R² of the predicted gene expression on the testing set.

Result and Biological significance: On the testing set, the MTEN model predicted the expression of three genes reported in the literature as prognostic markers in breast cancer, namely ALDH1L2, MFAP5, and MXRA8, with an AUC greater than 0.8 and an R² above 0.5 (p < 0.002).
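The feature-selection and modeling steps of the Method can be sketched in Python with SciPy and scikit-learn. This is a minimal sketch under stated assumptions, not the authors' pipeline: the study used ImaGene, whose API is not reproduced here, and scikit-learn's `MultiTaskElasticNetCV` stands in for the MTEN model. The helper names `bh_adjust`, `correlated_features`, and `fit_mten` are hypothetical; the correlation screen is assumed to be a pairwise feature-to-feature Pearson test across ROIs with Benjamini-Hochberg FDR correction; and the demo uses random synthetic data, not TCGA features or FPKM values. AUC is omitted because the abstract does not state how predicted expression was dichotomized.

```python
import itertools

import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import MultiTaskElasticNetCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def bh_adjust(pvals):
    """Benjamini-Hochberg FDR adjustment of a 1-D sequence of p-values."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Running minimum from the largest p-value down keeps q-values monotone.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(scaled, 1.0)
    return q


def correlated_features(X, r_min=0.9, q_max=0.05):
    """Hypothetical screen: indices of features that appear in at least one
    pair with Pearson r > r_min and BH-adjusted p < q_max.
    X has shape (n_rois, n_features). ASSUMPTION: pairwise feature-feature
    correlation across ROIs; the abstract does not specify the pairing."""
    pairs = list(itertools.combinations(range(X.shape[1]), 2))
    stats = [pearsonr(X[:, i], X[:, j]) for i, j in pairs]
    r = np.array([s[0] for s in stats])
    q = bh_adjust([s[1] for s in stats])
    keep = set()
    for (i, j), r_ij, q_ij in zip(pairs, r, q):
        if r_ij > r_min and q_ij < q_max:
            keep.update((i, j))
    return sorted(keep)


def fit_mten(X, Y, seed=0):
    """Fit a multitask elastic net (scikit-learn stand-in for ImaGene's MTEN)
    on an 80:20 split with 3-fold CV on the training set; return per-gene
    R^2 on the held-out 20%."""
    X_tr, X_te, Y_tr, Y_te = train_test_split(
        X, Y, test_size=0.2, random_state=seed
    )
    model = make_pipeline(
        StandardScaler(),  # features live on very different scales
        MultiTaskElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=3, max_iter=10000),
    )
    model.fit(X_tr, Y_tr)
    return r2_score(Y_te, model.predict(X_te), multioutput="raw_values")


if __name__ == "__main__":
    # Synthetic stand-in data only (NOT TCGA): 90 ROIs x 20 features, 3 genes.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(90, 20))
    Y = X @ rng.normal(size=(20, 3)) + 0.1 * rng.normal(size=(90, 3))
    print("correlated features:", correlated_features(X))
    print("per-gene test R^2:", fit_mten(X, Y))
```

Scaling the features before fitting matters because the elastic net penalty treats all coefficients alike, while pathomic features such as cell area and stain intensity sit on very different scales. The multitask variant applies a shared penalty to each feature's coefficients across all target genes, so a feature is selected for every gene or for none.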
According to the literature, expression of genes from the ALDH1 superfamily in breast cancer correlates with disease stage, triple-negative status, and response to neoadjuvant therapy. Upregulation of MFAP5 in invasive breast carcinoma has been associated with high-risk prognostic features, such as higher tumor grade and stage and increased angiogenesis, and with poorer outcomes, such as lymph node metastasis. MXRA8 is involved in modulating the progression of human triple-negative breast cancer, likely by influencing the interactions of tumor cells with their microenvironment.

Conclusion: This study predicts the expression of clinically relevant genes in IBC from a heterogeneous set of pathomic features extracted from FFPE tissue slide images. Linking cell- and tissue-level morphometric, intensity, and gradient features with gene expression can give researchers insight into the molecular mechanisms underlying disease progression. Digital pathology and pathomics may reduce the need for additional genetic testing in critical patient cases by providing predictive information from routinely acquired pathology slides. By bridging phenotypic tissue data and molecular data, the predictive capabilities of pathomic features represent a significant advance in precision medicine.

Citation Format: Shrey S. Sukhadia, Digvijay Yadav, Kristen E. Muller. Transformative pathomics in oncology: Harnessing FFPE tissue slide imagery for clinically relevant gene expression prediction in invasive breast carcinoma [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 2 (Late-Breaking, Clinical Trial, and Invited Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(7_Suppl):Abstract nr LB392.