Abstract

In early-stage hormone receptor-positive breast cancer, genomic risk scores identify patients who stand to benefit from up-front chemotherapy but introduce financial and logistical hurdles to care. We assembled a cohort of 5,244 patients with 11,671 corresponding whole-slide images of breast tumors stained with hematoxylin and eosin. We developed a multimodal machine learning model to infer risk of distal metastatic recurrence from routine clinical data. Specifically, the model interprets text from the pathologist’s report using a large language model and uses self-supervised vision transformers to interpret the corresponding whole-slide image. Tensor fusion joins the modalities to infer Genomic Health’s Oncotype DX recurrence score. In the withheld test set, the recurrence score inferred by the multimodal model agreed with the measured score with a concordance correlation coefficient of 0.64 (95% C.I. 0.59 - 0.69), compared to 0.55 (95% C.I. 0.49 - 0.61) and 0.56 (95% C.I. 0.52 - 0.60) for the linguistic and visual unimodal models, respectively. For identifying high-risk disease in the full-information setting (when images and pathology reports with quantitative hormone receptor status and grade are available), the multimodal model attains an area under the precision-recall curve (AUPRC) of 0.69 (AUROC=0.88) in the withheld test set, compared to AUPRC of 0.61 and 0.66 for the linguistic and visual models, respectively. By comparison, in the same full-information setting, the clinical nomogram introduced by Orucevic et al. in 2019 achieves an AUPRC of 0.48. We suggest the operating point at which precision is 94.4% and recall is 33.3%. Digitized whole-slide images of routine breast biopsies and their associated synoptic pathology reports contain much of the information necessary to stratify patients by risk of distal metastatic recurrence, when modeled appropriately.
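For readers unfamiliar with the agreement metric reported above, the following is a minimal sketch of Lin's concordance correlation coefficient, assuming the standard population-moment definition. The abstract does not state which estimator or software the authors used, and the function name and toy scores here are hypothetical, chosen only for illustration.

```python
from statistics import fmean


def concordance_correlation(measured, inferred):
    """Lin's concordance correlation coefficient for two equal-length sequences.

    Assumes the standard definition with population (1/n) moments:
        2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y)) ** 2)
    """
    if len(measured) != len(inferred) or len(measured) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_x = fmean(measured)
    mean_y = fmean(inferred)

    # Population variances and covariance (divide by n, not n - 1).
    var_x = fmean([(x - mean_x) ** 2 for x in measured])
    var_y = fmean([(y - mean_y) ** 2 for y in inferred])
    cov_xy = fmean([(x - mean_x) * (y - mean_y)
                    for x, y in zip(measured, inferred)])

    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("both sequences are constant and identical")
    return 2 * cov_xy / denominator


# Hypothetical toy scores, not data from the study.
perfect = concordance_correlation([1, 2, 3], [1, 2, 3])
shifted = concordance_correlation([1, 2, 3], [2, 3, 4])
```

Unlike Pearson correlation, this coefficient penalizes systematic offset: in the toy example, shifting every inferred score by one unit lowers the coefficient from 1 to 4/7 even though the two sequences remain perfectly linearly related.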
Our model could enable hospitals to rapidly triage the need for genomic risk testing, possibly precluding one third of orders without loss of accuracy. This would help conserve scarce resources for genomic testing and save valuable weeks before therapy begins, while maintaining the standard of precision oncology.

Citation Format: Kevin M. Boehm, Antonio Marra, Jorge S. Reis-Filho, Sarat Chandarlapaty, Fresia Pareja, Sohrab P. Shah. Multimodal modeling of digitized histopathology slides improves risk stratification in hormone receptor-positive breast cancer patients [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 890.
