Abstract

Introduction: Breast cancer is the most common cancer in women, and most patients are diagnosed with early-stage disease. While adjuvant therapy should be considered in all fit patients, in many low-risk patients drug toxicity may outweigh the potential benefit, so identifying these patients is desirable. Hematoxylin and eosin (H&E) staining is routinely performed on biopsies and allows visual examination of the tissue and cells. Such manual examination, however, does not provide information about the molecular profile of the cancer, which is essential for diagnosis and guidance of treatment. The OncotypeDX assay is recommended for patients with node-negative, estrogen receptor-positive invasive breast cancer to determine chemotherapy benefit. It is based on RT-PCR gene expression profiling and provides a recurrence score (RS) that stratifies patients into non-high-risk (RS < 26) and high-risk (RS ≥ 26) groups. High-risk patients have been shown to be likely to benefit from chemotherapy, whereas non-high-risk patients are not. Unlike H&E staining, the OncotypeDX assay is costly, time-consuming, and inaccessible in low-income countries. Here, we sought to evaluate whether analysis of scanned H&E-stained slides by convolutional neural networks (CNNs) could predict the RS risk group (high versus non-high), which determines eligibility for chemotherapy.

Methods: A total of 684 H&E-stained slides were collected from 430 invasive breast cancer patients who were assayed for OncotypeDX between 2014 and 2020 at Sheba Medical Center, Israel. The slides were scanned at 0.25 microns per pixel, automatically segmented, and split into non-overlapping 256 × 256 pixel tiles, yielding 339,986 tissue-containing tile images in total. The patients were randomly split into training (75%) and test (25%) sets, and a CNN model was trained and validated on the training set, using 5-fold cross-validation, to classify each tile as non-high risk (RS < 26) or high risk (RS ≥ 26).
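The tiling and patient-level split described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say how tissue was segmented, so the brightness-based tissue test, its thresholds (`min_tissue_frac`, `background_level`), and the function names `tile_image` and `split_patients` are all assumptions made for this example.

```python
import numpy as np

TILE = 256  # tile side length in pixels, as stated in the abstract


def tile_image(image, tile=TILE, min_tissue_frac=0.5, background_level=220):
    """Split an RGB array (H, W, 3) into non-overlapping tiles and keep tissue tiles.

    ASSUMPTION: a pixel counts as tissue if its mean intensity is darker than
    `background_level`; a tile is kept if at least `min_tissue_frac` of its
    pixels are tissue. The abstract does not specify the segmentation method.
    """
    h, w = image.shape[:2]
    tiles = []
    # Step by a full tile so tiles never overlap; partial edge tiles are dropped.
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            patch = image[y:y + tile, x:x + tile]
            tissue_frac = (patch.mean(axis=-1) < background_level).mean()
            if tissue_frac >= min_tissue_frac:
                tiles.append(((y, x), patch))
    return tiles


def split_patients(patient_ids, test_frac=0.25, seed=0):
    """Randomly assign whole patients to train or test (75%/25% by default).

    Splitting by patient rather than by tile keeps all tiles of one patient
    on the same side of the split.
    """
    unique_ids = np.array(sorted(set(patient_ids)))
    rng = np.random.default_rng(seed)
    rng.shuffle(unique_ids)
    n_test = int(round(test_frac * len(unique_ids)))
    test_ids = set(unique_ids[:n_test].tolist())
    train_ids = set(unique_ids[n_test:].tolist())
    return train_ids, test_ids
```

In this sketch, every tile inherits the risk label of the patient it came from, and the split is made over patient identifiers before any tiles are assigned, which is one straightforward way to keep the test set truly held out.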
The final model was then applied to the held-out test set, and tile scores were aggregated to produce per-patient prediction scores. These per-patient scores were compared with the ground-truth risk group, and the area under the ROC curve (AUC) was calculated.

Results: On the held-out test set, the model achieved an AUC of 0.798 (95% CI: 0.689 to 0.875, P < 0.001) for high-risk versus non-high-risk classification based on the H&E images alone, showing that H&E image analysis could predict the high-risk group.

Conclusions: These results show, for the first time, that CNN-based analysis of H&E images could predict benefit from chemotherapy, implying that tumor morphology differs between RS groups. Such a system may enable physicians in countries that lack genetic profiling capabilities to refine chemotherapy stratification based on H&E images alone.

Citation Format: Gil Shamai, Ran Schley, Ron Kimmel, Nora Balint-Lahat, Iris Barshack, Chen Mayer. Prediction of OncotypeDX high risk group for chemotherapy benefit in breast cancer by deep learning analysis of hematoxylin and eosin-stained whole slide images [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 5354.
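The per-patient aggregation and AUC evaluation described in the Methods and Results can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the abstract does not say how tile scores were aggregated or how the confidence interval was obtained, so mean pooling and a percentile bootstrap are stand-ins chosen for this example, and all function names are hypothetical.

```python
import numpy as np

RS_THRESHOLD = 26  # high risk is RS >= 26, per the abstract


def risk_label(rs):
    """Return True for the high-risk group (RS >= 26), False otherwise."""
    return rs >= RS_THRESHOLD


def aggregate_by_patient(patient_ids, tile_scores):
    """Average tile scores per patient. ASSUMPTION: mean pooling."""
    sums, counts = {}, {}
    for pid, score in zip(patient_ids, tile_scores):
        sums[pid] = sums.get(pid, 0.0) + float(score)
        counts[pid] = counts.get(pid, 0) + 1
    return {pid: sums[pid] / counts[pid] for pid in sums}


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC, resampling patients.

    ASSUMPTION: the abstract does not state how its 95% CI was computed.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    if labels.all() or not labels.any():
        raise ValueError("both risk groups must be present")
    rng = np.random.default_rng(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)
        # Skip resamples that contain only one class; AUC is undefined there.
        if labels[idx].all() or not labels[idx].any():
            continue
        stats.append(auc(labels[idx], scores[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```

Resampling is done over patients rather than tiles because the reported AUC is a per-patient metric; other pooling rules (for example, a percentile of tile scores) would slot into `aggregate_by_patient` without changing the evaluation code.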