AI-based HER2-low IHC scoring in breast cancer across multiple sites, clones, and scanners.

Patrick Frey, Carolin Schmidt, Stefan Günther, Niklas Abele, Ramona Erber, Evgeny Minin, Ralf Banisch, Arndt Hartmann, Peter A Fasching, Tobias Lang, Andreas Mamilos, Hanna Huebner, Matthias Rübner

doi:10.1200/jco.2023.41.16_suppl.516

Abstract

Background: Assessment of immunohistochemical (IHC) HER2 expression plays a pivotal role in breast cancer diagnostics. In the era of HER2-low and HER2-targeted antibody-drug conjugates, accurate discrimination of the defined HER2 IHC scores is essential. At the same time, HER2 IHC scoring suffers from poor interobserver concordance. Artificial intelligence (AI) may improve the standardization, accuracy and efficiency of this scoring, but previous approaches have failed to show the required consistency across samples from different sites, clones and scanning hardware.

Methods: We investigated the use of an AI-based HER2 IHC quantifier software to support pathologists in standardized HER2 IHC assessment in breast cancer. Validation specimens were derived from four institutions and five scanners. With the “region of interest” (ROI) software version (part I), pathologists choose the ROI to be assessed within the whole slide image (WSI). In contrast, the fully automatic version (part II) analyzes the complete WSI. Part I: Three pathologists selected one ROI per slide from a cohort of n = 150 specimens (n = 50 per pathologist) and scored HER2 expression in these ROIs (path-only) according to ASCO/CAP 2018 guidelines. After a 2-week washout period, the same pathologists were presented with the same ROIs and the corresponding AI-suggested results (AI-only), and then decided on final HER2 scores (AI-assisted). Scoring times were recorded. Part II: Fully automatic AI accuracy without human intervention was analyzed using the WSI cohort of part I and an additional cohort of n = 94 WSIs. For both parts, IHC scores were compared to the ground truth derived from the clinical workflow, defined as the manually assessed HER2 IHC score.

Results: Part I: In discriminating HER2-neg from HER2-low/pos cases, AI-assisted and AI-only ROI scoring showed agreement rates of 91.3% and 86.7%, respectively, with path-only decisions across all institutions and scanners. In discriminating the four HER2 scores (0, 1+, 2+, 3+) individually, interrater agreement of AI-assisted vs. path-only ROI HER2 scoring was 78.7%, exceeding literature rates of < 70%, with a mean scoring time per ROI of 29 sec vs. 50 sec, respectively; interrater agreement of AI-only vs. AI-assisted scoring was 85.3%. Part II: In discriminating HER2-neg from HER2-low/pos cases, fully automatic AI WSI scoring showed agreement of 89.1% and 86.2% for the two cohorts, respectively.

Conclusions: Across challenging validation data from four institutions and five scanners, scoring with the support of an AI HER2 IHC quantifier software showed very high agreement when discriminating HER2-neg from HER2-low/pos cases and high accuracy for general HER2 scoring. With AI assistance, mean scoring time per ROI was reduced by about 42% (29 sec vs. 50 sec). Altogether, these results demonstrate the potential of AI solutions to increase the consistency and efficiency of HER2 scoring and ultimately to improve patient outcomes.
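
The agreement rates and the scoring-time reduction reported above are simple proportions. The sketch below illustrates how such figures could be computed from paired score lists; it is not the authors' analysis code. The function names, the example scores, and the rule that IHC 0 counts as HER2-neg while 1+, 2+ and 3+ count as HER2-low/pos are assumptions made for illustration. Only the scoring times (29 sec and 50 sec) come from the abstract.

```python
from typing import Sequence

# The four IHC score levels named in the abstract.
HER2_SCORES = ("0", "1+", "2+", "3+")


def to_binary(score: str) -> str:
    """Collapse a four-level score into HER2-neg vs. HER2-low/pos.

    Assumption for this sketch: IHC 0 is HER2-neg, everything else is low/pos.
    """
    if score not in HER2_SCORES:
        raise ValueError(f"unknown HER2 IHC score: {score!r}")
    return "neg" if score == "0" else "low/pos"


def agreement_rate(a: Sequence[str], b: Sequence[str]) -> float:
    """Percentage of cases on which two raters assign the same label."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def relative_reduction(baseline: float, new: float) -> float:
    """Percentage reduction of `new` relative to `baseline`."""
    return 100.0 * (baseline - new) / baseline


# Hypothetical scores for ten ROIs (illustrative only, not study data).
path_only = ["0", "1+", "2+", "3+", "0", "1+", "2+", "3+", "1+", "0"]
ai_assisted = ["0", "1+", "2+", "3+", "1+", "1+", "1+", "3+", "2+", "0"]

four_level = agreement_rate(path_only, ai_assisted)
binary = agreement_rate(
    [to_binary(s) for s in path_only],
    [to_binary(s) for s in ai_assisted],
)
# Mean scoring times per ROI as reported: 50 sec path-only, 29 sec AI-assisted.
time_saved = relative_reduction(50, 29)

print(f"four-level agreement: {four_level:.1f}%")  # 70.0%
print(f"binary agreement: {binary:.1f}%")          # 90.0%
print(f"scoring time reduction: {time_saved:.1f}%")  # 42.0%
```

In this toy example the binary agreement is higher than the four-level agreement, because disagreements between adjacent positive scores (for example 1+ vs. 2+) disappear once both are grouped as HER2-low/pos. Raw percent agreement does not correct for chance; a chance-corrected statistic such as Cohen's kappa would be a separate calculation.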
