Abstract P1-02-17: Artificial intelligence-based whole slide scoring of nuclear breast cancer IHC markers Ki67, ER, and PR matches performance of manual clinical scoring

Monesh Kapadia, Margarita Kouzova, Ipshita Bhattacharya, Landon J Inge, Christoph Guetter, Jim Ranger-Moore, Nancy Sapanara, Mehrnoush Khojasteh, Margaret Zhao, Isaac Bai, Shalini Singh, Sarah Gladden, Chandana Chintakindi, Matthew T Olson, Uday Kurkure, Xiaomeng Xu, Karel J Zuiderveld, Carol Jones, Bryan Lopez, Chen Chun Chen

doi:10.1158/1538-7445.sabcs21-p1-02-17

Abstract

Background: 2.1 million breast cancers are newly diagnosed each year. Current guidelines endorse routine testing for estrogen receptor (ER) and progesterone receptor (PR), while use of the Ki67 biomarker can provide additional prognostic value. All three biomarkers currently require quantitative evaluation by manual review of a glass slide, which leads to reproducibility issues across labs because of variability in interpretation and scoring. Image analysis algorithms currently on the market offer only limited field-of-view (FOV) analysis, which covers a tiny fraction of the tissue. Whole slide image (WSI) analysis, in comparison, analyzes the entire tissue and therefore more closely mimics how pathologists assess these slides in clinical practice. In this work, we developed three deep learning-based artificial intelligence (AI) algorithms for image analysis (IA) of whole-slide digitized images from Ki67-, ER-, and PR-stained slides. The algorithms address these variabilities and allow pathologists across labs to score consistently, at the same accuracy as selected expert labs. The complete software solution delivers high throughput, analyzing whole-slide images in less than 2 minutes during pre-computation on conventional computer hardware and returning results on user-provided annotations in milliseconds.

Methods: We assembled a benchmark validation data set of 312 breast cancer cases (100 Ki67, 102 ER, and 110 PR slides, stained at multiple sites) representative of breast cancer subtypes (i.e., ductal, lobular, mucinous, medullary, tubular), score (i.e., 0%-100% positivity), tumor grade (i.e., well, moderately, and poorly differentiated), and specimen type (i.e., biopsy and resection). Three pre-clinical validation studies were performed using the Roche uPath enterprise software, one for each of the ER, PR, and Ki67 image analysis algorithms. A total of 6 pathologists participated in the study, split into expert (n=3) and study (n=3) readers.
A non-inferiority Ground-Truth (GT) study design was implemented in which the study and expert readers performed a manual read (MR) followed by AI-assisted scoring. The expert manual scores were used as GT, to which the readers' manual and AI scores were compared for each marker and case.

Results: The overall concordance rates between AI scores and expert GT, reported as overall percent agreement (OPA), negative percent agreement (NPA), and positive percent agreement (PPA), were as follows. For Ki67: OPA=97.2% (95% CI: 94.0, 99.7), NPA=97.8% (95% CI: 93.4, 100), and PPA=96.7% (95% CI: 91.3, 100). For ER: OPA=95.4% (CI: 91.4, 98.4), NPA=96.4% (CI: 92.5, 99.4), and PPA=94.4% (CI: 87.4, 100). For PR: OPA=96.1% (95% CI: 92.7, 99.1), NPA=96.7% (95% CI: 92.5, 100), and PPA=95.6% (95% CI: 89.9, 100). The differences between AI and MR overall concordance rates (AI-MR) when compared to the expert GT were as follows. For Ki67: OPA-diff=1.4% (2-sided 95% CI: -0.7, 3.7), NPA-diff=3.8% (CI: 0.6, 7.8), and PPA-diff=-1.0% (CI: -3.5, 0.0). For ER: OPA-diff=-0.9% (CI: -3.3, 1.0), NPA-diff=-0.1% (CI: -3.1, 3.0), and PPA-diff=-1.8% (CI: -6.2, 0.0). For PR: OPA-diff=-1.5% (CI: -3.9, 0.6), NPA-diff=-2.4% (CI: -6.8, 1.2), and PPA-diff=-0.7% (CI: -2.8, 1.1). The cutoffs used were 20% (Ki67), 1% (ER), and 1% (PR).

Conclusion: Our preliminary feasibility data show that WSI analysis-assisted scoring by pathologists was equivalent to manual scoring and to an expert-panel GT on a representative benchmark data set. Additionally, image analysis algorithms are known to provide high reproducibility and precision. We will provide those numbers at a later stage, as they were not fully available at the time of submission. Our results show the value and potential of deep learning technologies to improve the diagnosis and care of patients with breast cancer.

Citation Format: Monesh Kapadia, Mehrnoush Khojasteh, Margarita Kouzova, Carol Jones, Xiao-Meng Xu, Matthew T. Olson, Sarah Gladden, Nancy Sapanara, Shalini Singh, Chen Chun Chen, Isaac Bai, Jim Ranger-Moore, Landon J. Inge, Uday Kurkure, Ipshita Bhattacharya, Margaret Zhao, Karel Zuiderveld, Chandana Chintakindi, Bryan Lopez, Christoph Guetter. Artificial intelligence-based whole slide scoring of nuclear breast cancer IHC markers Ki67, ER, and PR matches performance of manual clinical scoring [abstract]. In: Proceedings of the 2021 San Antonio Breast Cancer Symposium; 2021 Dec 7-10; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2022;82(4 Suppl):Abstract nr P1-02-17.
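
Illustrative note: the agreement measures reported in the Results, overall, positive, and negative percent agreement (OPA, PPA, NPA), can be computed from paired scores once each score is binarized at the marker's cutoff. The sketch below is a minimal illustration, not the authors' analysis code. The function name `agreement_rates`, the rule that a score at or above the cutoff counts as positive, and the four example cases are all assumptions made for illustration; the abstract does not state how scores at the cutoff were handled or how the confidence intervals were derived.

```python
import math
from typing import Sequence, Tuple


def agreement_rates(
    reader_scores: Sequence[float],
    reference_scores: Sequence[float],
    cutoff: float,
) -> Tuple[float, float, float]:
    """Return (OPA, PPA, NPA) as fractions for paired percent-positivity scores.

    Assumption (not stated in the abstract): a case is "positive"
    when its score is greater than or equal to the cutoff.
    """
    if len(reader_scores) != len(reference_scores):
        raise ValueError("score lists must be paired case by case")
    if not reader_scores:
        raise ValueError("at least one case is required")

    # Counts are relative to the reference (ground-truth) call.
    tp = tn = fp = fn = 0
    for reader, reference in zip(reader_scores, reference_scores):
        reader_pos = reader >= cutoff
        reference_pos = reference >= cutoff
        if reader_pos and reference_pos:
            tp += 1
        elif not reader_pos and not reference_pos:
            tn += 1
        elif reader_pos:
            fp += 1
        else:
            fn += 1

    opa = (tp + tn) / (tp + tn + fp + fn)
    # PPA and NPA are undefined when the reference has no positive
    # (or no negative) cases; NaN signals that situation.
    ppa = tp / (tp + fn) if (tp + fn) else math.nan
    npa = tn / (tn + fp) if (tn + fp) else math.nan
    return opa, ppa, npa


# Hypothetical percent-positivity scores for four cases (not study data),
# evaluated at the 1% cutoff that the abstract reports for ER and PR.
ai_scores = [0.0, 0.5, 5.0, 90.0]
expert_gt = [0.0, 2.0, 10.0, 80.0]
opa, ppa, npa = agreement_rates(ai_scores, expert_gt, cutoff=1.0)
print(f"OPA={opa:.1%} PPA={ppa:.1%} NPA={npa:.1%}")
```

Under the same assumptions, an AI-MR difference such as OPA-diff would be the OPA of the AI-assisted scores minus the OPA of the manual scores, each computed against the same expert GT.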
