Abstract 3511: iQC: machine-learning-driven prediction of surgery reveals systematic confounds in cancer whole slide images from hospitals by protocol

Andrew J. Schaumberg, Michael S. Lewis, Ramin Nazarian, Ananta Wadhwa, Nathanael Kane, Graham Turner, Purushotham Karnam, Poornima Devineni, Nicholas Wolfe, Randall Kintner, Matthew B. Rettig, Beatrice S. Knudsen, Isla P. Garraway, Saiju Pyarajan

doi:10.1158/1538-7445.am2024-3511

Abstract

Problem: The 21st century has seen an eruption of research using AI for the detection and diagnosis of cancer. Yet an often-unspoken core premise in this field of computational pathology is that a glass slide suitably represents the patient’s disease. Here, we report that systematic confounds may dominate slides from a medical center, such that the slides are unsuitable for diagnosis.

Methods: We mathematically define high quality data as a whole slide image set where the patient’s surgery may be accurately predicted by an automated system. Our system “iQC” accurately distinguished biopsies (i.e. thin strands of tissue) from nonbiopsies, e.g. transurethral resections of the prostate (TURPs) or prostatectomies, only when the data appeared high quality, e.g. bright histopathology stains and few artifacts. Thus, when the data are of high quality, iQC (i) accurately classifies pixels as tissue, (ii) accurately generates statistics that describe the distribution of tissue, and (iii) accurately predicts surgical procedure from those statistics. We compare iQC against the published HistoQC tool.

Results: iQC holds all data to the same objective quality standard. We validate this standard in five Veterans Affairs Medical Centers (VAMCs) and the public Automated Gleason Grading Challenge (AGGC) dataset. For the surgery prediction task, we report AUROCs of 0.9966-1.000 at VAMCs that produced high quality data and AUROC=0.9824 for AGGC. In contrast, we report AUROC=0.7115 at the VAMC that produced poor quality data. A pathologist found that the poor quality may be explained by faded histopathology stains and VAMC protocol differences. Supporting this, iQC's novel stain strength statistic finds this VAMC had weaker stains (p < 2.2e-16, two-tailed Wilcoxon rank-sum test; Cohen's d=1.208) than the VAMC that contributed most of the slides. Additionally, iQC recommended only 2 of 3736 (0.05%) VAMC slides for review due to inadequate tissue. In contrast, HistoQC in its default configuration excluded 89.9% of VAMC slides because tissue was not detected, but we reduced this to 16.7% with our custom HistoQC configuration.

Conclusion: Our surgery prediction AUROC may be a quantitative indicator positively associated with data quality for a dataset. Unless data are of poor quality, iQC accurately locates tissue in slides and excludes few slides. iQC is, to our knowledge, the first automated system in computational pathology that validates quality against objective evidence, e.g. surgical procedure data available in the EHR/LIMS, which requires no effort or annotations from anatomic pathologists.

Citation Format: Andrew J. Schaumberg, Michael S. Lewis, Ramin Nazarian, Ananta Wadhwa, Nathanael Kane, Graham Turner, Purushotham Karnam, Poornima Devineni, Nicholas Wolfe, Randall Kintner, Matthew B. Rettig, Beatrice S. Knudsen, Isla P. Garraway, Saiju Pyarajan. iQC: machine-learning-driven prediction of surgery reveals systematic confounds in cancer whole slide images from hospitals by protocol [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 3511.
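
The two summary statistics reported in the Results, AUROC and Cohen's d, can be illustrated with a minimal Python sketch. This is not the iQC implementation, which the abstract does not describe; the function names and the toy numbers below are hypothetical and are not study data. The sketch relies on the standard identity that AUROC equals the Mann-Whitney U statistic (the statistic behind the Wilcoxon rank-sum test) divided by the number of positive/negative pairs, and it assumes the pooled-variance form of Cohen's d.

```python
from math import sqrt
from statistics import mean, variance


def rank_sum_u(pos, neg):
    """Mann-Whitney U: count (pos, neg) pairs where the pos value is larger.

    Tied pairs count as one half. This O(n*m) loop is for clarity only.
    """
    u = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                u += 1.0
            elif p == n:
                u += 0.5
    return u


def auroc(pos_scores, neg_scores):
    """AUROC as the fraction of (pos, neg) pairs ranked in the right order."""
    return rank_sum_u(pos_scores, neg_scores) / (len(pos_scores) * len(neg_scores))


def cohens_d(a, b):
    """Cohen's d using the pooled sample variance (an assumed variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Hypothetical classifier scores (higher = more biopsy-like); not study data.
biopsy_scores = [0.9, 0.8, 0.7, 0.6]
nonbiopsy_scores = [0.65, 0.3, 0.2, 0.1]
print(auroc(biopsy_scores, nonbiopsy_scores))  # 15 of 16 pairs ordered correctly

# Hypothetical per-slide stain strength values for two sites; not study data.
site_a = [2.0, 4.0, 6.0]
site_b = [1.0, 3.0, 5.0]
print(cohens_d(site_a, site_b))
```

On the toy scores, 15 of the 16 biopsy/nonbiopsy pairs are ranked correctly, so the AUROC is 15/16. An AUROC near 0.5 would mean the scores carry little information about the surgical procedure, which is the sense in which the abstract treats a low AUROC as a sign of poor data quality.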
