Journal of Urology
Prostate Cancer: Detection & Screening VI (PD56)
1 Sep 2021

PD56-03 EXTERNAL VALIDATION OF AN ARTIFICIAL INTELLIGENCE ALGORITHM FOR PROSTATE CANCER GLEASON GRADING AND TUMOR QUANTIFICATION

Bogdana Schmidt, Hriday P. Bhambhvani, Richard E. Fan, Christian Kunder, Chia Sui Kao, John P. Higgins, Mirabela Rusu, and Geoffrey A. Sonn

https://doi.org/10.1097/JU.0000000000002090.03

Abstract

INTRODUCTION AND OBJECTIVE: Prostate cancer treatment relies on accurate Gleason grading. Gleason scoring has high inter- and intra-rater variability, even among high-volume uropathologists. Deep learning on digital pathology has the potential to address these problems. We aimed to externally validate a deep learning algorithm's performance on prostate cancer identification and Gleason grading.

METHODS: DeepDx Prostate (DeepBio, Seoul, South Korea) is an automated Gleason scoring system that was trained using 1133 prostate core needle biopsy images and validated on 700. We performed external validation using 150 whole-mount prostatectomy specimens, from which 500 tiles (1 mm²) were created and evaluated by 2 uropathologists and the DeepDx algorithm to establish Gleason grade, amount of cancer, and percentage of Gleason pattern 4 and 5 in the tiles.
The reference standard was established by consensus of two experienced uropathologists, with a third expert evaluating discordant cases. The main metric was agreement with the reference standard, measured using quadratically weighted Cohen's kappa (κ).

RESULTS: The DeepDx algorithm achieved high overall agreement with the reference standard (κ 0.79, 95% CI 0.75 - 0.82). It performed well at the clinical decision thresholds of benign versus malignant (κ 0.927) and clinically low-risk (benign, GG1, or GG2) versus high-risk (GG3-5) disease (κ 0.858). Agreement was lower when distinguishing benign and GG1 from GG2-5 (κ 0.771). Of 83 tiles classified as GG1 by uropathologists, the algorithm upgraded 53 (64%) to GG2, although the median pattern 4 area in those cases was 3.9% (IQR 0.0009, 28.703).

CONCLUSIONS: In this external validation, we found that the DeepDx algorithm had high agreement with expert uropathologists in cancer identification and grading, despite having been trained on a different patient population and on biopsy cores rather than prostatectomy specimens. The speed and accuracy of deep learning-based systems have broad applications, from allowing clinicians to better counsel patients to facilitating research that requires detailed dataset annotation not feasible for human pathologists.

Source of Funding: None

© 2021 by American Urological Association Education and Research, Inc.

Volume 206, Issue Supplement 3, September 2021, Page: e1004-e1004
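For readers unfamiliar with the agreement metric named in the Methods, the following is a minimal sketch of quadratically weighted Cohen's kappa, together with the collapsing of ordinal grades into the binary decision thresholds reported in the Results. It is not the authors' analysis code. The function names `quadratic_weighted_kappa` and `dichotomize`, the integer encoding (0 = benign, 1 to 5 = GG1 to GG5), and the toy labels are all assumptions made for illustration; none of the numbers below are study data.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, n_categories):
    """Cohen's kappa with quadratic weights for ordinal labels 0..n_categories-1.

    Computed as 1 - (observed weighted disagreement / chance-expected
    weighted disagreement), with weight (i - j)^2 / (n_categories - 1)^2.
    """
    if n_categories < 2:
        raise ValueError("need at least two categories")
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must label the same non-empty set of items")

    n = len(rater_a)
    joint = Counter(zip(rater_a, rater_b))  # observed (a, b) label pairs
    marg_a = Counter(rater_a)               # marginal counts for rater A
    marg_b = Counter(rater_b)               # marginal counts for rater B
    max_dist_sq = (n_categories - 1) ** 2

    observed = 0.0
    expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = (i - j) ** 2 / max_dist_sq
            observed += weight * joint[(i, j)] / n
            expected += weight * marg_a[i] * marg_b[j] / (n * n)

    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return 1.0 - observed / expected


def dichotomize(grades, threshold):
    """Collapse ordinal grades to 0/1: 1 if grade >= threshold, else 0."""
    return [1 if g >= threshold else 0 for g in grades]


if __name__ == "__main__":
    # Hypothetical toy labels (assumed encoding: 0 = benign, 1-5 = GG1-GG5).
    reference = [0, 0, 1, 1, 2, 3, 4, 5]
    algorithm = [0, 0, 2, 1, 2, 3, 4, 5]

    print("overall:", quadratic_weighted_kappa(reference, algorithm, 6))

    # Thresholds mirroring the comparisons described in the Results:
    # 1 -> benign vs malignant, 2 -> benign/GG1 vs GG2-5, 3 -> low vs high risk.
    for threshold in (1, 2, 3):
        ref_bin = dichotomize(reference, threshold)
        alg_bin = dichotomize(algorithm, threshold)
        print(threshold, quadratic_weighted_kappa(ref_bin, alg_bin, 2))
```

With only two categories the quadratic weights reduce to 0 and 1, so the dichotomized comparisons are equivalent to ordinary unweighted Cohen's kappa. Whether the abstract's threshold-level κ values were computed this way is not stated in the source.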