345

Background: Gleason grading is the most potent prognostic variable in primary prostate cancer; however, inter-observer variability remains a major issue, particularly where subspecialty-trained pathologists are not available. Artificial intelligence algorithms for prostate cancer grading may improve health care equity by ensuring widespread access to standardized, high-quality grading; however, most algorithms have not been tested for performance with respect to oncologic outcomes. Here, we compared deep learning-based and pathologist-based Gleason grading for prediction of metastatic outcome in three large radical prostatectomy cohorts.

Methods: Three previously published radical prostatectomy cohorts from Johns Hopkins were used for this study. The Natural History Cohort (n=318, PMID: 26058959) and Case-Cohort (n=231, PMID: 36006048) both used a case-cohort design on the outcome of metastasis, while the Race Cohort (n=335, PMID: 30851334) was designed for a grade-matched comparison of self-identified White and Black patients. For each cohort, a single representative H&E-stained slide from the dominant tumor nodule was scanned, and Gleason Grade Group (GG) was assessed by a modified deep learning-based grading algorithm (PMID: 35233002) and by a subspecialty-trained pathologist. Harrell’s C-indices based on unadjusted Cox models for time to metastasis were compared for the pathologist- and deep learning-assigned GG in each cohort.

Results: The C-index for deep learning-assigned GG in the Natural History Cohort was 0.757 (95% confidence interval [CI]: 0.703-0.811) compared to 0.748 (95% CI: 0.692-0.804) for the pathologist-assigned GG. The C-index for deep learning-assigned GG in the Case-Cohort was 0.848 (95% CI: 0.727-0.969) compared to 0.843 (95% CI: 0.722-0.964) for the pathologist-assigned GG. The C-index for deep learning-assigned GG in the Race Cohort was 0.754 (95% CI: 0.697-0.810) compared to 0.780 (95% CI: 0.717-0.844) for the pathologist-assigned GG. In a combined analysis of all three cohorts, the C-index for the deep learning-assigned GG was 0.765 (95% CI: 0.725-0.805) compared to 0.772 (95% CI: 0.730-0.814) for the pathologist-assigned GG. In the combined cohorts, the historical pathologist-assigned GG for the entire case had a C-index of 0.796 (95% CI: 0.757-0.836).

Conclusions: Deep learning and subspecialty-trained pathologist grading were highly comparable for association with metastatic outcome across three large radical prostatectomy cohorts.
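
For readers unfamiliar with the metric, the following is a minimal sketch of how Harrell's C-index can be computed for a single ordinal predictor such as GG. It is illustrative only: the abstract does not state the software used or how ties were handled, the function name `harrell_c_index` and the toy data are hypothetical, and the sketch assumes a higher GG implies higher risk, which is the ordering an unadjusted Cox model with a positive coefficient would produce. It also ignores any case-cohort weighting.

```python
from itertools import combinations


def harrell_c_index(times, events, scores):
    """Harrell's C for right-censored data (simplified sketch).

    times  : follow-up time for each patient
    events : 1 if metastasis was observed, 0 if censored
    scores : risk score, here the Grade Group (higher = higher risk)
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        if times[a] == times[b]:
            continue  # assumption: pairs with tied times are skipped
        if not events[a]:
            continue  # shorter time is censored, so ordering is unknown
        usable += 1
        if scores[a] > scores[b]:
            concordant += 1.0  # higher risk score had the earlier event
        elif scores[a] == scores[b]:
            concordant += 0.5  # tied scores count as half concordant
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical toy data, not taken from the study cohorts.
toy_times = [2, 4, 6, 8, 10]
toy_events = [1, 1, 0, 1, 0]
toy_gg = [5, 4, 3, 2, 1]
toy_c = harrell_c_index(toy_times, toy_events, toy_gg)
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking of patients by time to metastasis, which is the scale on which the deep learning-assigned and pathologist-assigned GG values above are compared.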