Evaluation of automatic discrimination between benign and malignant prostate tissue in the era of high precision digital pathology

Yauheniya Zhdanovich, Katharina Filipski, Henning Reis, Mike Wenzel, Claudia Döring, Simon Bernatz, Ina Koch, Philipp Mandel, Nadine Flinner, Peter J Wild, Thomas J Vogl, Jens Köllermann, Patrick Harter, Katrin Bankov, Jörg Ackermann, Benedikt Höh

doi:10.1186/s12859-022-05124-9

Abstract

Background: Prostate cancer is a major health concern in aging men. As society ages, prostate cancer prevalence increases, emphasizing the need for efficient diagnostic algorithms.

Methods: Retrospectively, 106 prostate tissue samples from 48 patients (mean age, 66 ± 6.6 years) were included in the study. Patients suffered from prostate cancer (n = 38) or benign prostatic hyperplasia (n = 10) and were treated with radical prostatectomy or Holmium laser enucleation of the prostate, respectively. We constructed tissue microarrays (TMAs) comprising representative malignant (n = 38) and benign (n = 68) tissue cores. TMAs were processed to histological slides, stained, digitized, and assessed for the applicability of machine learning strategies and open-source tools in the diagnosis of prostate cancer. We applied the software QuPath to extract features for shape, stain intensity, and texture of TMA cores for three stainings: H&E, ERG, and PIN-4. Three machine learning algorithms, neural network (NN), support vector machines (SVM), and random forest (RF), were trained and cross-validated with 100 Monte Carlo random splits into a 70% training set and a 30% test set. We determined AUC values for single color channels, with and without optimization of hyperparameters by exhaustive grid search. We applied recursive feature elimination to feature sets of multiple color transforms.

Results: Mean AUC was above 0.80. PIN-4 stainings yielded higher AUC than H&E and ERG. For PIN-4 with the color transform saturation, NN, RF, and SVM yielded AUCs of 0.93 ± 0.04, 0.91 ± 0.06, and 0.92 ± 0.05, respectively. Optimization of hyperparameters improved the AUC only slightly, by 0.01. Feature selection did not increase the AUC for H&E but increased it by 0.02–0.06 for ERG and PIN-4.

Conclusions: Automated pipelines may be able to discriminate with high accuracy between malignant and benign tissue. We found PIN-4 staining best suited for classification. Further bioinformatic analysis of larger data sets would be crucial to evaluate the reliability of automated classification methods for clinical practice and to evaluate potential discrimination of aggressiveness of cancer to pave the way to automatic precision medicine.
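The resampling scheme described in the Methods (100 Monte Carlo random splits into 70% training and 30% test data, with one AUC per split, summarized as mean and standard deviation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the feature values are synthetic, a simple nearest-centroid score stands in for the NN, SVM, and RF classifiers, and all function names (`auc`, `centroid_scores`, `monte_carlo_cv`, `make_synthetic_cores`) are hypothetical. Only the class sizes (38 malignant, 68 benign cores) and the split proportions are taken from the abstract.

```python
import random
import statistics


def auc(scores, labels):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly, ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]


def centroid_scores(train_X, train_y, test_X):
    """Stand-in classifier: higher score means closer to the malignant centroid."""
    c_mal = centroid([x for x, y in zip(train_X, train_y) if y == 1])
    c_ben = centroid([x for x, y in zip(train_X, train_y) if y == 0])

    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [sqdist(x, c_ben) - sqdist(x, c_mal) for x in test_X]


def monte_carlo_cv(X, y, n_splits=100, test_fraction=0.3, seed=0):
    """Repeated random train/test splits; returns (mean AUC, standard deviation)."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_test = round(test_fraction * len(X))
    aucs = []
    while len(aucs) < n_splits:
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        # Skip degenerate splits that lack one class in either part.
        if len({y[i] for i in test}) < 2 or len({y[i] for i in train}) < 2:
            continue
        scores = centroid_scores(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in test]
        )
        aucs.append(auc(scores, [y[i] for i in test]))
    return statistics.mean(aucs), statistics.stdev(aucs)


def make_synthetic_cores(n_malignant=38, n_benign=68, n_features=3, seed=0):
    """Synthetic placeholder features (Gaussian), NOT data from the study."""
    rng = random.Random(seed)
    X = [[rng.gauss(1.0, 1.0) for _ in range(n_features)] for _ in range(n_malignant)]
    X += [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_benign)]
    y = [1] * n_malignant + [0] * n_benign
    return X, y


X, y = make_synthetic_cores()
mean_auc, sd_auc = monte_carlo_cv(X, y)
print(f"AUC over 100 random 70/30 splits: {mean_auc:.2f} +/- {sd_auc:.2f}")
```

Unlike k-fold cross-validation, the test sets of different Monte Carlo splits can overlap, so the reported standard deviation describes variability across resamples rather than across independent test sets.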

Full Text