Abstract

Deep learning-based convolutional neural networks (CNNs) for prostate cancer (PCa) risk stratification rely on radiologist-delineated regions of interest (ROIs) on MRI. These ROIs reflect the reader's interpretation of the extent of PCa. Variations in reader annotations change the features extracted from the ROIs, which may in turn affect the classification performance of CNNs. In this study, we analyzed the effect of inter-reader variations in PCa ROI delineation on the training of CNNs for distinguishing clinically significant PCa (csPCa) from clinically insignificant PCa (ciPCa). We used 180 patient studies (n = 274 lesions) from 3 cohorts of patients who underwent 3T multi-parametric MRI followed by MRI-targeted biopsy and/or radical prostatectomy. ISUP Gleason grade groups (GGG) obtained from pathology were used to define csPCa (GGG ≥ 2) and ciPCa (GGG = 1). Five radiologists, each with more than 5 years of experience in prostate imaging, delineated PCa ROIs on bi-parametric MRI (bpMRI, comprising T2-weighted (T2W) and diffusion-weighted (DWI) sequences) within the training set (n₁ = 160 lesions) using targeted biopsy locations. Patches extracted from each reader's ROIs were used to train individual CNNs (N₁-N₅) with the SqueezeNet architecture. The average volume of the reader-delineated ROIs used for training varied greatly, ranging between 1106 and 2107 mm³ across readers. The resulting networks showed no significant difference in classification performance (AUC = 0.82 ± 0.02), indicating that they were relatively robust to inter-reader variations in ROI. These models were evaluated on two independent test sets (D₂, n₂ = 85 lesions; D₃, n₃ = 29 lesions) where ROIs were obtained by co-registration of MRI with post-surgical pathology and were therefore unaffected by inter-reader variations in ROIs. Network performance on D₂ and D₃ was 0.80 ± 0.02 and 0.62 ± 0.03, respectively.
The CNN predictions were moderately consistent, with ICC(2,1) scores on D₂ and D₃ of 0.74 and 0.54, respectively. Higher agreement in ROI overlap produced higher correlation in predictions on external test sets (R = 0.89, p < 0.05). Furthermore, higher average ROI volume produced greater AUC scores on D₃, indicating that more comprehensive ROIs may provide more features for deep learning networks to use in classification. Inter-reader variations in ROIs on MRI may influence the reliability and generalizability of CNNs trained for PCa risk stratification.
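The abstract reports ICC(2,1) as its consistency measure but does not say how it was computed. As a minimal sketch, assuming the standard Shrout and Fleiss definition (two-way random effects, absolute agreement, single rater), the statistic can be obtained from a lesions-by-networks matrix of predictions as below. The function name `icc_2_1` and the example matrix are illustrative assumptions: the numbers are the widely reproduced Shrout and Fleiss textbook example, not data from this study.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of rows, one row per subject (e.g. one lesion) and one
    column per rater (e.g. one CNN trained on one reader's ROIs).
    """
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of raters
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 raters")
    if any(len(row) != k for row in scores):
        raise ValueError("every subject needs one score per rater")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )

    ms_rows = ss_rows / (n - 1)              # between-subject mean square
    ms_cols = ss_cols / (k - 1)              # between-rater mean square
    ms_err = ss_err / ((n - 1) * (k - 1))    # residual mean square

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_err) / denom


# Textbook example: 6 subjects rated by 4 raters (not data from this study).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))  # 0.29
```

Because ICC(2,1) measures absolute agreement, a systematic offset between networks (for example, one network scoring every lesion higher than the others) lowers the value even when the networks rank lesions identically.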
