The Gleason Grade (GG) Group system was recently introduced for more accurate stratification of prostate cancer (PCa). Grading is based on histologic patterns assessed from needle core biopsy and can therefore be negatively affected by intratumor heterogeneity. We aim to develop a deep learning algorithm to predict GG groups from multiparametric magnetic resonance images (mp-MRI). We studied a retrospective collection of 201 patients with 320 lesions from the SPIE-AAPM-NCI PROSTATEx Challenge (https://doi.org/10.7937/K9TCIA.2017.MURS5CL), of whom 98 patients with 110 lesions had GG available from biopsy. The numbers of lesions in GG 1-5 were 36, 39, 20, 8, and 7, respectively. The images were acquired on two different types of Siemens 3T MR scanners. T2W images were acquired using a turbo spin echo sequence and had an in-plane resolution of around 0.5 mm and a slice thickness of 3.6 mm. The DWI series were acquired with a single-shot echo planar imaging sequence with an in-plane resolution of 2 mm, a slice thickness of 3.6 mm, and diffusion-encoding gradients in three directions. Three b-values were acquired (50, 400, and 800 s/mm²), and the ADC map was subsequently calculated by the scanner software. Image pre-processing included registration and normalization. Image rotation and scaling were also used to increase the sample size and re-balance the number of lesions across GGs. To prevent over-fitting on a small sample, we implemented a transfer learning model that carries over the features learned for malignancy classification of the 320 lesions in our previous model to GG prediction. We also replaced end-to-end convolutional neural network (CNN) training with a combination of CNN feature extraction and classification by a weighted extreme learning machine (wELM).
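The wELM classification step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes one common weighted ELM formulation, in which each sample is weighted by the inverse of its class count and the output weights are obtained in closed form by regularized least squares. The function names `train_welm` and `predict_welm`, the hidden layer size, the sigmoid activation, and the regularization constant `C` are all hypothetical choices, and `X` stands in for the CNN features extracted per lesion.

```python
import numpy as np


def _hidden(X, W_in, b):
    # Sigmoid activations of the random, untrained hidden layer.
    return 1.0 / (1.0 + np.exp(-(X @ W_in + b)))


def train_welm(X, y, n_hidden=64, C=1.0, seed=0):
    """Fit a weighted ELM (assumed weighting: 1 / class count per sample)."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)

    # Hidden-layer parameters are drawn once and never trained.
    W_in = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = _hidden(X, W_in, b)

    # One-vs-rest targets coded as +1 / -1, one column per class.
    T = np.where(y[:, None] == classes[None, :], 1.0, -1.0)

    # Per-sample weights: rarer classes get larger weights.
    counts = {c: np.sum(y == c) for c in classes}
    w = np.array([1.0 / counts[c] for c in y])

    # Closed-form output weights: beta = (I/C + H^T W H)^-1 H^T W T.
    HtW = H.T * w  # scales column j of H^T by w[j], i.e. H^T W
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtW @ H, HtW @ T)
    return {"W_in": W_in, "b": b, "beta": beta, "classes": classes}


def predict_welm(model, X):
    # Predicted class is the one with the largest output score.
    H = _hidden(X, model["W_in"], model["b"])
    return model["classes"][np.argmax(H @ model["beta"], axis=1)]
```

Because only the output weights are solved for, training is a single linear solve, which may be part of why this kind of classifier is attractive for a small, unbalanced set of lesions.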
Features from the best-performing model were extracted to represent each lesion, and those from the last convolutional layer were found to be consistently better than those from all other layers. Based on 3-fold cross-validation, the average validation sensitivity, specificity, positive predictive value, and negative predictive value for differentiation of each GG (1-5) were (1, 0.99, 0.97, 1), (0.69, 0.85, 0.73, 0.83), (0.9, 0.69, 0.46, 0.97), (0.89, 0.64, 0.16, 0.99), and (1, 0.78, 0.39, 1), respectively. GG 4 had the highest false-positive rate, and GG 3 was often misclassified as GG 4. For GG 3-5 vs. GG 1-2, the corresponding values were (0.82, 0.87, 0.76, 0.92); for GG 4-5 vs. GG 1-3, they were (0.87, 0.81, 0.42, 0.98). This work has made substantial progress on GG prediction from mp-MRI, a task made challenging by a small and unbalanced dataset, by transferring knowledge from a malignancy classification task we developed earlier. Combining feature extraction by a deep learning model with a weighted extreme learning machine classifier has shown promising results for GG prediction. This work was supported by a Research Scholar Grant, RSG-15-137-01-CCE, from the American Cancer Society.
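The four reported measures can be computed from true and predicted grade groups as sketched below. This is an illustration rather than the authors' evaluation code. The helper name `one_vs_rest_metrics` is hypothetical, and it assumes that each result treats one set of grade groups as "positive" and all remaining groups as "negative", which also covers the pooled comparisons such as GG 3-5 vs. GG 1-2.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Sensitivity, specificity, PPV, and NPV for a set of 'positive' grades.

    y_true, y_pred: sequences of grade groups (e.g. integers 1-5).
    positive: set of grades counted as the positive class, e.g. {4} or {3, 4, 5}.
    """
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        t_pos, p_pos = t in positive, p in positive
        if t_pos and p_pos:
            tp += 1
        elif not t_pos and p_pos:
            fp += 1
        elif t_pos and not p_pos:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # An undefined ratio (empty denominator) is reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

For example, `one_vs_rest_metrics(y_true, y_pred, {4})` would score GG 4 against all other groups, and `{3, 4, 5}` would score GG 3-5 against GG 1-2. In a cross-validation setting, these values would be computed per fold and then averaged.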