Abstract

Convolutional neural networks (CNNs) have recently led to significant advances in the automatic segmentation of anatomical structures in medical images, and a wide variety of network architectures are now available to the research community. For applications such as segmentation of the prostate in magnetic resonance images (MRI), the results of the PROMISE12 online algorithm evaluation platform have demonstrated differences between the best-performing segmentation algorithms in terms of numerical accuracy using standard metrics such as the Dice score and boundary distance. Whether these small differences in the segmented regions/boundaries output by different algorithms have a substantive impact on downstream image analysis tasks that inform clinical decisions, such as estimating organ volume and multimodal image registration, has not been previously investigated. In this work, we quantified the accuracy of six different CNNs in segmenting the prostate in 3D patient T2-weighted MRI scans, and compared the accuracy of organ volume estimation and MRI-ultrasound (US) registration obtained using the prostate segmentations produced by the different networks. The networks were trained and tested using a set of 232 patient MRIs with labels provided by experienced clinicians. A non-parametric analysis of variance found statistically significant differences among the networks in both Dice score and boundary distance (p < 0.001 for both), and subsequent multiple comparison tests showed that these differences in segmentation error were attributable to at least one of the tested networks. Gland volume errors (GVEs) and target registration errors (TREs) were then estimated using the CNN-generated segmentations. Interestingly, no statistically significant difference was found in either GVEs or TREs among the different networks (p = 0.34 and p = 0.26, respectively). This result provides a real-world example in which networks with different segmentation performance may provide indistinguishably adequate registration accuracy to assist prostate cancer imaging applications. We conclude by recommending that, when selecting between network architectures, one should take into account differences in the accuracy of the downstream image analysis tasks that consume the output of automatic segmentation methods such as CNNs within a clinical pipeline, in addition to reporting the segmentation accuracy.
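To make the comparison pipeline concrete, the following is a minimal, hypothetical sketch of how per-case Dice scores and gland volume errors could be computed from binary segmentation masks, and how a Kruskal-Wallis non-parametric analysis of variance could then be applied across networks. It is not the study's implementation: the network names, synthetic scores, and voxel volume are illustrative assumptions.

import numpy as np
from scipy import stats

def dice_score(pred, ref):
    # Dice similarity coefficient (DSC) between two binary masks.
    pred, ref = pred.astype(bool), ref.astype(bool)
    return 2.0 * np.logical_and(pred, ref).sum() / (pred.sum() + ref.sum())

def gland_volume_error(pred, ref, voxel_volume_ml=0.001):
    # Absolute gland volume error (GVE) in millilitres; the default assumes
    # 1 mm^3 voxels (an illustrative assumption, not the study's spacing).
    return abs(int(pred.sum()) - int(ref.sum())) * voxel_volume_ml

# Placeholder per-patient Dice scores for six hypothetical networks; in the
# study these would come from the 232 expert-labelled test cases.
rng = np.random.default_rng(0)
dice_by_network = {f"net{i}": rng.uniform(0.80, 0.95, size=50) for i in range(6)}

# Non-parametric one-way ANOVA (Kruskal-Wallis) across the six networks.
h_stat, p_value = stats.kruskal(*dice_by_network.values())
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_value:.3g}")

A significant Kruskal-Wallis result only indicates that at least one network's metric distribution differs from the others, which is why post-hoc multiple comparison tests are needed to localise the difference.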

Highlights

  • Prostate cancer is the most commonly diagnosed noncutaneous cancer in men in many parts of the Western world, and targeted biopsy can be performed with or without the aid of a computer-assisted magnetic resonance images (MRI)-ultrasound (US) fusion system (Robertson et al., 2013)

  • Deep learning methods, especially supervised classification methods based on convolutional neural networks (CNNs), have been successful in the field of medical imaging for segmenting the anatomy of interest (Litjens et al., 2017)

  • Six recently proposed CNNs were compared for segmenting the prostate gland in MRI

  • Segmentation performance, in terms of the Dice similarity coefficient (DSC) region-overlap measure and the boundary distance (BD), was quantified on 232 patient datasets with reference labels provided by experienced clinicians (a minimal sketch of these two measures follows this list)
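As an illustration of the boundary distance measure mentioned above, the sketch below computes a symmetric mean surface distance between two binary masks using SciPy distance transforms. The toy masks and the 1 mm isotropic voxel spacing are assumptions; this is not the study's implementation.

import numpy as np
from scipy import ndimage

def surface(mask):
    # Surface voxels of a binary mask: the mask minus its binary erosion.
    mask = mask.astype(bool)
    return mask & ~ndimage.binary_erosion(mask)

def mean_boundary_distance(pred, ref, spacing=(1.0, 1.0, 1.0)):
    # Symmetric mean distance (in mm) between the two mask surfaces.
    pred_s, ref_s = surface(pred), surface(ref)
    # Euclidean distance from every voxel to the nearest surface voxel of
    # the other mask, accounting for the (assumed) voxel spacing.
    d_to_ref = ndimage.distance_transform_edt(~ref_s, sampling=spacing)
    d_to_pred = ndimage.distance_transform_edt(~pred_s, sampling=spacing)
    return 0.5 * (d_to_ref[pred_s].mean() + d_to_pred[ref_s].mean())

# Toy example: two slightly offset cubes, assuming 1 mm isotropic voxels.
a = np.zeros((32, 32, 32), dtype=bool); a[8:24, 8:24, 8:24] = True
b = np.zeros_like(a); b[9:25, 8:24, 8:24] = True
print(f"BD = {mean_boundary_distance(a, b):.2f} mm")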

Introduction

Deep learning methods, especially supervised classification methods based on convolutional neural networks (CNNs), have been successful in the field of medical imaging for segmenting the anatomy of interest (Litjens et al., 2017). These networks have produced higher accuracy for automatic prostate segmentation from T2-weighted MRI than alternative segmentation approaches (Litjens et al., 2017). With all these variants of CNNs for prostate MRI segmentation, a direct quantitative comparison of different CNN architectures on a single large data set, especially of those with open-source implementations (which are not a requirement for submitting to the PROMISE12 challenge), is important, but to date has not been available to our research community.
