Background
Fully automatic medical image segmentation has been a long-standing pursuit in radiotherapy (RT). Recent developments in deep learning show promising results, yielding consistent and time-efficient contours. To train and validate these systems, several geometry-based metrics, such as the Dice Similarity Coefficient (DSC), the Hausdorff distance, and related measures, are currently the standard in automated medical image segmentation challenges. However, the relevance of these metrics in RT is questionable. The quality of automated segmentation results needs to reflect clinically relevant treatment outcomes, such as dosimetry and the related tumor control and toxicity. In this study, we investigate the correlation between popular geometric segmentation metrics and dose parameters for Organs-At-Risk (OARs) in brain tumor patients, and examine properties that might be predictive of dose changes in brain radiotherapy.

Methods
A retrospective database of glioblastoma multiforme patients was stratified for planning difficulty, from which 12 cases were selected and reference sets of OARs and radiation targets were defined. To assess the relation between segmentation quality (as measured by standard segmentation assessment metrics) and the quality of RT plans, clinically realistic yet alternative contours for each OAR of the selected cases were obtained through three methods: (i) manual contours by two additional human raters, (ii) realistic manual manipulations of the reference contours, and (iii) deep learning-based segmentation. A reference plan was generated on the reference structure set and then re-optimized for each corresponding alternative contour set. The correlation between segmentation metrics and dosimetric changes was analyzed for each OAR, with dosimetric changes quantified by the mean dose and the maximum dose to 1% of the volume (Dmax 1%).
Furthermore, we conducted specific experiments to investigate the dosimetric effect of alternative OAR contours with respect to their proximity to the target, size, particular shape, and location relative to the target.

Results
We found a low correlation between the DSC of the alternative OAR contours and the dosimetric changes. The Pearson correlation coefficient between the DSC and the effect on mean OAR dose was -0.11. For Dmax 1%, the correlation was -0.13. Similarly low correlations were found for 22 other segmentation metrics. The organ-based analysis showed a better correlation for the larger OARs (i.e. brainstem and eyes) than for the smaller OARs (i.e. optic nerves and chiasm). Furthermore, we found that proximity to the target does not make contour variations more susceptible to a dose effect. However, the direction of the contour variation with respect to the relative location of the target appears to correlate strongly with the dose effect.

Conclusions
This study shows a low correlation between segmentation metrics and dosimetric changes for OARs in brain tumor patients. The results suggest that the current metrics for image segmentation in RT, as well as deep learning systems employing such metrics, need to be revisited in favor of clinically oriented metrics that better reflect how segmentation quality affects dose distribution and the related tumor control and toxicity.
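
For readers unfamiliar with the two quantities reported above, the following is a minimal Python sketch of how a DSC between two binary masks and a Pearson correlation between DSC values and dose changes could be computed. It is an illustration only, not the study's analysis pipeline: the function names `dice` and `pearson`, the toy masks, and the numbers in `dsc_values` and `dose_changes` are all hypothetical and do not come from the study data.

```python
import numpy as np


def dice(mask_a, mask_b):
    """Dice Similarity Coefficient between two binary masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Assumed convention: two empty masks are treated as perfect agreement.
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


# Toy reference contour: a 4x4 square (16 pixels) in a 10x10 slice.
reference = np.zeros((10, 10), dtype=bool)
reference[2:6, 2:6] = True

# Toy alternative contour: the same square shifted down by one row,
# so 12 of the 16 pixels overlap with the reference.
alternative = np.zeros((10, 10), dtype=bool)
alternative[3:7, 2:6] = True

print("DSC:", dice(reference, alternative))

# Made-up per-contour values, purely to show the call: one DSC and one
# absolute dose change (in Gy) per alternative contour.
dsc_values = [0.90, 0.80, 0.70, 0.60]
dose_changes = [0.5, 1.5, 0.2, 1.0]
print("Pearson r:", pearson(dsc_values, dose_changes))
```

In such a setup, a coefficient near zero would indicate that the geometric overlap score carries little information about the size of the dose change, which is the kind of relationship the reported values of -0.11 and -0.13 describe.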