Medical image auto-segmentation techniques are basic and critical for numerous image-based analysis applications that play an important role in developing advanced and personalized medicine. Compared with manual segmentations, auto-segmentations are expected to contribute to a more efficient clinical routine and workflow by requiring fewer human interventions or revisions to auto-segmentations. However, current auto-segmentation methods are usually developed with the help of some popular segmentation metrics that do not directly consider human correction behavior. Dice Coefficient (DC) focuses on the truly-segmented areas, while Hausdorff Distance (HD) only measures the maximal distance between the auto-segmentation boundary with the ground truth boundary. Boundary length-based metrics such as surface DC (surDC) and Added Path Length (APL) try to distinguish truly-predicted boundary pixels and wrong ones. It is uncertain if these metrics can reliably indicate the required manual mending effort for application in segmentation research. Therefore, in this paper, the potential use of the above four metrics, as well as a novel metric called Mendability Index (MI), to predict the human correction effort is studied with linear and support vector regression models. 265 3D computed tomography (CT) samples for 3 objects of interest from 3 institutions with corresponding auto-segmentations and ground truth segmentations are utilized to train and test the prediction models. The five-fold cross-validation experiments demonstrate that meaningful human effort prediction can be achieved using segmentation metrics with varying prediction errors for different objects. The improved variant of MI, called MIhd, generally shows the best prediction performance, suggesting its potential to indicate reliably the clinical value of auto-segmentations.
Read full abstract