Abstract
Numerous algorithms are available for segmenting medical images. Empirical discrepancy metrics are commonly used to measure the similarity or difference between algorithmic segmentations and "true" segmentations. However, one issue with the commonly used metrics is that the same metric value often represents different levels of "clinical acceptability" for different objects, depending on their size, shape, and complexity of form. An ideal segmentation evaluation metric should reflect the degree of acceptability directly in its value, and the same metric value should carry the same acceptability meaning for objects of different shape, size, and form. Intuitively, metrics that have a linear relationship with degree of acceptability will satisfy these conditions. This issue has not been addressed in the medical image segmentation literature. In this paper, we propose a method called LinSEM for linearizing commonly used segmentation evaluation metrics based on corresponding degrees of acceptability evaluated by an expert in a reader study. LinSEM consists of two main parts: (a) estimating the relationship between metric values and degrees of acceptability separately for each considered metric and object, and (b) linearizing any given metric value corresponding to a given segmentation of an object based on the estimated relationship. Since algorithmic segmentations do not usually cover the full range of variability of acceptability, we create a set (SS) of simulated segmentations for each object that guarantees such coverage by applying image transformations to a set (ST) of true segmentations of the object. We then conduct a reader study wherein the reader assigns an acceptability score (AS) to each sample in SS, expressing the acceptability of the sample on a 1 to 5 scale. The metric-AS relationship is then constructed for the object by using an estimation method.
With the idea that the ideal metric should be linear with respect to acceptability, we can then map the metric value of any segmentation sample of the object from a set (SA) of actual segmentations to its linearized value by using the constructed metric-acceptability relationship curve. Experiments are conducted involving three metrics - Dice coefficient (DC), Jaccard index (JI), and Hausdorff distance (HD) - on five objects: skin outer boundary of the head and neck (cervico-thoracic) body region superior to the shoulders, right parotid gland, mandible, cervical esophagus, and heart. Actual segmentations (SA) of these objects are generated via our Automatic Anatomy Recognition (AAR) method. Our results indicate that, generally, JI has a more linear relationship with acceptability before linearization than the other metrics. LinSEM achieves significantly improved uniformity of meaning post-linearization across all tested objects and metrics, except in a few cases where the departure from linearity was insignificant. This improvement is generally the largest for DC and HD, reaching 8-25% for many tested cases. Although some objects (such as right parotid gland and esophagus for DC and JI) are close to each other in meaning before linearization, they are distant in meaning from the other objects, and linearization brings them close to those objects. This suggests the importance of performing linearization considering all objects in a body region and body-wide.
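The two parts of LinSEM described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the estimation method, so the curve fit here (mean AS per metric value, made non-decreasing with a running maximum, then piecewise-linear interpolation) is a stand-in assumption, as is the choice to rescale AS from the 1 to 5 scale onto [0, 1]. All function names and the toy reader-study data are hypothetical. The non-decreasing assumption suits overlap metrics such as DC and JI; a distance such as HD, where smaller is better, would need the opposite direction.

```python
import numpy as np

def dice(a, b):
    """Dice coefficient between two binary masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * np.logical_and(a, b).sum() / total

def jaccard(a, b):
    """Jaccard index between two binary masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else np.logical_and(a, b).sum() / union

def fit_metric_to_as_curve(metric_values, scores):
    """Part (a), under assumptions: estimate a metric -> AS curve for one
    object and one metric from simulated segmentations scored by a reader."""
    m = np.asarray(metric_values, dtype=float)
    s = np.asarray(scores, dtype=float)
    # Average the scores of samples that share the same metric value.
    knots, inverse = np.unique(m, return_inverse=True)
    inverse = inverse.ravel()
    mean_as = np.bincount(inverse, weights=s) / np.bincount(inverse)
    # Assumption: acceptability does not decrease as the overlap metric grows.
    return knots, np.maximum.accumulate(mean_as)

def linearize(value, curve, as_min=1.0, as_max=5.0):
    """Part (b), under assumptions: read the estimated AS off the curve and
    rescale it to [0, 1], so that equal steps mean equal acceptability."""
    knots, mono_as = curve
    estimated_as = np.interp(value, knots, mono_as)
    return (estimated_as - as_min) / (as_max - as_min)

# Toy reader-study data for one object (made up for illustration only).
dc_of_simulated = [0.2, 0.5, 0.8, 0.8, 0.9, 1.0]
as_of_simulated = [1, 1, 2, 2, 4, 5]
curve = fit_metric_to_as_curve(dc_of_simulated, as_of_simulated)

# Linearize the DC of one "actual" segmentation of the same object.
print(linearize(0.85, curve))
```

In this toy example a raw DC of 0.85 sounds high, yet it sits midway between the scores the reader gave at 0.8 and 0.9, so its linearized value lands near the middle of the scale. Fitting a separate curve per object and per metric is what lets the same linearized value carry the same acceptability meaning across objects.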