Abstract
Medical tasks are prone to inter-rater variability due to multiple factors such as image quality, professional experience and training, or guideline clarity. Training deep learning networks with annotations from multiple raters is a common practice that mitigates the model's bias towards a single expert. Reliable models generating calibrated outputs and reflecting the inter-rater disagreement are key to the integration of artificial intelligence in clinical practice. Various methods exist to take into account different expert labels. We focus on comparing three label fusion methods: STAPLE, averaging of the raters' segmentations, and random sampling of each rater's segmentation during training. Each label fusion method is studied using both the conventional training framework and the recently published SoftSeg framework that limits information loss by treating the segmentation task as a regression. Our results, across 10 data splittings on two public datasets (spinal cord gray matter challenge, and multiple sclerosis brain lesion segmentation), indicate that SoftSeg models, regardless of the ground truth fusion method, had better calibration and preservation of the inter-rater variability compared with their conventional counterparts without impacting the segmentation performance. Conventional models, i.e., trained with a Dice loss, with binary inputs, and a sigmoid/softmax final activation, were overconfident and underestimated the uncertainty associated with inter-rater variability. Conversely, fusing labels by averaging with the SoftSeg framework led to underconfident outputs and overestimation of the rater disagreement. In terms of segmentation performance, the best label fusion method was different for the two datasets studied, indicating this parameter might be task-dependent. However, SoftSeg had segmentation performance systematically superior or equal to the conventionally trained models and had the best calibration and preservation of the inter-rater variability. SoftSeg has a low computational cost and performed similarly in terms of uncertainty to ensembles, which require multiple models and forward passes. Our code is available at https://ivadomed.org.
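To make the compared label fusion strategies concrete, below is a minimal sketch in NumPy of how multi-rater binary masks could be fused into a single training target. The function names and the example masks are illustrative and not taken from the ivadomed codebase; the majority-vote function is only a simplified stand-in for STAPLE, which additionally estimates and weights each rater's performance through an EM procedure.

```python
import numpy as np

def fuse_average(masks):
    """Average fusion: the mean of the raters' binary masks gives a soft label in [0, 1]."""
    return np.mean(np.stack(masks, axis=0), axis=0)

def fuse_random(masks, rng=None):
    """Random-sampling fusion: pick one rater's mask per training sample (e.g., per epoch)."""
    rng = rng or np.random.default_rng()
    return masks[rng.integers(len(masks))]

def fuse_majority(masks):
    """Majority vote; a simplified stand-in for STAPLE, which also weights raters by estimated performance."""
    return (np.mean(np.stack(masks, axis=0), axis=0) >= 0.5).astype(np.float32)

# Example: three raters annotating the same 2x2 patch
masks = [np.array([[1, 0], [1, 1]]),
         np.array([[1, 0], [0, 1]]),
         np.array([[1, 1], [0, 1]])]
print(fuse_average(masks))   # soft label, e.g. [[1.0, 0.33], [0.33, 1.0]]
print(fuse_random(masks))    # one rater's hard label
print(fuse_majority(masks))  # hard consensus label
```

In this reading, averaging produces soft targets compatible with the SoftSeg regression-style training, while random sampling and STAPLE yield hard (binary) targets per training example.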