Multi-organ segmentation of CT images is crucial for clinical applications, yet most state-of-the-art models rely on fully annotated datasets and strong supervision to achieve high accuracy on particular organs. Because their training data are small in scale and come from a single source, these models generalize poorly to CT images from other sources. To exploit existing partially labeled datasets and obtain segmentations that cover more organs with higher accuracy and robustness, we propose MFUnetr, a multi-task learning network. Trained directly on the union of several datasets, MFUnetr optimizes an encoder-decoder network on two tasks in parallel. The main task produces full-organ segmentation using a dedicated training strategy; the auxiliary task segments the organs labeled in each dataset using label priors. We also introduce a new weighted combined loss function to optimize the model. Compared with the base model UNETR trained on the fully annotated BTCV dataset, our model, trained on a combination of three partially labeled datasets, improved the mean Dice on overlapping organs by 0.35% for the spleen, 15.28% for the esophagus, and 8.31% for the aorta. Importantly, without fine-tuning, the mean Dice over the 13 BTCV organs remained 1.91% higher while all 15 organs were segmented. These results show that the proposed method can effectively use existing large, partially annotated datasets to alleviate data scarcity in multi-organ segmentation.