Abstract

Rationale and Objectives: Accurate pancreas segmentation at CT is critical for identifying pancreatic pathologies and is essential for developing imaging biomarkers. Our objective was to benchmark five high-performing pancreas segmentation models across multiple metrics, stratified by scan and patient/pancreatic characteristics that may affect segmentation performance.

Materials and Methods: In this retrospective study, PubMed and arXiv searches were conducted to identify pancreas segmentation models, which were then evaluated on a set of annotated imaging datasets. Results (Dice score, Hausdorff distance [HD], average surface distance [ASD]) were stratified by contrast status and by quartiles of peri-pancreatic attenuation (5 mm region around the pancreas). Multivariate regression was performed to identify imaging characteristics and biomarkers (n=9) that were significantly associated with Dice score.

Results: Five pancreas segmentation models were identified: two Abdomen Atlas models [AAUNet and AASwin, trained on 8448 scans], TotalSegmentator [TS, 1204 scans], nnUNetv1 [MSD-nnUNet, 282 scans], and a U-Net-based model for predicting diabetes [DM-UNet, 427 scans]. These were evaluated on 352 CT scans (30 females, 25 males, age 58±7 years old [±1SD], 327 age unknown) from 2000-2023. Overall, TS, AAUNet, and AASwin were the best performers (Dice=80±11%, 79±16%, and 77±18%, respectively; not significantly different on pairwise Sidak tests). AASwin and MSD-nnUNet performed worse (for all metrics) on non-contrast scans (vs contrast, P<.001). The worst performer was DM-UNet (Dice=67±16%). All algorithms except TS showed lower Dice scores with increasing peri-pancreatic attenuation (P<.01). Multivariate regression showed that non-contrast scans (P<.001, MSD-nnUNet), smaller pancreatic length (P=.005, MSD-nnUNet), and height (P=.003, DM-UNet) were associated with lower Dice scores.

Conclusion: The convolutional neural network-based models trained on a diverse set of scans performed best (TS, AAUNet, and AASwin). TS performed equivalently to AAUNet and AASwin with only 13% of the training set size (1204 vs 8448 scans). Though trained on the same dataset, a transformer network (AASwin) had poorer performance on non-contrast scans, whereas its convolutional network counterpart (AAUNet) did not. This study highlights that the aggregate assessment metrics of pancreatic segmentation algorithms reported elsewhere in the literature are not enough to capture differential performance across common patient and scanning characteristics in clinical populations.
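
For readers unfamiliar with the three metrics named above, the following is a minimal sketch of how Dice score, Hausdorff distance, and average surface distance are commonly computed from a predicted and a reference binary mask. It is not the study's evaluation code. The function names, the face-connected surface definition, the use of the maximum (rather than a percentile) Hausdorff distance, and the toy cube masks are all assumptions made for illustration. The brute-force pairwise distance step is only practical for small masks, and both masks are assumed to be non-empty when computing surface distances.

```python
import numpy as np


def dice_score(pred, ref):
    """Dice overlap between two binary masks (1.0 = identical)."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    total = pred.sum() + ref.sum()
    if total == 0:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * np.logical_and(pred, ref).sum() / total


def surface_points(mask, spacing):
    """Physical coordinates (mm) of the boundary voxels of a mask.

    A foreground voxel counts as boundary if any face-connected
    neighbour is background or lies outside the volume.
    """
    mask = mask.astype(bool)
    padded = np.pad(mask, 1, constant_values=False)
    crop = (slice(1, -1),) * mask.ndim
    interior = np.ones_like(mask)
    for axis in range(mask.ndim):
        for shift in (-1, 1):
            interior &= np.roll(padded, shift, axis=axis)[crop]
    surface = mask & ~interior
    return np.argwhere(surface) * np.asarray(spacing, dtype=float)


def hausdorff_and_asd(pred, ref, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance and average surface distance (mm)."""
    p = surface_points(pred, spacing)
    r = surface_points(ref, spacing)
    # All pairwise surface-to-surface Euclidean distances (brute force).
    d = np.linalg.norm(p[:, None, :] - r[None, :, :], axis=-1)
    pred_to_ref = d.min(axis=1)  # nearest reference point for each predicted point
    ref_to_pred = d.min(axis=0)  # nearest predicted point for each reference point
    hd = float(max(pred_to_ref.max(), ref_to_pred.max()))
    asd = float((pred_to_ref.sum() + ref_to_pred.sum())
                / (len(pred_to_ref) + len(ref_to_pred)))
    return hd, asd


# Toy example: a 4x4x4 cube and the same cube shifted by one voxel.
ref_mask = np.zeros((10, 10, 10), dtype=bool)
ref_mask[2:6, 2:6, 2:6] = True
pred_mask = np.zeros_like(ref_mask)
pred_mask[3:7, 2:6, 2:6] = True

dice = dice_score(pred_mask, ref_mask)
hd, asd = hausdorff_and_asd(pred_mask, ref_mask)
print(f"Dice={dice:.3f}  HD={hd:.3f} mm  ASD={asd:.3f} mm")
```

Dice rewards volumetric overlap, while HD and ASD measure boundary error in physical units, which is one reason a model can rank differently depending on the metric. For full-resolution CT volumes, a distance-transform-based implementation would be a more practical choice than the pairwise computation sketched here.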
