BACKGROUND. Deep learning abdominal organ segmentation algorithms have shown excellent results in adults; validation in children is sparse. OBJECTIVE. The purpose of this article is to develop and validate deep learning models for liver, spleen, and pancreas segmentation on pediatric CT examinations. METHODS. This retrospective study developed and validated deep learning models for liver, spleen, and pancreas segmentation using 1731 CT examinations (1504 training, 221 testing), derived from three internal institutional pediatric (age ≤ 18 years) datasets (n = 483) and three public datasets comprising pediatric and adult examinations with various pathologies (n = 1248). Three deep learning model architectures (SegResNet, DynUNet, and SwinUNETR) from the Medical Open Network for Artificial Intelligence (MONAI) framework underwent training using native training (NT), relying solely on institutional datasets, and transfer learning (TL), incorporating pretraining on public datasets. For comparison, TotalSegmentator, a publicly available segmentation model, was applied to test data without further training. Segmentation performance was evaluated using mean Dice similarity coefficient (DSC), with manual segmentations as reference. RESULTS. For internal pediatric data, the DSC for TotalSegmentator, NT models, and TL models for normal liver was 0.953, 0.964-0.965, and 0.965-0.966, respectively; for normal spleen, 0.914, 0.942-0.945, and 0.937-0.945; for normal pancreas, 0.733, 0.774-0.785, and 0.775-0.786; and for pancreas with pancreatitis, 0.703, 0.590-0.640, and 0.667-0.711. For public pediatric data, the DSC for TotalSegmentator, NT models, and TL models for liver was 0.952, 0.871-0.908, and 0.941-0.946, respectively; for spleen, 0.905, 0.771-0.827, and 0.897-0.926; and for pancreas, 0.700, 0.577-0.648, and 0.693-0.736. 
For public primarily adult data, the DSC for TotalSegmentator, NT models, and TL models for liver was 0.991, 0.633-0.750, and 0.926-0.952, respectively; for spleen, 0.983, 0.569-0.604, and 0.923-0.947; and for pancreas, 0.909, 0.148-0.241, and 0.699-0.775. The DynUNet TL model was selected as the best-performing model among the NT and TL models, based on DSC values across organs and test datasets, and was made available as an open-source MONAI bundle (https://github.com/cchmc-dll/pediatric_abdominal_segmentation_bundle.git). CONCLUSION. TL models trained on heterogeneous public datasets and fine-tuned using institutional pediatric data outperformed internal NT models and TotalSegmentator across internal and external pediatric test data. Segmentation performance was better in liver and spleen than in pancreas. CLINICAL IMPACT. The selected model may be used for various volumetry applications in pediatric imaging.
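For readers unfamiliar with the evaluation metric, the Dice similarity coefficient between a predicted mask P and a reference mask R is DSC = 2|P ∩ R| / (|P| + |R|), and the reported values are means over test examinations. The following is a minimal illustrative sketch in plain Python, not the authors' evaluation code. The function names, the flattened-list mask representation, and the convention of scoring two empty masks as 1.0 are assumptions made for the example.

```python
from statistics import mean
from typing import Iterable, Sequence, Tuple


def dice_coefficient(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Return DSC = 2|P ∩ R| / (|P| + |R|) for two flattened binary masks.

    Any nonzero voxel value is treated as foreground.
    """
    if len(pred) != len(ref):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labeled foreground in both the prediction and the reference.
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    # Sum of the foreground voxel counts of the two masks.
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


def mean_dice(pairs: Iterable[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average the per-examination DSC over (prediction, reference) pairs."""
    return mean(dice_coefficient(pred, ref) for pred, ref in pairs)


if __name__ == "__main__":
    # Toy six-voxel masks: 2 overlapping voxels, 3 foreground voxels each.
    predicted = [1, 1, 1, 0, 0, 0]
    reference = [0, 1, 1, 1, 0, 0]
    print(round(dice_coefficient(predicted, reference), 3))
```

In the toy case the overlap is 2 voxels and each mask has 3 foreground voxels, so DSC = 2·2 / (3 + 3) ≈ 0.667. In practice a 3D segmentation volume would be flattened (or handled with an array library) before applying the same formula per organ.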