Deep-Learning Models for Abdominal CT Organ Segmentation in Children: Development and Validation in Internal and Heterogeneous Public Datasets.

Elanchezhian Somasundaram, Zachary Taylor, Vinicius V Alves, Lisa Qiu, Benjamin Fortson, Neeraja Mahalingam, Jonathan Dudley, Hailong Li, Samuel L Brady, Andrew T Trout, Jonathan R Dillman

doi:10.2214/ajr.24.30931

Abstract

Background: Deep-learning abdominal organ segmentation algorithms have shown excellent results in adults; validation in children is sparse.

Objective: To develop and validate deep-learning models for liver, spleen, and pancreas segmentation on pediatric CT examinations.

Methods: This retrospective study developed and validated deep-learning models for liver, spleen, and pancreas segmentation using 1731 CT examinations (1504 training, 221 testing), derived from three internal institutional pediatric (age ≤18 years) datasets (n=483) and three public datasets comprising pediatric and adult examinations with various pathologies (n=1248). Three deep-learning model architectures (SegResNet, DynUNet, and SwinUNETR) from the Medical Open Network for AI (MONAI) framework were trained in two ways: native training (NT), relying solely on institutional datasets, and transfer learning (TL), incorporating pre-training on public datasets. For comparison, TotalSegmentator (TS), a publicly available segmentation model, was applied to test data without further training. Segmentation performance was evaluated using mean Dice similarity coefficient (DSC), with manual segmentations as reference.

Results: For internal pediatric data, DSC for normal liver was 0.953 (TS), 0.964-0.965 (NT models), and 0.965-0.966 (TL models); normal spleen, 0.914 (TS), 0.942-0.945 (NT models), and 0.937-0.945 (TL models); normal pancreas, 0.733 (TS), 0.774-0.785 (NT models), and 0.775-0.786 (TL models); pancreas with pancreatitis, 0.703 (TS), 0.590-0.640 (NT models), and 0.667-0.711 (TL models). For public pediatric data, DSC for liver was 0.952 (TS), 0.876-0.908 (NT models), and 0.941-0.946 (TL models); spleen, 0.905 (TS), 0.771-0.827 (NT models), and 0.897-0.926 (TL models); pancreas, 0.700 (TS), 0.577-0.648 (NT models), and 0.693-0.736 (TL models). For public primarily adult data, DSC for liver was 0.991 (TS), 0.633-0.750 (NT models), and 0.926-0.952 (TL models); spleen, 0.983 (TS), 0.569-0.604 (NT models), and 0.923-0.947 (TL models); pancreas, 0.909 (TS), 0.148-0.241 (NT models), and 0.699-0.775 (TL models). DynUNet-TL was selected as the best-performing model among the NT and TL models and was made available as an open-source MONAI bundle (https://github.com/cchmc-dll/pediatric_abdominal_segmentation_bundle.git).

Conclusion: TL models trained on heterogeneous public datasets and fine-tuned using institutional pediatric data outperformed internal NT models and TotalSegmentator across internal and external pediatric test data. Segmentation performance was better in liver and spleen than in pancreas.

Clinical Impact: The selected model may be used for various volumetry applications in pediatric imaging.
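
As an illustration of the evaluation metric and the volumetry use case mentioned in the abstract, the following is a minimal sketch of how a Dice similarity coefficient and an organ volume could be computed from binary masks. It is not the authors' evaluation code: the function names `dice_coefficient` and `organ_volume_ml`, the toy masks, the voxel spacing, and the convention of returning 1.0 when both masks are empty are all assumptions made for this example. It uses the standard definition DSC = 2|A ∩ B| / (|A| + |B|), where A is the predicted mask and B is the manual reference.

```python
import numpy as np


def dice_coefficient(pred, ref):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    if pred.shape != ref.shape:
        raise ValueError("masks must have the same shape")
    denom = pred.sum() + ref.sum()
    if denom == 0:
        # Assumption for this sketch: two empty masks count as perfect agreement.
        return 1.0
    overlap = np.logical_and(pred, ref).sum()
    return float(2.0 * overlap / denom)


def organ_volume_ml(mask, spacing_mm):
    """Volume of a binary mask in millilitres, given voxel spacing in mm."""
    voxel_mm3 = float(np.prod(spacing_mm))
    n_voxels = int(np.asarray(mask, dtype=bool).sum())
    return n_voxels * voxel_mm3 / 1000.0  # 1 mL = 1000 mm^3


# Toy example (hypothetical data): an 8-voxel reference mask and a
# 12-voxel prediction that fully contains it.
ref = np.zeros((4, 4, 4), dtype=bool)
ref[1:3, 1:3, 1:3] = True
pred = np.zeros_like(ref)
pred[1:3, 1:3, 1:4] = True

dsc = dice_coefficient(pred, ref)           # 2 * 8 / (12 + 8)
vol = organ_volume_ml(ref, (1.0, 1.0, 2.5))  # 8 voxels * 2.5 mm^3 each

print(f"DSC = {dsc:.3f}, reference volume = {vol:.3f} mL")
```

A mean DSC such as those reported above would then be the average of per-examination values for a given organ. Because the DSC normalizes overlap by the combined mask size, a fixed number of misclassified voxels costs more for a small organ than for a large one.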

Full Text