Purpose: AI-based auto-segmentation models hold promise for improving the efficiency and consistency of organ contouring in radiotherapy planning and adaptive radiotherapy. However, their performance on paediatric CT data and their cross-scanner compatibility remain unclear. This study evaluates how AI-based auto-segmentation models trained on adult CT data perform when applied to paediatric datasets, and explores the improvement gained by including paediatric training data. It also examines the models' ability to accurately segment CT data acquired on different scanners.

Methods and Materials: Using the nnU-Net framework, segmentation models were trained on datasets of adult, paediatric, and combined CT scans for seven pelvic/thoracic organs. Each model was trained on 290-300 cases per category and organ. Training datasets combined clinical data with several open repositories. The study drew on a database of 459 paediatric (0-16 years) and 950 adult (>18 years) CT scans, all with human expert ground-truth contours of the selected organs. Performance was evaluated using the Dice similarity coefficient (DSC) of the model-generated contours.

Results: AI models trained exclusively on adult data underperformed on paediatric data, especially for the 0-2 age group, where the mean DSC was below 0.5 for the bladder and spleen. Adding paediatric training data yielded a significant improvement for all age groups, achieving a mean DSC above 0.85 for all organs in every age group. Larger organs such as the liver and kidneys maintained consistent performance for all models across age groups. No significant difference emerged in the cross-scanner performance evaluation, suggesting robust cross-scanner generalization.

Conclusion: For optimal segmentation across age groups, it is important to include paediatric data in the training of segmentation models. The successful cross-scanner generalization also supports the real-world clinical applicability of these AI models. This study emphasizes the significance of dataset diversity in training robust AI systems for medical image interpretation tasks.
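
For readers unfamiliar with the evaluation metric, the Dice similarity coefficient between a predicted mask A and a ground-truth mask B is DSC = 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). The following is a minimal Python sketch of that standard formula, not the study's own evaluation code; the function name `dice_similarity`, the toy masks, and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
import numpy as np


def dice_similarity(pred, truth):
    """Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("masks must have the same shape")

    total = pred.sum() + truth.sum()
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0

    overlap = np.logical_and(pred, truth).sum()
    return 2.0 * overlap / total


# Hypothetical toy masks (one 2D slice); real contours would be 3D CT volumes.
pred_mask = np.array([[1, 1, 0],
                      [0, 1, 0]])
truth_mask = np.array([[1, 0, 0],
                       [0, 1, 1]])

score = dice_similarity(pred_mask, truth_mask)
print(f"DSC = {score:.3f}")
```

In this toy case each mask has three foreground voxels and they share two, so the score is 2·2 / (3 + 3) ≈ 0.667. A per-organ, per-age-group mean DSC as reported above would average this quantity over the relevant test cases.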