Background
Artificial intelligence (AI) has recently been shown to improve clinical workflows and outcomes, yet its potential in pediatric surgery remains largely unexplored. This systematic review details the use of AI in pediatric surgery.

Methods
Nine medical databases were searched from inception until January 2023 to identify articles focused on AI in pediatric surgery. Two authors reviewed the full texts of eligible articles. Studies were included if they were original investigations on the development, validation, or clinical application of AI models for pediatric health conditions primarily managed surgically. Studies were excluded if they were not peer-reviewed; were review articles, editorials, commentaries, or case reports; did not focus on pediatric surgical conditions; or did not employ at least one AI model. Extracted data included study characteristics, clinical specialty, AI method and algorithm type, the role and performance metrics of each AI model (algorithm), key results, interpretability, validation, and risk of bias assessed with PROBAST and QUADAS-2.

Results
The authors screened 8,178 articles and included 112. Half of the studies (50%) reported predictive models (for adverse events [25%], surgical outcomes [16%], and survival [9%]), followed by diagnostic (29%) and decision-support models (21%). Neural networks (44%) and ensemble learners (36%) were the most commonly used AI methods across application domains. The pediatric surgical subspecialties most represented across all models were general surgery (31%) and neurosurgery (25%). Forty-four percent of models were interpretable, and only 6% were both interpretable and externally validated. Forty percent of models had a high risk of bias, and concerns over applicability were identified in 7%.

Conclusions
While AI has wide potential clinical applications in pediatric surgery, very few published AI algorithms were externally validated, interpretable, and at low risk of bias.
Future research needs to focus on developing AI models that are prospectively validated and ultimately integrated into clinical workflows.

Level of Evidence
2A