Developing an explainable machine learning model to predict false-negative citrin deficiency cases in newborn screening

Abstract

Background: Neonatal Intrahepatic Cholestasis caused by Citrin Deficiency (NICCD) is an autosomal recessive disorder affecting the urea cycle and energy metabolism. Newborn screening (NBS) usually relies on elevated citrulline, but some patients have normal citrulline, resulting in false negatives and delayed diagnosis. This study develops an explainable machine learning (ML) model to predict false-negative NICCD cases during NBS.
Methods: Data from 53 false-negative NICCD patients and 212 controls, collected retrospectively between 2011 and 2024, were analyzed. The dataset was split into a training set (70%) and a test set (30%). External validation involved 48 participants from distinct time periods. Key predictors were identified using variable importance in projection (VIP > 1) and Lasso regression. Six ML models were trained for evaluation: Logistic Regression, Random Forest, Light Gradient Boosting Machine, Extreme Gradient Boosting (XGBoost), K-Nearest Neighbor, and Support Vector Machines. Performance was evaluated using the area under the receiver operating characteristic curve (AUC) and F1 score. Shapley Additive exPlanations (SHAP) was applied to determine the importance of features and interpret the models.
Results: Birth weight, citrulline, glycine, phenylalanine, ornithine, arginine, proline, succinylacetone, and C10:2 were selected as predictive features. Among the ML models, XGBoost demonstrated the most robust and consistent performance, achieving AUCs of 0.971 (95% CI: 0.959–0.979), 0.968, and 0.977, and F1 scores of 0.786 (95% CI: 0.744–0.820), 0.828, and 0.833 in the training, test, and external validation sets, respectively. SHAP analysis showed that the most important features are citrulline, glycine, phenylalanine, succinylacetone, birth weight, and ornithine. Feature pairs such as citrulline-phenylalanine, citrulline-glycine, succinylacetone-birth weight, and ornithine-glycine showed varying interactions. SHAP force plots, decision plots, and waterfall plots provided insightful patient-level interpretations. Finally, we built a web-based calculator for the prediction of false-negative NICCD cases (https://myapp123.shinyapps.io/my_shiny_app/).
Conclusion: An interpretable machine learning model utilizing metabolite and demographic data enhances the detection of false-negative NICCD cases, facilitates early identification and intervention, and ultimately improves the overall effectiveness of the newborn screening system.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13023-025-04045-z.
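The two headline metrics in this abstract, AUC and F1 score, can be illustrated with a short standard-library sketch. This is not the authors' evaluation code: the labels and scores below are invented toy values, the 0.5 decision threshold is an assumption, and the function names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(labels, scores, threshold=0.5):
    """F1 = harmonic mean of precision and recall at a fixed threshold
    (the threshold value is an assumption, not taken from the paper)."""
    tp = fp = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data only: 1 = false-negative NICCD case, 0 = control.
toy_labels = [1, 1, 0, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.2, 0.7, 0.3, 0.1, 0.5]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_f1 = f1_score(toy_labels, toy_scores)
```

AUC is threshold-free, whereas F1 depends on the chosen cut-off, which is one reason a case-control dataset with a 1:4 class ratio is often reported with both.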

Similar Papers
  • Research Article
  • Cite Count: 14
  • 10.1186/s12911-023-02166-8
The prediction of distant metastasis risk for male breast cancer patients based on an interpretable machine learning model
  • Apr 21, 2023
  • BMC Medical Informatics and Decision Making
  • Xuhai Zhao + 1 more

Objectives: This research was designed to compare the ability of different machine learning (ML) models and a nomogram to predict distant metastasis in male breast cancer (MBC) patients and to interpret the optimal ML model using the SHapley Additive exPlanations (SHAP) framework.
Methods: Four powerful ML models were developed using data from MBC patients in the SEER database between 2010 and 2015 and MBC patients from our hospital between 2010 and 2020. The area under the curve (AUC) and Brier score were used to assess the capacity of different models. The DeLong test was applied to compare the performance of the models. Univariable and multivariable analyses were conducted using logistic regression.
Results: A total of 2351 patients were analyzed; 168 (7.1%) had distant metastasis (M1); 117 (5.0%) had bone metastasis, and 71 (3.0%) had lung metastasis. The median age at diagnosis was 68.0 years. Most patients did not receive radiotherapy (1723, 73.3%) or chemotherapy (1447, 61.5%). The XGB model was the best ML model for predicting M1 in MBC patients. It showed the largest AUC value in the tenfold cross validation (AUC: 0.884; SD: 0.02), training (AUC: 0.907; 95% CI: 0.899–0.917), testing (AUC: 0.827; 95% CI: 0.802–0.857) and external validation (AUC: 0.754; 95% CI: 0.739–0.771) sets. It also showed powerful ability in the prediction of bone metastasis (AUC: 0.880, 95% CI: 0.856–0.903 in the training set; AUC: 0.823, 95% CI: 0.790–0.848 in the test set; AUC: 0.747, 95% CI: 0.727–0.764 in the external validation set) and lung metastasis (AUC: 0.906, 95% CI: 0.877–0.928 in the training set; AUC: 0.859, 95% CI: 0.816–0.891 in the test set; AUC: 0.756, 95% CI: 0.732–0.777 in the external validation set). The AUC value of the XGB model was larger than that of the nomogram in the training (0.907 vs 0.802) and external validation (0.754 vs 0.706) sets.
Conclusions: The XGB model is a better predictor of distant metastasis among MBC patients than other ML models and the nomogram; furthermore, the XGB model is a powerful model for predicting bone and lung metastasis. Combined with SHAP values, it could help doctors intuitively understand the impact of each variable on outcome.
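Alongside AUC, this abstract uses the Brier score, which measures how close predicted probabilities are to the observed 0/1 outcomes. A minimal sketch, assuming binary labels and invented probabilities (none of these values come from the study):

```python
def brier_score(labels, probabilities):
    """Mean squared difference between predicted probability and the
    observed binary outcome; 0 is perfect, lower is better."""
    if len(labels) != len(probabilities) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for y, p in zip(labels, probabilities)]
    return sum(squared_errors) / len(squared_errors)


# Toy example: 1 = distant metastasis (M1), 0 = no distant metastasis.
toy_outcomes = [1, 0, 0, 1]
toy_probabilities = [0.8, 0.1, 0.3, 0.6]
toy_brier = brier_score(toy_outcomes, toy_probabilities)

# A model that always predicts 0.5 scores 0.25 whatever the outcomes are,
# which is a common reference point when reading Brier scores.
uninformative_brier = brier_score(toy_outcomes, [0.5] * len(toy_outcomes))
```

Unlike AUC, which only depends on the ranking of patients, the Brier score also penalizes poorly calibrated probabilities.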

  • Research Article
  • 10.1111/joor.70108
An Interpretable Machine Learning Model Based on MRI Features for Predicting Pain Severity in Temporomandibular Disorders.
  • Nov 18, 2025
  • Journal of oral rehabilitation
  • Chuanfang Xu + 6 more

Chronic pain around the temporomandibular joint (TMJ) and masticatory muscles is a primary symptom of temporomandibular disorders (TMD). However, the clinical significance of magnetic resonance imaging (MRI) features in predicting TMD-related pain remains unclear. This study aimed to develop and interpret machine learning (ML) models based on MRI characteristics for predicting pain severity in patients with TMD. The present retrospective study included 584 patients with TMD between January 2022 and December 2024, yielding a total of 755 TMJ MRI data sets. Pain severity was classified using the visual analogue scale (VAS). Demographic variables (age, sex) and MRI features-including lesion side, disc position, disc morphology, disc signal, disc perforation, bilaminar zone tear, joint space, joint effusion, condylar movement, bony changes and morphology/signal of the lateral pterygoid muscle-were collected. Eleven ML models based on demographic and MRI features were developed: logistic regression (LR), support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), adaptive boosting (AdaBoost), gradient boosting classifier (GBC), bagging classifier (BC), extremely randomised trees (ETC), decision tree classifier (DTC) and multilayer perceptron (MLP). Model performance was evaluated using multiple metrics, including the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity and F1 score. Precision-recall (PR) curves and calibration curves were plotted to assess discrimination and model calibration. Decision curve analysis (DCA) was conducted to evaluate the clinical net benefit across a range of threshold probabilities. Model interpretability was enhanced using Shapley Additive Explanations (SHAP), which quantified the contribution of each feature to individual predictions. 
Feature selection was conducted based on mean SHAP values, and separate LightGBM models were constructed using the Top 3, 5, and 9 most important features, as well as the full-feature set, for performance comparison. The data set was randomly divided into a training set (n = 604) and a test set (n = 151). Among the 11 ML models, the LightGBM model demonstrated the best predictive performance, with an AUC of 0.899, and was therefore identified as the optimal model. SHAP analysis identified age, disc position and condylar movement as the top three contributing features. Feature selection analysis indicated that selecting the top nine SHAP-ranked variables led to the highest diagnostic performance, with an AUC of 0.829. This study developed an interpretable, high-performing MRI-based ML model incorporating SHAP analysis to integrate imaging and clinical features for objective pain assessment, which may help identify high-risk TMD patients and guide personalised treatment strategies.

  • Research Article
  • Cite Count: 5
  • 10.2196/66733
Supervised Machine Learning Models for Predicting Sepsis-Associated Liver Injury in Patients With Sepsis: Development and Validation Study Based on a Multicenter Cohort Study
  • May 26, 2025
  • Journal of Medical Internet Research
  • Jingchao Lei + 4 more

Background: Sepsis-associated liver injury (SALI) is a severe complication of sepsis that contributes to increased mortality and morbidity. Early identification of SALI can improve patient outcomes; however, sepsis heterogeneity makes timely diagnosis challenging. Traditional diagnostic tools are often limited, and machine learning techniques offer promising solutions for predicting adverse outcomes in patients with sepsis.
Objective: This study aims to develop an explainable machine learning model, incorporating stacking techniques, to predict the occurrence of liver injury in patients with sepsis and provide decision support for early intervention and personalized treatment strategies.
Methods: This retrospective multicenter cohort study adhered to the TRIPOD+AI (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis, Extended for Artificial Intelligence) guidelines. Data from 8834 patients with sepsis in the Medical Information Mart for Intensive Care IV (MIMIC-IV) database were used for training and internal validation, while data from 4236 patients in the eICU Collaborative Research Database (eICU-CRD) were used for external validation. SALI was defined as an international normalized ratio >1.5 and total bilirubin >2 mg/dL within 1 week of intensive care unit admission. Nine machine learning models (decision tree, random forest [RF], extreme gradient boosting [XGBoost], light gradient boosting machine [LightGBM], support vector machine, elastic net, logistic regression, multilayer perceptron, and k-nearest neighbors) were trained. A stacking ensemble model, using LightGBM, XGBoost, and RF as base learners and Lasso regression as the meta-model, was optimized via 10-fold cross-validation. Hyperparameters were tuned using grid search and Bayesian optimization. Model performance was evaluated using accuracy, balanced accuracy, Brier score, detection prevalence, F1-score, Jaccard index, κ coefficient, Matthews correlation coefficient, negative predictive value, positive predictive value, precision, recall, area under the receiver operating characteristic curve (ROC-AUC), precision-recall AUC, and decision curve analysis. Shapley additive explanations (SHAP) values were used to quantify feature importance.
Results: In the training set, LightGBM, XGBoost, and RF demonstrated the best performance among all models, with ROC-AUCs of 0.9977, 0.9311, and 0.9847, respectively. These models exhibited minimal variance in cross-validation, with tightly clustered ROC-AUC and precision-recall area under the curve distributions. In the internal validation set, LightGBM (ROC-AUC 0.8401) and XGBoost (ROC-AUC 0.8403) outperformed all other models, while RF achieved an ROC-AUC of 0.8193. In the external validation set, LightGBM (ROC-AUC 0.7077), XGBoost (ROC-AUC 0.7169), and RF (ROC-AUC 0.7081) maintained strong performance, although with slight decreases in ROC-AUC compared with the training set. The stacking model achieved ROC-AUCs of 0.995, 0.838, and 0.721 in the training, internal validation, and external validation sets, respectively. Key predictors (total bilirubin, lactate, prothrombin time, and mechanical ventilation status) were consistently identified across models, with SHAP analysis highlighting their significant contributions to the model’s predictions.
Conclusions: The stacking ensemble model developed in this study yields accurate and robust predictions of SALI in patients with sepsis, demonstrating potential clinical utility for early intervention and personalized treatment strategies.
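The outcome definition in the Methods (INR >1.5 and total bilirubin >2 mg/dL within 1 week of ICU admission) can be written as a labelling rule. The sketch below is illustrative only: the `LabResult` record and its field names are hypothetical, and it assumes the two thresholds may be exceeded by different measurements inside the 7-day window, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

WINDOW_HOURS = 7 * 24  # "within 1 week of intensive care unit admission"


@dataclass
class LabResult:
    """Hypothetical record; field names are not from MIMIC-IV or eICU-CRD."""
    hours_since_icu_admission: float
    inr: Optional[float] = None
    total_bilirubin_mg_dl: Optional[float] = None


def meets_sali_definition(results: Iterable[LabResult]) -> bool:
    """True if INR > 1.5 and total bilirubin > 2 mg/dL both occur in the window.

    Assumption: the two criteria need not come from the same blood draw.
    """
    in_window = [
        r for r in results
        if 0 <= r.hours_since_icu_admission <= WINDOW_HOURS
    ]
    inr_high = any(r.inr is not None and r.inr > 1.5 for r in in_window)
    bilirubin_high = any(
        r.total_bilirubin_mg_dl is not None and r.total_bilirubin_mg_dl > 2.0
        for r in in_window
    )
    return inr_high and bilirubin_high


# Toy patients, invented for illustration.
patient_a = [LabResult(12, inr=1.8), LabResult(40, total_bilirubin_mg_dl=2.6)]
patient_b = [LabResult(12, inr=1.8), LabResult(200, total_bilirubin_mg_dl=2.6)]
label_a = meets_sali_definition(patient_a)
label_b = meets_sali_definition(patient_b)
```

Here `patient_b` is not labelled as SALI because the bilirubin value was measured after the 168-hour window had closed.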

  • Research Article
  • 10.11648/j.sd.20251302.12
Construction of Depression Prediction Model Based on Machine Learning and Its Interpretability
  • Apr 14, 2025
  • Science Discovery
  • Juan Wang + 4 more

Objectives: The aim of this study was to construct depression prediction models based on machine learning algorithms, compare the performance of different machine learning models on depression risk prediction, and interpret the optimal model.
Methods: A total of 2573 participants from the CHARLS database were included. LASSO and stepwise regression were used to screen for variables. The dataset was randomly divided into training, validation, and test sets in a 6:2:2 ratio. SMOTE resampling was used to balance the training set when fitting the model. Nine machine learning algorithms were used to construct the prediction model, including Decision Tree (DT), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Elastic Network Regression (Enet), Support Vector Machine (SVM), Logistic Regression, Multilayer Perceptron (MLP), and K-Nearest Neighbor (KNN). The predictive ability of each machine learning classifier was evaluated on the test set according to the evaluation indices, and the "optimal" model of this study was selected. Subsequently, SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) were used to analyze the interpretability of the optimal model.
Results: The XGBoost model showed the best performance among the 9 models. Its AUC value reached 0.908 and its clinical net benefit was the highest. The DeLong test showed that there was a significant difference between the ROC curves of XGBoost and the other models (P < 0.05). The global interpretation based on SHAP showed that life satisfaction, self-rated health status, sleep duration, and cognitive score were inversely related to the SHAP value. Being female, rural residence, body aches and pains in any area, non-retirement, and limited Instrumental Activities of Daily Living (IADL) had a positive effect on predicted depression risk. The local interpretation diagrams based on SHAP and LIME showed the personalized risk prediction of a single sample.
Conclusions: Machine learning models are an effective tool for predicting the risk of depression. The use of SHapley Additive exPlanations and Local Interpretable Model-agnostic Explanations can maximize the clinical advantages of machine learning, which helps to predict or detect patients at high risk of depression as early as possible and to carry out comprehensive evaluation, early prevention, and treatment of depression.
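The SMOTE step mentioned in the Methods creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a simplified standard-library sketch of that idea, not the implementation the authors used; the function name, the default `k`, and the toy points are assumptions.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples.

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base_index = rng.randrange(len(minority))
        base = minority[base_index]
        others = [p for i, p in enumerate(minority) if i != base_index]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (q - b) for b, q in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two numeric features (values are invented).
toy_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
toy_synthetic = smote_sketch(toy_minority, n_new=6, k=2, seed=42)
```

As the abstract notes, resampling is applied to the training set only; oversampling before the split would leak synthetic copies of test information into training.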

  • Research Article
  • Cite Count: 15
  • 10.1016/j.ecoenv.2024.117210
Identifying cardiovascular disease risk in the U.S. population using environmental volatile organic compounds exposure: A machine learning predictive model based on the SHAP methodology
  • Oct 23, 2024
  • Ecotoxicology and Environmental Safety
  • Qingan Fu + 7 more


  • Research Article
  • Cite Count: 1
  • 10.21926/aeer.2404020
Comparative Analysis of Machine Learning Models and Explainable Artificial Intelligence for Predicting Wastewater Treatment Plant Variables
  • Oct 17, 2024
  • Advances in Environmental and Engineering Research
  • Fuad Bin Nasir + 1 more

Increasing urban wastewater and rigorous discharge regulations pose significant challenges for wastewater treatment plants (WWTP) to meet regulatory compliance while minimizing operational costs. This study explores the application of several machine learning (ML) models, specifically Artificial Neural Networks (ANN), Gradient Boosting Machines (GBM), Random Forests (RF), eXtreme Gradient Boosting (XGBoost), and hybrid RF-GBM models, in predicting important WWTP variables such as Biochemical Oxygen Demand (BOD), Total Suspended Solids (TSS), Ammonia (NH₃), and Phosphorus (P). Several feature selection (FS) methods were employed to identify the most influential WWTP variables. To enhance the interpretability of the ML models and to understand the impact of variables on prediction, two widely used explainable artificial intelligence (XAI) methods, Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), were investigated in the study. Results derived from FS and XAI methods were compared to explore their reliability. The ML model performance results revealed that ANN, GBM, XGBoost, and RF-GBM have great potential for variable prediction, with low error rates and strong coefficients of determination, such as an R² value of 1 on the training set and 0.98 on the test set. The study also revealed that XAI methods identify common influential variables in each model’s prediction. This is a novel attempt to get an overview of both LIME and SHAP explanations on ML models for WWTP variable prediction.

  • Research Article
  • Cite Count: 12
  • 10.2147/jhc.s358197
A Machine Learning Model Based on Health Records for Predicting Recurrence After Microwave Ablation of Hepatocellular Carcinoma
  • Jul 28, 2022
  • Journal of Hepatocellular Carcinoma
  • Chao An + 11 more

Background and Aim: Early recurrence (ER) presents a challenge for the survival prognosis of patients with hepatocellular carcinoma (HCC). The aim of this study was to investigate machine learning (ML) models using clinical data for predicting ER after microwave ablation (MWA).
Methods: Between August 2005 and December 2019, 1574 patients with early-stage HCC who underwent MWA at four hospitals were reviewed. Then, 36 clinical data points per patient were collected, and the patients were assigned to the training, internal validation, and external validation sets. Apart from traditional logistic regression (LR), three ML models (random forest, support vector machine, and eXtreme Gradient Boosting [XGBoost]) were built and validated for their predictive ability with the area under the ROC curve (AUC). Algorithms such as SHapley Additive exPlanations (SHAP) and local interpretable model-agnostic explanations (LIME) were used to realize their interpretability.
Results: The three ML models all outperformed LR (P < 0.001 for all) in predictive ability. When nine variables (tumor number, platelet, α-fetoprotein, comorbidity score, white blood cell, cholinesterase, prothrombin time, neutrophils, and etiology) were extracted simultaneously using recursive feature elimination with cross-validation, the XGBoost model achieved the best discrimination among all models, with an AUC value of 0.75 (95% CI [confidence interval]: 0.72–0.78) in the training set, 0.74 (95% CI: 0.69–0.80) in the internal validation set, and 0.76 (95% CI: 0.70–0.82) in the external validation set, and it was interpreted through the visualization of risk factors by the SHAP and LIME algorithms. A predictive system for post-ablation recurrence risk stratification, based on the XGBoost analysis, was provided online (http://114.251.235.51:8001/).
Conclusion: The XGBoost model based on clinical data can effectively predict ER risk after MWA, which can contribute to surveillance, prevention, and treatment strategies for HCC.

  • Research Article
  • 10.3389/fmed.2025.1638097
Machine learning-based predictive model for acute pancreatitis-associated lung injury: a retrospective analysis
  • Aug 12, 2025
  • Frontiers in Medicine
  • Zhaohui Du + 9 more

Background: Acute Pancreatitis-Associated Lung Injury (APALI) is one of the most severe and life-threatening systemic complications in acute pancreatitis patients, with high rates of morbidity and mortality. This study aims to develop a prediction model for the diagnosis of APALI based on machine learning algorithms.
Methods: This study included data from the First Affiliated Hospital of Bengbu Medical College (July 2012 to June 2022), which were randomly divided into training and testing sets. Data from the Second Affiliated Hospital of Zhejiang University (January 2018 to April 2023) served as the external validation set. LASSO regression was applied to eliminate irrelevant or highly collinear independent variables. Six machine learning models were constructed, with evaluation metrics including Area Under Curve (AUC), accuracy, sensitivity, specificity, F1 score, and recall. The impact of model features was analyzed using SHapley Additive exPlanations (SHAP).
Results: A total of 1,975 patients with acute pancreatitis were randomly assigned to a training set (1,480 patients) and a testing set (495 patients). In the training set, 480 cases (32.43%) were diagnosed with APALI. The eXtreme Gradient Boosting (XGBoost) and Random Forest (RF) models demonstrated the best predictive performance, achieving the highest AUC (0.92 and 0.914, respectively), along with higher accuracy, F1 score, and recall in the testing set. Six particularly influential factors were identified and ranked as follows: CRP, BMI, neutrophil, calcium, lactate, and neutrophil-to-albumin ratio (NAR). The global interpretability of the XGBoost and RF models, along with these six features, is shown in the SHAP summary plot. These two models were selected as the optimal models for the development of an online calculator for clinical applications and risk stratification.
Conclusion: We developed and internally validated a machine learning model to predict APALI, showing strong performance in our study population. To support further research and clinical use, we created an open-access web-based risk calculator. Prospective multicenter validation is needed to confirm generalizability. If successful, the tool may support early risk identification and guide interventions to prevent APALI.

  • Research Article
  • 10.1007/s40122-025-00802-x
Machine Learning-Based Prediction for Axial Pain Following Expansive Unilateral Open-Door Laminoplasty: A Retrospective Cohort Study.
  • Dec 10, 2025
  • Pain and therapy
  • Kelun Huang + 7 more

Axial pain is a common complication following expansive unilateral open-door laminoplasty (ELAP); however, traditional statistical methods are unable to effectively predict this complication. This study developed machine learning (ML) models to predict post-ELAP axial pain and identify key predictors. This retrospective study enrolled 851 cervical spondylotic myelopathy (CSM) patients undergoing ELAP, split into training (n = 714) and temporal validation sets (n = 137). Demographic, imaging, clinical, and surgical data were collected. Predictive features were selected by the least absolute shrinkage and selection operator (Lasso) regression, followed by ML model development with grid search optimizing hyperparameters. The top-performing model underwent temporal validation, and SHapley Additive exPlanations (SHAP) analyzed predictor contributions. The training set included 218 axial pain cases; the test set had 47. Key predictors (C7 laminoplasty, cervical kyphosis, etc.) were identified to develop ML model. Post-optimization, extreme gradient boosting (XGBoost) achieved superior performance (internal validation area under the receiver [AUC] = 0.948; 95% confidence interval [CI] 0.918-0.978), maintained in temporal validation (AUC = 0.906; 95% CI 0.858-0.954). Through SHAP analysis, the predictors were ranked in descending order of importance as follows: C7 laminoplasty, quantity-based surgical segment classification, cervical kyphosis, angle of lamina open-door, cervical lordosis, and occupying rate of cervical spinal canal. ML models coupled with SHAP analysis effectively predict post-ELAP axial pain, identifying the key predictors. Performing segment-selective ELAP, avoiding unnecessary C7 laminoplasty, and maintaining optimal open-door angle are critical factors in avoiding the occurrence of axial pain following ELAP.

  • Research Article
  • Cite Count: 2
  • 10.2196/54872
Development and Validation of an Explainable Machine Learning Model for Predicting Myocardial Injury After Noncardiac Surgery in Two Centers in China: Retrospective Study
  • Jul 26, 2024
  • JMIR Aging
  • Chang Liu + 9 more

Background: Myocardial injury after noncardiac surgery (MINS) is an easily overlooked complication but closely related to postoperative cardiovascular adverse outcomes; therefore, early diagnosis and prediction are particularly important.
Objective: We aimed to develop and validate an explainable machine learning (ML) model for predicting MINS among older patients undergoing noncardiac surgery.
Methods: The retrospective cohort study included older patients who had noncardiac surgery from 1 northern center and 1 southern center in China. The data sets from center 1 were divided into a training set and an internal validation set. The data set from center 2 was used as an external validation set. Before modeling, the least absolute shrinkage and selection operator and recursive feature elimination methods were used to reduce dimensions of data and select key features from all variables. Prediction models were developed based on the extracted features using several ML algorithms, including category boosting, random forest, logistic regression, naïve Bayes, light gradient boosting machine, extreme gradient boosting, support vector machine, and decision tree. Prediction performance was assessed by the area under the receiver operating characteristic (AUROC) curve as the main evaluation metric to select the best algorithms. The model performance was verified by internal and external validation data sets with the best algorithm and compared to the Revised Cardiac Risk Index. The Shapley Additive Explanations (SHAP) method was applied to calculate values for each feature, representing the contribution to the predicted risk of complication, and generate personalized explanations.
Results: A total of 19,463 eligible patients were included; among those, 12,464 patients in center 1 were included as the training set; 4754 patients in center 1 were included as the internal validation set; and 2245 in center 2 were included as the external validation set. The best-performing model for prediction was the CatBoost algorithm, achieving the highest AUROC of 0.805 (95% CI 0.778–0.831) in the training set, validating with an AUROC of 0.780 in the internal validation set and 0.70 in the external validation set. Additionally, CatBoost demonstrated superior performance compared to the Revised Cardiac Risk Index (AUROC 0.636; P<.001). The SHAP values indicated the ranking of the level of importance of each variable, with preoperative serum creatinine concentration, red blood cell distribution width, and age accounting for the top three. The results from the SHAP method can predict events with positive values or nonevents with negative values, providing an explicit explanation of individualized risk predictions.
Conclusions: The ML models can provide a personalized and fairly accurate risk prediction of MINS, and the explainable perspective can help identify potentially modifiable sources of risk at the patient level.
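The statement that positive SHAP values push a prediction toward an event and negative values toward a nonevent follows from how Shapley values split the difference between a patient's prediction and a baseline prediction across the features. The sketch below computes exact Shapley values by enumerating feature subsets for a tiny, made-up risk function. It is not the SHAP library and not the study's CatBoost model; replacing absent features with a single baseline vector is a simplifying assumption.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one instance by enumerating all subsets.

    Features outside a subset are set to their baseline value (assumption).
    Cost grows exponentially with the number of features.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                total += weight * (with_i - without_i)
        contributions.append(total)
    return contributions


def toy_risk(features):
    """Hypothetical risk score with one interaction term (not from the paper)."""
    a, b, c = features
    return 0.1 + 0.3 * a - 0.2 * b + 0.5 * a * c


patient = [1.0, 1.0, 1.0]
reference = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk, patient, reference)
```

For this toy function the contributions are 0.55, -0.2, and 0.25: the second feature lowers the score (a negative value), and the three values sum to the gap between the patient's score and the baseline score, which is the additivity property that waterfall and force plots visualize.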

  • Research Article
  • Cite Count: 4
  • 10.1186/s12884-024-06980-4
Prediction of preterm birth using machine learning: a comprehensive analysis based on large-scale preschool children survey data in Shenzhen of China
  • Dec 4, 2024
  • BMC Pregnancy and Childbirth
  • Liwen Ding + 8 more

Background: Preterm birth (PTB) is a significant cause of neonatal mortality and long-term health issues. Accurate prediction and timely prevention of PTB are essential for reducing associated child mortality and morbidity. Traditional predictive methods face challenges due to heterogeneous risk factors and their interaction effects. This study aims to develop and evaluate six machine learning (ML) models to predict PTB using large-scale children survey data from Shenzhen, China, and to identify key predictors through Shapley Additive Explanations (SHAP) analysis.
Methods: Data from 84,050 mother–child pairs, collected in 2021 and 2022, were processed and divided into training, validation, and test sets. Six ML models were tested: L1-Regularised Logistic Regression, Light Gradient Boosting Machine (LightGBM), Naive Bayes, Random Forests, Support Vector Machine, and Extreme Gradient Boosting (XGBoost). Model performance was evaluated based on discrimination, calibration and clinical utility. SHAP analysis was used to interpret the importance and impact of individual features on PTB prediction.
Results: The XGBoost model demonstrated the best overall performance, with area under the receiver operating characteristic curve (AUC) scores of 0.752 and 0.757 in the validation and test sets, respectively, along with favorable calibration and clinical utility. Key predictors identified were multiple pregnancies, threatened abortion, and maternal age at conception. SHAP analysis highlighted the positive impacts of multiple pregnancies and threatened abortion, as well as the negative impact of micronutrient supplementation on PTB.
Conclusion: Our study found that ML models, particularly XGBoost, show promise in accurately predicting PTB and identifying key risk factors. These findings highlight the potential of ML for enhancing clinical interventions, personalizing prenatal care, and informing public health initiatives.

  • Research Article
  • 10.1186/s40001-025-03742-6
Development and validation of a machine learning-driven framework for differentiating pediatric bronchopneumonia from lobar pneumonia: a multicenter investigation.
  • Dec 23, 2025
  • European journal of medical research
  • Zihan Cai + 2 more

This investigation aims to establish and substantiate a machine learning-driven predictive framework designed to precisely distinguish between pediatric bronchopneumonia and lobar pneumonia. This endeavor seeks to elevate the accuracy of early clinical support, refine treatment decision-making, and curtail superfluous medical interventions. This study was executed at Siyang Hospital, enrolling 2304 pediatric patients diagnosed with either bronchopneumonia or lobar pneumonia from January 2020 to December 2024. Participants were randomized in a 7:3 ratio into training (n = 1612) and testing (n = 692) sets, supplemented by an external validation set (n = 454) to evaluate the model's generalizability. Hematological and serum biochemical parameters were gathered, with feature selection conducted using eXtreme Gradient Boosting (XGBoost), Support Vector Machine-Recursive Feature Elimination (SVM-RFE), and Random Forest algorithms. A suite of twelve machine learning models-including Random Forest, Gradient Boosting, and Support Vector Machines-was developed, with parameters fine-tuned through five-fold cross-validation. Model efficacy was assessed via receiver operating characteristic (ROC) curves, area under the curve (AUC), sensitivity, specificity, and F1 score, while feature significance was quantified using SHAP values. A nomogram was formulated based on critical features, its clinical value affirmed through calibration curves, and decision curve analysis (DCA). Statistical evaluations incorporated Mann-Whitney U tests, chi-square tests, and DeLong tests, with a threshold of P < 0.05 denoting significance. Notable disparities emerged between the bronchopneumonia (n = 1868) and lobar pneumonia (n = 436) cohorts across several hematological markers, such as large platelet count (P-LCT), Lymphocyte percentage (LYM%), and creatinine (CREA) (P < 0.01). Feature selection pinpointed P-LCT, LYM%, and CREA as key predictors. 
The Gradient Boosting model demonstrated exemplary performance, yielding an AUC of 0.947 (95% CI 0.934-0.960) in the training set, 0.968 (95% CI 0.954-0.982) in the testing set, and 0.989 (95% CI 0.981-0.997) in the external validation set, underscoring its outstanding discriminative prowess and robust generalizability. SHapley Additive exPlanations (SHAP) analysis underscored P-LCT (Mean Absolute SHAP: 0.057) and LYM% (0.065) as predominant predictors, exhibiting a strong correlation with disease severity. The nomogram attained an AUC of 0.962, with impeccable calibration (C-index = 0.962), and DCA substantiated considerable net benefit at moderate risk thresholds. The Gradient Boosting model, as delineated in this study, markedly advances the differential diagnosis of pediatric bronchopneumonia and lobar pneumonia, delivering high precision and resilience. It serves as an efficacious and dependable clinical decision-support instrument. By incorporating pivotal biomarkers like P-LCT and LYM%, this model illuminates pathophysiological traits, enhances antibiotic stewardship, and guides hospitalization choices, thereby diminishing healthcare resource wastage and ameliorating patient outcomes. These insights furnish vital backing for precision medicine and acute care management.
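
As a reading aid for the AUC figures quoted in these abstracts, the following is a minimal sketch of how an AUC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (the Mann-Whitney interpretation). The function name `auc_from_scores` and the toy labels and scores are illustrative assumptions, not data or code from any of the listed studies.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two negatives, two positives.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_from_scores(labels, scores)
print(auc)  # 3 of the 4 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; library implementations typically use a rank-based formulation instead.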

  • Research Article
  • 10.3389/fmed.2025.1559613
Predictive value of the stone-free rate after percutaneous nephrolithotomy based on multiple machine learning models
  • Aug 19, 2025
  • Frontiers in Medicine
  • Zhao Rong Liu + 3 more

Purpose: This study aimed to develop three types of machine learning (ML) models based on gradient boosting decision tree (GBDT), random forest (RF), and extreme gradient boosting (XGBoost) to explore their predictive value for the stone-free rate after percutaneous nephrolithotomy (PCNL).
Patients and methods: A retrospective analysis was conducted on 160 patients who underwent PCNL. The patients were randomly divided into a training set and a test set in a 7:3 ratio. Clinical data were collected, and univariate analysis was performed to identify variables significantly associated with the stone-free rate after PCNL. Three ML models (GBDT, RF, and XGBoost) were developed using the training set. The predictive performance of these models was evaluated on the test set using the area under the curve (AUC) of the receiver operating characteristic (ROC), confusion matrix, specificity, sensitivity, accuracy, and F1 score. For the top-performing prediction model, the study further employed the SHapley Additive exPlanations (SHAP) method to enhance model interpretability by elucidating the contribution of individual features to the prediction outcomes and ranking the relative importance of the predictors. Finally, the clinical utility of the model was assessed through decision curve analysis (DCA), which quantified the net clinical benefit of applying the model across various risk thresholds.
Results: Postoperative statistics indicated a stone-free rate of 70.6% (n = 113) among the patients. The variables significantly associated with the absence of residual stones included the number of stones, stone diameter, stone CT value, history of previous stone surgery, stone location, and stone shape (p < 0.05). All three models demonstrated strong predictive effects in the validation set, with the GBDT model showing superior performance [AUC: 0.836 (95% CI: 0.785–0.873); accuracy: 0.854; sensitivity: 0.853; specificity: 0.857] compared to the XGBoost [AUC: 0.830 (95% CI: 0.792–0.868); accuracy: 0.771; sensitivity: 0.824; specificity: 0.643] and RF models [AUC: 0.803 (95% CI: 0.763–0.837); accuracy: 0.792; sensitivity: 0.824; specificity: 0.714]. The F1 scores for the GBDT, RF, and XGBoost models were 0.892, 0.836, and 0.849, respectively. Decision curve analysis confirmed that the GBDT model offers a favorable net clinical benefit. In addition, the SHAP analysis identified the number of stones and the stone CT value as the most critical features influencing the model’s predictions, contributing significantly to its overall predictive performance.
Conclusion: The prediction models developed based on three machine learning algorithms can accurately predict the stone-free rate after PCNL in patients with urinary tract stones. Among these, the GBDT model can effectively identify patients who are most likely to achieve successful outcomes from PCNL based on demographic and stone characteristics, thereby assisting in clinical treatment decision-making.
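
To make the relationship between the reported accuracy, sensitivity, specificity, and F1 score concrete, here is a minimal sketch that derives all four from a confusion matrix. The counts below (29 true positives, 5 false negatives, 12 true negatives, 2 false positives) are an assumption: they are not given in the abstract, but were chosen because they fit a 48-patient test set (30% of 160) and reproduce the rounded GBDT figures quoted above. The function name `binary_metrics` is likewise illustrative.

```python
def binary_metrics(tp, fn, tn, fp):
    """Derive common classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall on the positive class
    specificity = tn / (tn + fp)          # recall on the negative class
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts, assumed for illustration only (see lead-in).
m = binary_metrics(tp=29, fn=5, tn=12, fp=2)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts the four values round to 0.853, 0.857, 0.854, and 0.892, matching the GBDT figures in the abstract; other counts could in principle produce the same rounded values.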

  • Research Article
  • Citations: 85
  • 10.1016/j.conbuildmat.2022.129227
Explainable machine learning models for predicting the axial compression capacity of concrete filled steel tubular columns
  • Oct 2, 2022
  • Construction and Building Materials
  • Celal Cakiroglu + 4 more


  • Research Article
  • Citations: 8
  • 10.1016/j.ejps.2023.106506
Prediction of plasma trough concentration of voriconazole in adult patients using machine learning
  • Jun 24, 2023
  • European Journal of Pharmaceutical Sciences
  • Lin Cheng + 7 more

