
Building and validating machine learning models to predict appendiceal perforation during conservative treatment of fecalith-associated appendicitis: a 20-algorithm multicenter retrospective analysis.

Abstract

Fecalith-associated appendicitis presents unique challenges in conservative management due to increased perforation risk. Early identification of patients at high risk for appendiceal perforation during conservative treatment remains crucial for optimal clinical decision-making. This study aimed to develop and validate machine learning-based prediction models for assessing appendiceal perforation risk during conservative treatment of fecalith-associated appendicitis.

This retrospective cohort study analyzed 1247 patients with fecalith-associated appendicitis who underwent initial conservative treatment across four tertiary care centers between January 2018 and December 2023. The study design and reporting adhere to the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis) guidelines for model development and external validation. Twenty machine learning algorithms were systematically trained and validated using clinical, laboratory, and imaging parameters. LASSO regularization identified eight optimal predictive features from 34 candidate variables. The final nominated model for clinical deployment is the Gradient Boosting classifier, trained on all eight LASSO-selected features. The primary outcome was appendiceal perforation within 72 h of conservative treatment initiation.

Of 1247 patients, 186 (14.9%) developed appendiceal perforation during conservative treatment. The ensemble gradient boosting model achieved the highest performance with an AUC of 0.892 (95% CI 0.871-0.913), sensitivity of 84.4% (95% CI 79.2-89.6%), and specificity of 81.7% (95% CI 77.8-85.6%). External validation in an independent cohort (n = 225; The People's Hospital of Sishui, January 2023-December 2024) confirmed model generalisability: AUC = 0.909 (95% CI 0.859-0.951), sensitivity = 73.7%, specificity = 93.0%, PPV = 68.3%, and NPV = 94.6%. SHAP analysis identified key predictive features: fecalith size (importance: 0.234), C-reactive protein (0.186), white blood cell count (0.162), appendiceal wall thickness (0.143), and patient age (0.121). Risk stratification classified patients into low-risk (58.9%, 3.8% perforation rate), moderate-risk (31.9%, 24.6% perforation rate), and high-risk (9.2%, 71.3% perforation rate) categories. Decision curve analysis demonstrated significant clinical utility, with a net benefit of 0.08 at a 15% threshold probability.

Machine learning models, particularly ensemble gradient boosting methods, demonstrate excellent accuracy in predicting appendiceal perforation risk during conservative treatment of fecalith-associated appendicitis, with performance confirmed in an external validation cohort. These validated models provide clinically actionable risk stratification that may assist in treatment decision-making and patient monitoring strategies, potentially preventing unnecessary surgeries while identifying high-risk patients requiring enhanced surveillance or early surgical intervention.
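
The sensitivity, specificity, PPV, NPV, and net-benefit figures quoted in the abstract are all functions of a 2x2 table of predicted versus observed perforation. The sketch below shows how they relate. It is illustrative only: the abstract does not report the underlying confusion matrix, so the counts used here are hypothetical values chosen to total 225 and to round to the percentages reported for the external validation cohort, and the class and function names are invented for this example. The net-benefit formula is the standard decision-curve definition (true-positive rate minus false-positive rate weighted by the threshold odds); the result is not expected to reproduce the 0.08 reported in the abstract, which depends on a cohort and operating point the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table of predicted vs. observed perforation (hypothetical container)."""
    tp: int  # predicted perforation, perforation observed
    fp: int  # predicted perforation, no perforation observed
    fn: int  # predicted no perforation, perforation observed
    tn: int  # predicted no perforation, no perforation observed

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def sensitivity(c: ConfusionCounts) -> float:
    """Share of observed perforations that the model flagged."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of non-perforated patients that the model cleared."""
    return c.tn / (c.tn + c.fp)


def ppv(c: ConfusionCounts) -> float:
    """Probability of perforation given a positive prediction."""
    return c.tp / (c.tp + c.fp)


def npv(c: ConfusionCounts) -> float:
    """Probability of no perforation given a negative prediction."""
    return c.tn / (c.tn + c.fn)


def net_benefit(c: ConfusionCounts, threshold: float) -> float:
    """Decision-curve net benefit at a given threshold probability."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    odds = threshold / (1.0 - threshold)
    return c.tp / c.total - (c.fp / c.total) * odds


if __name__ == "__main__":
    # ASSUMPTION: hypothetical counts, not taken from the paper.
    example = ConfusionCounts(tp=28, fp=13, fn=10, tn=174)
    print(f"n           = {example.total}")
    print(f"sensitivity = {sensitivity(example):.1%}")
    print(f"specificity = {specificity(example):.1%}")
    print(f"PPV         = {ppv(example):.1%}")
    print(f"NPV         = {npv(example):.1%}")
    print(f"net benefit at 15% threshold = {net_benefit(example, 0.15):.3f}")
```

Because PPV and NPV depend on how common perforation is in the cohort, the same model can show different predictive values in cohorts with different perforation rates even when sensitivity and specificity are unchanged.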

Similar Papers
  • Research Article
  • Citations: 6
  • 10.5144/0256-4947.2003.187
Acute Appendicitis in Infants: Still a Diagnostic Dilemma
  • May 1, 2003
  • Annals of Saudi Medicine
  • Mathew Punnachalil Cherian + 2 more


  • Research Article
  • Citations: 1
  • 10.1177/10760296251372942
Development and External Validation of a Machine Learning Model to Predict Venous Thromboembolism Risk in Hospitalized Chinese Patients
  • Aug 26, 2025
  • Clinical and Applied Thrombosis/Hemostasis
  • Xiaolan Chen + 8 more

Objectives: To develop and externally validate a machine learning (ML) model of venous thromboembolism (VTE) in hospitalized Chinese patients. Methods: We retrospectively reviewed structured data from patients with and without VTE (N = 1126 in each group) at Beijing Shijitan Hospital between January 2012 and December 2021. ML algorithms, including logistic regression (LR), decision tree (DT), and gradient boosting (GBoost), were used to establish prediction models. Patients were prospectively enrolled at Beijing Shijitan Hospital (N = 2916) or Beijing Chaoyang Hospital (N = 1339) for internal and external model validation, respectively. Results: Several clinical features had a high weighted correlation with VTE in the ML models: age, D-dimer level, platelet count, hemoglobin level, coronary heart disease, cancer, male sex, and comorbidity. The two highest weighted features were age and D-dimer. Optimal cut-off values indicated that age >65 years, white blood cell count >8.69 × 10⁹/L, hemoglobin level >126 g/L, platelet count >197 × 10⁹/L, neutrophil percentage >72%, and D-dimer level >965 ng/ml (D-dimer units) were significantly correlated with VTE occurrence. The AUC value of the GBoost model (0.88 ± 0.03) was significantly higher than that of the DT (0.70 ± 0.03) or LR (0.64 ± 0.03) models. GBoost also had a higher sensitivity, specificity, Youden index and Matthews correlation coefficient (MCC) than DT or LR (P < 0.05). In the internal validation cohort, the AUC value of the GBoost model (0.85 ± 0.03) was higher than those of the DT (0.76 ± 0.03) and LR (0.67 ± 0.03) models and the Caprini Risk Assessment Model (RAM) (0.75 ± 0.03). GBoost also had a greater sensitivity, specificity, Youden index and MCC than DT and LR (P < 0.05). In the external validation cohort, the AUC value of the GBoost model (0.81 ± 0.03) was also better than those of the DT (0.72 ± 0.03) and LR (0.71 ± 0.03) models and the Caprini RAM (0.73 ± 0.03). The GBoost model also had a higher sensitivity, specificity, Youden index and MCC than DT and LR (P < 0.05). Conclusion: The GBoost ML model was better at identifying hospitalized Chinese patients at risk of VTE than other ML algorithms and the Caprini RAM.

  • Front Matter
  • Citations: 187
  • 10.1016/j.gie.2019.07.033
ASGE review of adverse events in colonoscopy
  • Sep 25, 2019
  • Gastrointestinal Endoscopy
  • Shivangi T Kothari + 19 more


  • Discussion
  • Citations: 12
  • 10.1053/j.gastro.2006.10.060
Making sense of CT colonography–related complication rates
  • Dec 1, 2006
  • Gastroenterology
  • P.J Limburg + 1 more


  • Research Article
  • Citations: 10
  • 10.1016/j.bja.2024.01.030
Multicentre validation of a machine learning model for predicting respiratory failure after noncardiac surgery
  • Feb 26, 2024
  • British Journal of Anaesthesia
  • Hyun-Kyu Yoon + 7 more


  • Research Article
  • Citations: 10
  • 10.1046/j.1440-1622.1999.01482.x
Insurance and the risk of ruptured appendix in the adult.
  • Jan 1, 1999
  • The Australian and New Zealand journal of surgery
  • S W Wong + 4 more

Disparities in medical care related to the insurance status of patients have been reported. A retrospective analysis was performed to examine the insurance-related differences in the risk of appendiceal perforation in the Prince of Wales Hospital (POWH), New South Wales. Computerized data of 1179 patient years who had a diagnosis of appendicitis and were admitted to the POWH over the preceding 10 years were examined. The outcome measure was appendiceal perforation. Patient variables examined were insurance status, sex, age, and socio-economic status (SES). Three hundred patients over the same period were identified who had an appendicectomy but not appendicitis. Multiple logistic regression and Fisher's exact test were used for statistical analysis. The overall perforation rate in 1179 patients was 17%. The only factor that was related to an increased risk of perforation was age over 50 years (odds ratio (OR) 1.57; 95% confidence interval (CI) 1.04-2.53). Sex, insurance status or SES were not associated with a higher risk of perforation. The overall rate of negative appendicectomy was 20% (300 of 1479 patients), and the rate was higher in the uninsured patients (22 vs 17%, P = 0.014, Fisher's exact test). Lack of health insurance was not associated with an increased incidence of appendiceal perforation at the POWH. Age over 50 years was identified as the only risk factor for appendiceal perforation. The lower negative appendicectomy rate in the insured group may be because of better diagnostic ability of consultants compared to registrars.

  • Research Article
  • Citations: 32
  • 10.1016/s0140-6736(23)01311-9
Role of preoperative in-hospital delay on appendiceal perforation while awaiting appendicectomy (PERFECT): a Nordic, pragmatic, open-label, multicentre, non-inferiority, randomised controlled trial
  • Sep 14, 2023
  • Lancet (London, England)
  • Karoliina Jalava + 7 more


  • Research Article
  • 10.4103/bc.bc_113_25
Explainable machine learning versus logistic regression for outcome prediction in primary intracerebral hemorrhage: A multicenter radiomics study
  • Feb 23, 2026
  • Brain Circulation
  • Huan Wang + 9 more

Abstract: CONTEXT: Accurate outcome prediction is essential for clinical decisions in intracerebral hemorrhage (ICH) patients. However, whether machine learning (ML) models outperform traditional logistic regression (LR) remains unclear. AIMS: This study aims to compare six ML algorithms with LR in predicting poor 3-month outcomes after primary ICH, using radiomics features from noncontrast computed tomography and clinical data. SETTINGS AND DESIGN: A retrospective study. SUBJECTS AND METHODS: Seven hundred and four primary ICH patients from two centers were allocated into training (n = 516), internal (n = 128), and external validation (n = 60) cohorts. Radiomics features from hematoma regions were extracted to generate a radiomics score (Rad-score). STATISTICAL ANALYSIS USED: The Rad-score and clinical variables were selected for developing one LR and six ML models: random forest (RF), artificial neural network (ANN), AdaBoostM1, Naive Bayes (NB), XGB, and support vector machine (SVM). Model discrimination was assessed by the area under the curve (AUC), and the best-performing ML model was interpreted using Shapley Additive exPlanations (SHAP). RESULTS: In the training cohort, AUCs were 0.849 for LR, 0.897 for RF, 0.885 for XGB, 0.884 for AdaBoostM1, 0.858 for ANN, 0.848 for NB, and 0.839 for SVM. In the internal and external validation cohorts, AUCs ranged from 0.796–0.823 and 0.806–0.858, respectively. The RF model achieved significantly higher AUCs than LR in both training and external validation sets (both P < 0.05). SHAP plots identified Rad-score and National Institutes of Health Stroke Scale as key predictors. CONCLUSIONS: The RF model, integrating radiomic and clinical data, outperformed LR and showed the highest accuracy in predicting poor 3-month outcomes after primary ICH.

  • Research Article
  • Citations: 25
  • 10.1007/s10072-022-06351-x
Interpretable machine learning model to predict rupture of small intracranial aneurysms and facilitate clinical decision.
  • Aug 23, 2022
  • Neurological Sciences
  • Weigen Xiong + 14 more

Estimating the rupture risk of small intracranial aneurysms (IAs) ≤ 7 mm in diameter, and thus deciding whether to treat them, is difficult but crucial. We aimed to construct and externally validate a convenient machine learning (ML) model for assessing the rupture risk of small IAs. One thousand four patients with small IAs recruited from two hospitals were included in our retrospective research. The patients at hospital 1 were randomly stratified into training (70%) and internal validation (30%) sets, and the patients at hospital 2 were used for external validation. We selected predictive features using the least absolute shrinkage and selection operator (LASSO) method and constructed five ML models applying diverse algorithms including random forest classifier (RFC), categorical boosting (CatBoost), support vector machine (SVM) with linear kernel, light gradient boosting machine (LightGBM), and extreme gradient boosting (XGBoost). The Shapley Additive Explanations (SHAP) analysis provided interpretation for the best ML model. The training, internal, and external validation cohorts included 658, 282, and 64 IAs, respectively. The SVM performed best, with an AUC of 0.817 (95% confidence interval [CI], 0.769-0.866) in the internal and 0.893 (95% CI, 0.808-0.979) in the external validation cohort, significantly outperforming the PHASES score (all P < 0.001). SHAP analysis showed that maximum size, location, and irregular shape were the three most important features for predicting rupture. Our SVM model, based on readily accessible features, showed satisfactory discrimination in predicting rupture of small IAs. Morphological parameters made important contributions to the predictions.

  • Research Article
  • Citations: 20
  • 10.1016/j.eclinm.2024.102772
Development and validation of a deep learning-based framework for automated lung CT segmentation and acute respiratory distress syndrome prediction: a multicenter cohort study
  • Jul 26, 2024
  • eClinicalMedicine
  • Yang Zhou + 14 more


  • Research Article
  • Citations: 1
  • 10.1177/1742271x16689693
Evaluating the risk of appendiceal perforation when using ultrasound as the initial diagnostic imaging modality in children with suspected appendicitis.
  • Jan 29, 2017
  • Ultrasound
  • Stephen Alerhand + 2 more

Ultrasound scan has gained attention for diagnosing appendicitis due to its avoidance of ionizing radiation. However, studies show that ultrasound scan has lower sensitivity than computed tomography scan. A non-diagnostic ultrasound scan could increase the time to diagnosis and appendicectomy, particularly if follow-up computed tomography scan is needed. Some studies suggest that delaying appendicectomy increases the risk of perforation. This study investigated the risk of appendiceal perforation when using ultrasound scan as the initial diagnostic imaging modality in children with suspected appendicitis. We retrospectively reviewed 1411 charts of children ≤17 years old diagnosed with appendicitis at two urban academic medical centers. Patients who underwent ultrasound scan first were compared to those who underwent computed tomography scan first. In the sub-group analysis, patients who only received ultrasound scan were compared to those who received initial ultrasound scan followed by computed tomography scan. Main outcome measures were appendiceal perforation rate and time from triage to appendicectomy. In 720 children eligible for analysis, there was no significant difference in perforation rate between those who had initial ultrasound scan and those who had initial computed tomography scan (7.3% vs. 8.9%, p = 0.44), nor between those who had ultrasound scan only and those who had initial ultrasound scan followed by computed tomography scan (8.0% vs. 5.6%, p = 0.42). Those patients who had ultrasound scan first had a shorter triage-to-incision time than those who had computed tomography scan first (9.2 (IQR: 5.9, 14.0) vs. 10.2 (IQR: 7.3, 14.3) hours, p = 0.03), whereas those who had ultrasound scan followed by computed tomography scan took longer than those who had ultrasound scan only (15.1 (IQR: 10.6, 20.6) vs. 7.8 (IQR: 5.3, 11.6) hours, p < 0.001). Children <12 years old receiving ultrasound scan first had a lower perforation rate (p = 0.01) and shorter triage-to-incision time (p = 0.003). Children with suspected appendicitis receiving ultrasound scan as the initial diagnostic imaging modality do not have an increased risk of perforation compared to those receiving computed tomography scan first. We recommend that children <12 years of age receive ultrasound scan first.

  • Research Article
  • Citations: 58
  • 10.1001/archsurg.2010.328
Effect of Race and Socioeconomic Status in the Treatment of Appendicitis in Patients With Equal Health Care Access
  • Feb 1, 2011
  • Archives of Surgery
  • Steven L Lee

Lower socioeconomic and minority racial/ethnic status have been linked to delays in surgical care and thus higher appendiceal perforation rates. Hypothesis: Equal access to health care eliminates the previously reported socioeconomic and racial/ethnic disparities in rates of appendiceal perforation. Design: Retrospective cohort study using discharge abstract data and US census data. Setting: Twelve regional Kaiser Permanente hospitals in southern California. Patients: A total of 16,156 patients treated for appendicitis. Patients were divided into low, medium, and high groups based on annual household income and educational level, as well as racial/ethnic status (white, black, Hispanic, and Asian). Main outcome measures: Appendiceal perforation (AP) rate and length of hospitalization (LOH). Results: The adjusted odds ratio for AP was lower in Hispanics and similar in blacks and Asians compared with whites. The odds ratio for AP was similar in high- and medium-income families compared with low-income families. The odds ratio for AP was higher in patients with high educational levels and similar in those with medium educational levels compared with low educational levels. The adjusted LOH was longer in blacks, shorter in Hispanics, and similar in Asians compared with whites. The LOH was similar in high- and medium-income families compared with low-income families. The LOH was higher in patients with medium educational levels and similar in those with high educational levels compared with low educational levels. Conclusions: Lower socioeconomic background and minority race/ethnicity did not correlate with higher AP rates or a clinically longer LOH in patients with equal access to care. Based on these findings, we believe that equal health care access leads to equivalent outcomes in all patients with appendicitis.

  • Research Article
  • 10.1097/01.ju.0001008696.31772.28.02
MP49-02 DEVELOPMENT AND EXTERNAL VALIDATION OF INTERPRETABLE MACHINE LEARNING MODELS FOR CLINICALLY SIGNIFICANT PROSTATE CANCER DIAGNOSIS IN PATIENTS WITH LESIONS OF PI-RADS V2.1 SCORE ≥3
  • May 1, 2024
  • The Journal of Urology
  • Mingjian Ruan + 5 more


  • Research Article
  • Citations: 44
  • 10.1016/j.jpeds.2015.11.075
The Impact of Socioeconomic Status on Appendiceal Perforation in Pediatric Appendicitis
  • Dec 28, 2015
  • The Journal of Pediatrics
  • Luke R Putnam + 5 more


  • Research Article
  • 10.1007/s00586-026-09908-y
Development and validation of a nomogram for differential diagnosis of pyogenic spondylitis and tuberculous spondylitis in China: a multicenter retrospective study.
  • May 5, 2026
  • European spine journal : official publication of the European Spine Society, the European Spinal Deformity Society, and the European Section of the Cervical Spine Research Society
  • Liang Xu + 22 more

Pyogenic spondylitis (PS) and tuberculous spondylitis (TS) present with significant clinical overlap, posing a major diagnostic challenge. We aimed to develop and validate an imaging-based nomogram integrating CT and MRI features to accurately differentiate PS from TS. We conducted a multicenter retrospective study including 539 patients with spinal infections (251 PS, 288 TS) diagnosed between June 2021 and May 2025. Patients were divided into training (n = 427) and external validation (n = 112) cohorts. Imaging features were screened using univariate logistic regression. The least absolute shrinkage and selection operator (LASSO) regression was then applied to select the optimal predictive feature subset and mitigate overfitting. A multivariate logistic regression model based on these features was used to construct the nomogram. We evaluated diagnostic performance using the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis (DCA). Internal validation employed 500 bootstrap resamples; external validation used an independent cohort. The training cohort comprised 187 (43.8%) PS and 240 (56.2%) TS patients; the external validation cohort had 64 (57.1%) PS and 48 (42.9%) TS patients. LASSO regression identified five key predictors: vertebral involvement pattern (continuous vs. skip/non-continuous), vertebral body T2-weighted signal intensity (hyperintense vs. heterogeneous), MRI abscess wall characteristics (thick/irregular vs. thin/smooth), CT bone destruction type (osteolytic vs. fragmentary), and CT sagittal bone destruction degree (< 1/3 vs. > 2/3). The AUCs of the nomogram for the training and external validation cohorts were 0.908 (95% confidence interval: 0.880-0.936) and 0.899 (95% confidence interval: 0.842-0.955), respectively. Calibration curves showed optimal concordance between the predicted results and the actual observations. DCA indicated substantial clinical net benefit across threshold probabilities. The developed nomogram is capable of accurately distinguishing between PS and TS, thereby aiding clinicians in making informed decisions promptly upon obtaining relevant data.
