Identification of Risk Factors and Development of Machine Learning Prediction Models for Inpatient Calcium Pyrophosphate Deposition Disease Flares.
The risk factors for inpatient calcium pyrophosphate deposition disease (CPPD) flares remain poorly defined. This study aimed to identify independent risk factors for inpatient CPPD flares and develop exploratory predictive models to support early recognition and management. A retrospective case-control study was conducted at a tertiary care hospital from January 2015 to December 2022. Adults aged ≥18 years with confirmed inpatient CPPD flares were matched to controls admitted to the same unit on the same date who did not develop a CPPD flare during hospitalization. Univariate and multivariate logistic regression analyses were used to identify independent risk factors. Exploratory predictive models were developed using decision tree, random forest (RF), logistic regression, and extreme gradient boosting (XGBoost) algorithms. Model performance was evaluated using precision, recall, F1-score, accuracy, and area under the receiver operating characteristic curve. A total of 324 hospitalized patients (162 with CPPD flares and 162 controls) were included. Multivariate analysis demonstrated that advanced age (OR: 1.08; 95% CI: 1.06-1.11; p<0.01), female sex (OR: 1.80; 95% CI: 1.05-3.07; p=0.03), and in-hospital antibiotic use (OR: 1.98; 95% CI: 1.08-3.64; p=0.03) were independent predictors of inpatient CPPD flares. Among the predictive models, the RF model achieved the highest accuracy (0.85) and demonstrated strong discriminative performance (AUROC = 0.89). Advanced age, female sex, and in-hospital antibiotic therapy independently increased the risk of inpatient CPPD flares. The RF model provides a promising proof of concept but requires external validation.
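The odds ratios above come from a multivariate logistic regression. As a minimal sketch, assuming the usual Wald interval (the abstract does not say how its CIs were computed), an OR and its 95% CI follow from a coefficient β and its standard error as exp(β) and exp(β ± 1.96·SE). The helper names below are hypothetical.

```python
import math


def odds_ratio_with_ci(beta: float, se: float, z: float = 1.96) -> tuple[float, float, float]:
    """Turn a log-odds coefficient and its standard error into an OR with a Wald CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Back-calculate the log-scale standard error from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Female sex as reported above: OR 1.80 (95% CI 1.05-3.07).
se = se_from_ci(1.05, 3.07)
or_, lower, upper = odds_ratio_with_ci(math.log(1.80), se)
print(f"OR {or_:.2f} (95% CI {lower:.2f}-{upper:.2f}), log-scale SE {se:.3f}")
```

Because the published OR and bounds are rounded, recomputing the interval this way gives bounds close to, but not exactly equal to, the reported ones.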
- Research Article
- 10.1001/jamanetworkopen.2019.15997
- Oct 25, 2019
- JAMA Network Open
Machine learning algorithms could identify patients with cancer who are at risk of short-term mortality. However, it is unclear how different machine learning algorithms compare and whether they could prompt clinicians to have timely conversations about treatment and end-of-life preferences. This study aimed to develop, validate, and compare machine learning algorithms that use structured electronic health record data before a clinic visit to predict mortality among patients with cancer. This cohort study included 26 525 adult patients who had outpatient oncology or hematology/oncology encounters at a large academic cancer center and 10 affiliated community practices between February 1, 2016, and July 1, 2016. Patients were not required to receive cancer-directed treatment. Patients were observed for up to 500 days after the encounter. Data analysis took place between October 1, 2018, and September 1, 2019. The algorithms compared were logistic regression, gradient boosting, and random forest. The primary outcome was 180-day mortality from the index encounter; the secondary outcome was 500-day mortality from the index encounter. Among 26 525 patients in the analysis, 1065 (4.0%) died within 180 days of the index encounter. Among those who died, the mean age was 67.3 (95% CI, 66.5-68.0) years, and 500 (47.0%) were women. Among those who were alive at 180 days, the mean age was 61.3 (95% CI, 61.1-61.5) years, and 15 922 (62.5%) were women. The population was randomly partitioned into training (18 567 [70.0%]) and validation (7958 [30.0%]) cohorts at the patient level, and a randomly selected encounter was included in either the training or validation set. At a prespecified alert rate of 0.02, positive predictive values were higher for the random forest (51.3%) and gradient boosting (49.4%) algorithms compared with the logistic regression algorithm (44.7%). 
There was no significant difference in discrimination among the random forest (area under the receiver operating characteristic curve [AUC], 0.88; 95% CI, 0.86-0.89), gradient boosting (AUC, 0.87; 95% CI, 0.85-0.89), and logistic regression (AUC, 0.86; 95% CI, 0.84-0.88) models (P for comparison = .02). In the random forest model, observed 180-day mortality was 51.3% (95% CI, 43.6%-58.8%) in the high-risk group vs 3.4% (95% CI, 3.0%-3.8%) in the low-risk group; at 500 days, observed mortality was 64.4% (95% CI, 56.7%-71.4%) in the high-risk group and 7.6% (95% CI, 7.0%-8.2%) in the low-risk group. In a survey of 15 oncology clinicians with a 52.1% response rate, 100 of 171 patients (58.8%) who had been flagged as having high risk by the gradient boosting algorithm were deemed appropriate for a conversation about treatment and end-of-life preferences in the upcoming week. In this cohort study, machine learning algorithms based on structured electronic health record data accurately identified patients with cancer at risk of short-term mortality. When the gradient boosting algorithm was applied in real time, clinicians believed that most patients who had been identified as having high risk were appropriate for a timely conversation about treatment and end-of-life preferences.
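The positive predictive value at a prespecified alert rate can be read as follows: rank encounters by predicted risk, flag the top 2%, and report the share of flagged patients who died. A minimal sketch, assuming risk scores and 0/1 outcomes are already available; the function name is hypothetical, and ties are broken by input order.

```python
def ppv_at_alert_rate(scores: list[float], died: list[int], alert_rate: float = 0.02) -> float:
    """Flag the highest-risk `alert_rate` fraction and return the PPV among them."""
    # Number of encounters to flag; always flag at least one.
    n_flag = max(1, round(alert_rate * len(scores)))
    # Sort by predicted risk, highest first (the sort is stable, so ties keep input order).
    ranked = sorted(zip(scores, died), key=lambda pair: pair[0], reverse=True)
    flagged = ranked[:n_flag]
    return sum(outcome for _, outcome in flagged) / n_flag


# Toy data (not from the study): 100 encounters, risk rising with index.
toy_scores = [i / 100 for i in range(100)]
toy_died = [0] * 98 + [1, 0]
print(ppv_at_alert_rate(toy_scores, toy_died))  # 0.5
```

Fixing the alert rate rather than a probability threshold keeps the clinician workload constant, which is why PPV (not AUC) is the natural comparison at that operating point.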
- Research Article
- 10.1007/s00423-025-03886-3
- Nov 20, 2025
- Langenbeck's Archives of Surgery
Background: Percutaneous transhepatic gallbladder drainage (PTGBD) has shown significant efficacy in the treatment of elderly patients with acute cholecystitis (AC). The goal of this study was to develop a machine learning (ML)-based web calculator to predict the optimal timing of laparoscopic cholecystectomy (LC) after PTGBD in elderly patients with AC, in support of precise, personalized medicine. Methods: A retrospective analysis of 979 elderly patients with acute cholecystitis admitted to Jinzhou Central Hospital and the First Affiliated Hospital of Jinzhou Medical University from 2013 to 2024 was performed, and 680 patients were included in model development. Patients were divided into delayed (347 cases, surgery >6 weeks after PTGBD) and non-delayed (333 cases) groups based on the interval between PTGBD and LC. Least absolute shrinkage and selection operator (LASSO) and logistic regression analyses were used to identify predictors of delayed LC after PTGBD in elderly patients with AC. Eight ML algorithms, namely logistic regression (LR), decision tree (DT), random forest (RF), extreme gradient boosting (XGB), support vector machine (SVM), multilayer perceptron (MLP), k-nearest neighbors (KNN), and Gaussian naive Bayes (GNB), were then trained using 10-fold cross-validation. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), calibration curves, decision curves, precision-recall (PR) curves, and confusion matrices. In addition, the models were interpreted with SHapley Additive exPlanations (SHAP) analysis to clarify the importance of each feature and its contribution to the model's decisions. 
Finally, the best model was used to develop a web-based calculator to predict the likelihood of delayed LC after PTGBD in elderly patients with AC. Results: In multivariate logistic regression analysis, age, sex, gallbladder wall thickness, time between onset and PTGBD, white blood cell count (WBC), C-reactive protein (CRP), and neutrophil-to-lymphocyte ratio (NLR) were identified as independent predictors of delayed LC in elderly patients with AC after PTGBD. In the training set, the AUC values of these models ranged from 0.808 to 0.914, with the RF model showing the highest AUC. On decision curve analysis (DCA), precision-recall (PR) curves, and calibration curves, the RF model showed better clinical decision support and predictive performance than the other seven models. Finally, the RF model was used to build an online web calculator intended to help physicians make more informed and accurate clinical decisions and to promote wider application of the model in clinical practice (https://zw17786325639.shinyapps.io/Postpone/). Conclusions: This study developed and validated a web calculator based on an RF model and clinical indicators to assess the likelihood of delayed LC after PTGBD in elderly patients with acute cholecystitis. This tool is expected to help physicians make more appropriate clinical decisions for these patients.
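Decision curve analysis, used above to compare clinical usefulness, plots net benefit against a threshold probability p_t. A minimal sketch of the standard net-benefit formula, NB = TP/n − FP/n × p_t/(1 − p_t), assuming predicted probabilities and 0/1 outcomes (here, delayed LC); the function names are hypothetical, and patients at or above the threshold are treated as positive.

```python
def net_benefit(probs: list[float], outcomes: list[int], threshold: float) -> float:
    """Net benefit of acting on model predictions at a given threshold probability."""
    n = len(outcomes)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcomes: list[int], threshold: float) -> float:
    """Reference strategy: treat every patient as positive."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy example (not study data): compare the model with "treat all" at p_t = 0.25.
probs = [0.9, 0.8, 0.3, 0.2]
outcomes = [1, 0, 1, 0]
print(net_benefit(probs, outcomes, 0.25), net_benefit_treat_all(outcomes, 0.25))
```

A model is considered useful at thresholds where its curve lies above both "treat all" and "treat none" (whose net benefit is 0).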
- Research Article
- 10.1007/s11069-022-05584-5
- Sep 5, 2022
- Natural Hazards
Wadi El-Matulla, located in the eastern desert of Egypt, is the most important water basin. The Qift–Qusayr highway (west–east direction) and the Cairo–Aswan eastern desert highway (north–south direction) pass through the watershed. Many urban areas (villages and industrial areas) and agricultural lands are located at the outlets of the basin. In addition, the basin has promising potential for future economic and urban development because it lies within the Golden Triangle, a government megaproject. The current study investigates flood hazard modeling and its impact on the area. To determine the optimal flood susceptibility mapping algorithm, three techniques were compared: logistic regression (LR), extreme gradient boosting (EGB), and random forest (RF). Remote sensing, topographic, geologic, and meteorological data, supported by field visits, provided the spatial and inventory database required by the models. The performance and reliability of the model predictions were evaluated using five statistical indices: receiver operating characteristic area under the curve (ROC-AUC), overall accuracy (OAC), kappa index, root mean square error (RMSE), and mean absolute error (MAE). RF, EGB, and LR achieved ROC-AUC values of 93, 86, and 80%; OAC values of 88, 82, and 76%; kappa indices of 0.85, 0.75, and 0.51; RMSE values of 0.34, 0.42, and 0.49; and MAE values of 0.12, 0.18, and 0.24, respectively. Based on AUC values, the RF and EGB models provide excellent and very good predictions of flood susceptibility, respectively. Our results show that RF is the optimal algorithm for flood susceptibility mapping, followed by EGB and LR. Given the strong predictive power of the RF model, its flood susceptibility map was classified into five classes: very low (51.7%), low (23.7%), moderate (16.2%), high (7.1%), and very high (1.3%). 
Finally, the RF model was verified against Sentinel-1 imagery of actual floods in 2016 and 2021 and showed good agreement. The optimal model could help decision makers and planners protect existing facilities and plan future projects in areas that are not flood-prone. Accordingly, the most suitable areas for future development should lie mainly in the low and very low flood hazard zones.
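Overall accuracy and the kappa index reported above compare observed and predicted classes at validation points. A minimal sketch of Cohen's kappa, κ = (p_o − p_e)/(1 − p_e), assuming binary flood (1) / non-flood (0) labels; the helper name and toy points are hypothetical.

```python
from collections import Counter


def accuracy_and_kappa(observed: list[int], predicted: list[int]) -> tuple[float, float]:
    """Return overall accuracy (p_o) and Cohen's kappa for paired class labels."""
    n = len(observed)
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    obs_counts, pred_counts = Counter(observed), Counter(predicted)
    # Chance agreement: product of marginal class counts, summed over classes.
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    return p_o, (p_o - p_e) / (1 - p_e)  # undefined if p_e == 1


print(accuracy_and_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # (0.75, 0.5)
```

Kappa discounts the agreement expected by chance alone, which is why it can be much lower than overall accuracy when one class (here, non-flooded land) dominates.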
- Research Article
- 10.6313/fjr.2009.23(01).05
- Jun 1, 2009
- Formosan Journal of Rheumatology
Objective: Our aim was to characterize the ultrasonographic features of patients with calcium pyrophosphate dihydrate (CPPD) deposition disease and to compare X-ray and ultrasound in evaluating CPPD deposition disease. Methods: In this retrospective study, all 71 patients with CPPD deposition disease confirmed by microscopic synovial fluid analysis between 2004 and 2007 were enrolled. Of these, the 38 patients who had undergone both conventional X-ray and high-resolution ultrasound were analyzed. Results: All patients were elderly (>65 years), and most had coexisting osteoarthritis. The knee was the most commonly involved joint. Popliteal cysts were detected in 9 of the 71 patients. Synovial fluid analysis of the 38 patients showed a mean total white cell count of 25,592.1 ± 16,697.8/mm³, with significant neutrophil predominance. Ultrasound was significantly more reliable than X-ray in the diagnosis of CPPD deposition disease (p=0.002). Moreover, no patient had X-ray findings suggestive of CPPD deposition disease with negative ultrasound results. Conclusion: Bright stippled foci in the synovial fluid or around the articular region, a thin hyperechoic band parallel to the surface of the hyaline cartilage, and calcification of fibrocartilage seen on ultrasound could represent CPPD deposits. Our data show that ultrasound is a useful and important tool in the diagnostic investigation of patients with CPPD deposition disease.
- Research Article
- 10.4239/wjd.v15.i1.43
- Jan 15, 2024
- World Journal of Diabetes
Among older adults, type 2 diabetes mellitus (T2DM) is widely recognized as one of the most prevalent diseases. Diabetic nephropathy (DN) is a frequent complication of diabetes mellitus, mainly characterized by renal microvascular damage. Early detection, aggressive prevention, and treatment of DN are key to improving prognosis, and a diagnostic and predictive model for DN is valuable as an aid to diagnosis. This study aimed to investigate the factors associated with DN in patients with T2DM and to use this information to develop a predictive model. The clinical data of 210 patients diagnosed with T2DM and admitted to the First People's Hospital of Wenling between August 2019 and August 2022 were retrospectively analyzed. Patients were divided into a DN group (with DN) and a non-DN group (without DN). Multivariate logistic regression analysis was used to explore factors associated with DN in patients with T2DM. The data were randomly split into a training set (n = 147) and a test set (n = 63) in a 7:3 ratio. The training set was used to construct the nomogram, decision tree, and random forest models, and the test set was used to evaluate their predictive performance by comparing sensitivity, specificity, accuracy, recall, precision, and area under the receiver operating characteristic curve. Among the 210 patients with T2DM, 74 (35.34%) had DN. In the test set, the accuracies of the nomogram, decision tree, and random forest models in predicting DN in patients with T2DM were 0.746, 0.714, and 0.730, respectively. The sensitivities were 0.710, 0.710, and 0.806, respectively; the specificities were 0.844, 0.875, and 0.844, respectively; and the areas under the receiver operating characteristic curve (AUCs) were 0.811, 0.735, and 0.850, respectively. 
The DeLong test showed that the AUC of the decision tree model was lower than those of the random forest and nomogram models (P < 0.05), whereas the difference in AUC between the random forest and nomogram models was not statistically significant (P > 0.05). Among the three prediction models, the random forest model performed best and can help identify patients with T2DM at high risk of DN.
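The AUCs compared above (and by the DeLong test) equal the probability that a randomly chosen DN patient receives a higher predicted risk than a randomly chosen non-DN patient. A minimal sketch of that Mann-Whitney form of the AUC, assuming predicted probabilities and 0/1 DN labels; the DeLong variance estimate itself is not shown, and the function name is hypothetical.

```python
def auc_mann_whitney(scores: list[float], labels: list[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc_mann_whitney([0.9, 0.8, 0.7, 0.1], [1, 0, 1, 0]))  # 0.75: 3 of 4 pairs ordered correctly
```

The pairwise loop is quadratic, which is fine for a test set of 63 patients; rank-based formulas give the same value faster on large data.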
- Research Article
- 10.1016/j.surg.2025.109445
- Aug 1, 2025
- Surgery
Development and validation of a machine learning-based model for predicting intraoperative blood loss during burn surgery.
- Research Article
- 10.1186/s41043-024-00647-8
- Oct 12, 2024
- Journal of Health, Population and Nutrition
Background and aims: The birth weight of a newborn is a crucial factor affecting their overall health and future well-being. Low birth weight (LBW), defined by the World Health Organization as a birth weight of less than 2,500 g, is a widespread global issue. LBW can have severe negative consequences for an individual's health, including neonatal mortality and various health concerns throughout life. To address this problem, this study used data from the 2017–2018 Bangladesh Demographic and Health Survey (BDHS) to identify important factors associated with LBW using a variety of machine learning (ML) approaches and to determine the best feature selection technique and the best predictive ML model. Methods: Key features were selected using the Boruta algorithm and the wrapper method. Logistic regression (LR) was used as the traditional method, and several ML classifiers, including decision tree (DT), support vector machine (SVM), naïve Bayes (NB), random forest (RF), eXtreme Gradient Boosting (XGBoost), and adaptive boosting (AdaBoost, AB), were then used to determine the best model for predicting LBW. Model performance was evaluated based on specificity, sensitivity, accuracy, F1 score, and AUC. Results: The Boruta algorithm identified eleven significant features: respondent's age, highest education level, educational attainment, wealth index, age at first birth, weight, height, BMI, age at first sexual intercourse, birth order number, and whether the child is a twin. Using the Boruta-selected features, the traditional LR model had a specificity, sensitivity, accuracy, and F1 score of 0.85, 0.5, 85.15%, and 0.915, respectively. The respective accuracy values of the DT, SVM, NB, RF, XGBoost, and AB models were 85.35%, 85.15%, 84.54%, 81.18%, and 84.41%. 
Based on specificity, sensitivity, accuracy, F1 score, and AUC, RF (specificity = 0.99, sensitivity = 0.58, accuracy = 85.86%, F1 score = 0.9243, AUC = 0.549) outperformed the other methods. The performance of both the classical (LR) and ML models improved markedly when important features were selected using the wrapper method. With the wrapper method, LR identified five significant features, with a specificity, sensitivity, accuracy, and F1 score of 0.87, 0.33, 87.12%, and 0.9309, respectively. The DT and RF models, implemented using the wrapper technique, identified three key features: region, whether the infant is a twin, and cesarean delivery. All three models had an identical F1 score of 0.9318. By contrast, the SVM, NB, and AB models identified "child is twin" as a significant feature, with an F1 score of 0.9315. The XGBoost model identified "child is twin" and "age at first sex" as relevant features, also with an F1 score of 0.9315. Random forest again outperformed the other approaches in this setting. Conclusions: The study identifies the wrapper method as the optimal feature selection technique. ML methods outperformed the traditional method, with random forest (RF) being the most effective model for predicting low birth weight. The study suggests that policymakers in Bangladesh could reduce the number of low-birth-weight newborns by addressing the identified risk factors.
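The specificity, sensitivity, accuracy, and F1 values above all derive from one confusion matrix, and the choice of which class counts as "positive" changes the F1 score while leaving accuracy unchanged. A minimal sketch, assuming LBW is the positive class; the function name and the toy counts are hypothetical, not taken from the study.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard confusion-matrix metrics for whichever class is passed in as positive."""
    sensitivity = tp / (tp + fn)  # recall for the positive class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}


# The same toy matrix scored with each class treated as "positive":
print(binary_metrics(tp=6, fp=2, tn=10, fn=2))   # positive = LBW
print(binary_metrics(tp=10, fp=2, tn=6, fn=2))   # positive = normal birth weight
```

For an imbalanced outcome such as LBW, reporting which class the F1 score refers to makes results like a high F1 alongside modest sensitivity easier to interpret.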
- Book Chapter
- 10.4018/979-8-3373-2647-4.ch010
- May 9, 2025
Employee churn is a significant challenge for organizations, leading to substantial costs associated with recruiting, onboarding, and training new employees. High turnover rates can negatively affect overall productivity, employee morale, and organizational stability. Accurately predicting employee churn is therefore crucial for companies seeking to implement targeted retention strategies, minimize turnover, and reduce associated expenses. In this study, we used machine learning techniques to predict employee churn using the "HR Analytics" dataset from Kaggle. A key challenge in churn prediction is class imbalance: the number of employees who leave is much lower than the number who stay. To address this, we applied two data-balancing techniques: Synthetic Minority Over-sampling Technique (SMOTE) and Random Over-Sampling (ROS). We then trained and evaluated four machine learning models (Logistic Regression, Random Forest, Decision Tree, and Extreme Gradient Boosting [XGBoost]) on the balanced datasets. The F1 scores for the SMOTE-balanced data were: Logistic Regression (0.5990), Random Forest (0.9753), Decision Tree (0.9319), and XGBoost (0.9634). The ROS-balanced data produced F1 scores of: Logistic Regression (0.5978), Random Forest (0.9760), Decision Tree (0.9475), and XGBoost (0.9703). ROS yielded superior performance, particularly for the Random Forest and XGBoost models, so we selected ROS for further hyperparameter tuning. After optimization with RandomizedSearchCV, the Random Forest model achieved the highest F1 score of 0.9779. Finally, we deployed the optimized Random Forest model via a Flask API, giving HR professionals a user-friendly web interface for real-time churn prediction. 
This research highlights the effectiveness of machine learning in HR analytics and underscores the practical benefits of predictive modeling in workforce management, helping organizations proactively address employee retention challenges.
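Random over-sampling (ROS), which gave the better results above, duplicates randomly chosen minority-class rows until the classes are the same size. A minimal standard-library sketch, assuming rows and labels are parallel lists; the function name is hypothetical. It should be applied to the training split only, so that duplicated rows cannot leak into the evaluation data.

```python
import random


def random_oversample(rows: list, labels: list, seed: int = 0) -> tuple[list, list]:
    """Duplicate random rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    by_class: dict = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement from the existing members of this class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


rows = list(range(10))
labels = ["stay"] * 8 + ["leave"] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(balanced_labels.count("stay"), balanced_labels.count("leave"))  # 8 8
```

Unlike SMOTE, ROS creates no new feature values, only exact copies, so tree ensembles see the same minority points more often rather than interpolated ones.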
- Research Article
- 10.7518/hxkq.2023.2023124
- Dec 1, 2023
- Hua xi kou qiang yi xue za zhi = Huaxi kouqiang yixue zazhi = West China journal of stomatology
Machine learning algorithms were used to construct a prediction model of dental caries in children, to identify risk factors for dental caries in children, and to propose targeted measures and policy suggestions to improve children's oral health. Stratified cluster random sampling was adopted in this study. Taking into account the different policies and measures across Sichuan Province, 12-year-old students from 3–4 middle schools in eight cities of Sichuan Province were randomly selected for a questionnaire survey, oral examination, and physical examination. Multivariate logistic regression analysis of risk factors for dental caries in 12-year-old children was conducted. The dataset was randomly divided into a training set and a validation set at a ratio of 7:3. Four machine learning models (random forest, decision tree, extreme gradient boosting [XGBoost], and logistic regression) were constructed using R version 4.1.1, and their predictive performance was evaluated using the area under the receiver operating characteristic curve (AUC). A total of 4 439 children aged 12 years were included in this study. The prevalence of caries in permanent teeth was 50.93%. Multivariate logistic regression analysis showed that body mass index, the father's highest educational attainment, the mother's highest educational attainment, whether the child brushed their teeth and how many times a day, use of toothpaste when brushing, duration of brushing, use of mouthwash after meals, eating before bed after brushing, sweet drinks, snacks, visiting a dental clinic for check-ups, and the age at which the child started brushing were factors influencing dental caries in children (P<0.05). The AUC values of the random forest, decision tree, logistic regression, and XGBoost models were 0.840, 0.755, 0.799, and 0.794, respectively. In the random forest model, the variable with the highest contribution was eating before bed after brushing. 
A random forest-based prediction model of dental caries in children was established and showed good predictive performance. Taking preventive measures that target the main factors affecting the occurrence of dental caries in children is beneficial.
- Research Article
- 10.1016/j.ijnss.2025.10.011
- Oct 22, 2025
- International Journal of Nursing Sciences
Objectives: This study aimed to develop and validate a stroke risk prediction model based on machine learning (ML) and regional healthcare big data, and to determine whether it improves prediction performance compared with the conventional logistic regression (LR) model. Methods: This retrospective cohort study analyzed data from the CHinese Electronic health Records Research in Yinzhou (CHERRY) platform (2015–2021). We included adults aged 18–75 years from the platform who had established records before 2015. Individuals with pre-existing stroke, missing key data, or excessive missingness (>30 %) were excluded. Data on demographics, clinical measures, lifestyle factors, comorbidities, and family history of stroke were collected. Variable selection was performed in two stages: initial screening via univariate analysis, followed by prioritization of variables based on clinical relevance and actionability, with a focus on modifiable variables. Stroke prediction models were developed using LR and four ML algorithms: decision tree (DT), random forest (RF), eXtreme Gradient Boosting (XGBoost), and back-propagation neural network (BPNN). The dataset was split 7:3 into training and validation sets. Performance was assessed using receiver operating characteristic (ROC) curves, calibration, and confusion matrices, and the cutoff value for classifying risk groups was determined by Youden's index. Results: The study cohort comprised 92,172 participants with 436 incident stroke cases (incidence rate: 474/100,000 person-years). Ultimately, 13 predictor variables were included. RF achieved the highest accuracy (0.935), precision (0.923), sensitivity (recall: 0.947), and F1 score (0.935). The ML algorithms showed superior predictive performance over conventional LR, with training/validation area under the curve (AUC) values of 0.777/0.779 (LR), 0.921/0.918 (BPNN), 0.988/0.980 (RF), 0.980/0.955 (DT), and 0.962/0.958 (XGBoost). 
Calibration analysis showed a better fit for the DT, LR, and BPNN models than for the RF and XGBoost models. In the best-performing RF model, the factors ranked in descending order of importance were: hypertension, age, diabetes, systolic blood pressure, waist circumference, high-density lipoprotein cholesterol, fasting blood glucose, physical activity, BMI, low-density lipoprotein cholesterol, total cholesterol, dietary habits, and family history of stroke. Using the optimal cutoff from Youden's index, the RF model stratified individuals into high-risk (>0.789) and low-risk (≤0.789) groups with robust discrimination. Conclusions: The ML-based prediction models showed superior performance metrics compared with conventional LR, and RF was the optimal prediction model, providing an effective tool for risk stratification in primary stroke prevention in community settings.
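Youden's index, J = sensitivity + specificity − 1, picks the cutoff used above to split high-risk (> cutoff) from low-risk (≤ cutoff) individuals. A minimal sketch that scans every observed score as a candidate cutoff, assuming predicted probabilities and 0/1 stroke labels; the function name is hypothetical.

```python
def youden_cutoff(scores: list[float], labels: list[int]) -> tuple[float, float]:
    """Return (cutoff, J) maximizing sensitivity + specificity - 1.

    Scores above the cutoff are classed high-risk; scores at or below it, low-risk.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = scores[0], -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s > cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s <= cutoff and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


print(youden_cutoff([0.1, 0.2, 0.7, 0.9], [0, 0, 1, 1]))  # (0.2, 1.0)
```

The cutoff should be chosen on training data and then applied unchanged to the validation set; choosing it on the validation set would overstate discrimination.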
- Research Article
- 10.1108/jpif-01-2020-0007
- Apr 24, 2020
- Journal of Property Investment & Finance
Purpose: The purpose of this study is to evaluate the performance of ensemble learning models, namely Random Forest and Extreme Gradient Boosting, in predicting the direction of Japan real estate investment trusts (J-REITs) at different return horizons, using inputs derived from various technical indicators. Design/methodology/approach: This study measures the predictability of J-REITs from technical indicators across different REIT return horizons and machine learning models. The ensemble learning models include Random Forest and Extreme Gradient Boosting, and the REIT return horizons range from 1 to 300 days. The results were further split by year to check the consistency of performance over time. Findings: Extreme Gradient Boosting appears to be the best method for improving forecast accuracy but not trading return. Longer return horizons seemed to deliver relatively better performance in both forecast accuracy and trading return than a one-day return horizon. Practical implications: Practitioners should consider the Extreme Gradient Boosting and Random Forest models for back-testing trading models. In addition, selecting different return horizons to achieve better trading or investment performance should also be considered. Originality/value: The predictability of J-REITs using technical indicators was compared across different return horizons and models (Extreme Gradient Boosting and Random Forest).
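Predicting "direction" at a return horizon h amounts to a binary label: is the price h trading days ahead higher than today's? A minimal sketch of that labeling, assuming a list of daily closing prices. The abstract does not give the study's exact target definition (for example, how ties or log returns are handled), so this is an illustration only, and the function name is hypothetical.

```python
def direction_labels(prices: list[float], horizon: int) -> list[int]:
    """Label day t as 1 if the price `horizon` days later is higher, else 0.

    The final `horizon` days have no future price and are left unlabeled.
    Features for day t must use only data available up to day t.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1 day")
    return [1 if prices[t + horizon] > prices[t] else 0
            for t in range(len(prices) - horizon)]


prices = [10.0, 11.0, 10.5, 12.0, 11.0]
print(direction_labels(prices, 1))  # [1, 0, 1, 0]
print(direction_labels(prices, 2))  # [1, 1, 1]
```

Longer horizons smooth out day-to-day noise in the label, which is one plausible reason they behave differently from the one-day horizon; the sketch does not test that.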
- Research Article
- 10.1038/s41598-023-43211-w
- Sep 25, 2023
- Scientific Reports
Although the goal of rectal cancer treatment is to restore gastrointestinal continuity, some patients with rectal cancer end up with a permanent stoma (PS) after sphincter-saving operations. Although many studies have identified the risk factors and causes of PS, few have precisely predicted the probability of PS formation before surgery. This study aimed to validate whether an artificial intelligence model can accurately predict PS formation in patients with rectal cancer after sphincter-saving operations. Patients with rectal cancer who underwent a sphincter-saving operation at Taipei Medical University Hospital between January 1, 2012, and December 31, 2021, were retrospectively included in this study. A machine learning technique was used to predict whether a PS would form after a sphincter-saving operation. Nineteen routinely available preoperative variables were included in the artificial intelligence analysis. Six performance metrics were used to evaluate the models: accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and area under the receiver operating characteristic curve. In our classification pipeline, the data were randomly divided into a training set (80% of the data) and a validation set (20% of the data). The artificial intelligence models were trained on the training set, and their performance was evaluated on the validation set. The synthetic minority oversampling technique (SMOTE) was used to address data imbalance. A total of 428 patients were included, and the PS rate was 13.6% (58/428) in the training set. The logistic regression (LR), Gaussian Naïve Bayes (GNB), Extreme Gradient Boosting (XGB), Gradient Boosting (GB), random forest, decision tree, and light gradient boosting machine (LightGBM) algorithms were employed. 
The accuracies of the logistic regression (LR), Gaussian Naïve Bayes (GNB), Extreme Gradient Boosting (XGB), Gradient Boosting (GB), random forest (RF), decision tree (DT), and light gradient boosting machine (LightGBM) models were 70%, 76%, 89%, 93%, 95%, 79%, and 93%, respectively. The area under the receiver operating characteristic curve values were 0.79 for the LR model, 0.84 for the GNB, 0.95 for the XGB, 0.95 for the GB, 0.99 for the RF model, 0.79 for the DT model, and 0.98 for the LightGBM model. The key predictors identified were the distance of the lesion from the anal verge, clinical N stage, age, sex, American Society of Anesthesiologists score, and preoperative albumin and carcinoembryonic antigen levels. Integrating artificial intelligence with available preoperative data can potentially predict stoma outcomes after sphincter-saving operations. Our model exhibited excellent predictive ability and can improve the process of obtaining informed consent.
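SMOTE, used above to handle the low stoma rate, creates synthetic minority cases by interpolating between a minority sample and one of its nearest minority-class neighbors. A minimal standard-library sketch, assuming numeric feature tuples; it picks base samples at random (a common variant of the original algorithm), the function name is hypothetical, and, as with any resampling, it belongs on the training split only.

```python
import math
import random


def smote(minority: list[tuple[float, ...]], n_synthetic: int, k: int = 5,
          seed: int = 0) -> list[tuple[float, ...]]:
    """Generate `n_synthetic` points on segments between minority samples and their neighbors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, neighbour)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote(minority, n_synthetic=5, k=2)
print(len(new_points))  # 5
```

Because every synthetic point lies on a segment between two real minority cases, SMOTE adds variety without leaving the region the minority class already occupies.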
- Research Article
- 10.3934/mbe.2024061
- Jan 1, 2023
- Mathematical Biosciences and Engineering
The green concrete industry benefits from using gel to replace part of the cement in concrete. However, measuring the compressive strength of geopolymer concrete (CSGPoC) requires considerable work and expense. Predicting CSGPoC with a high level of accuracy is therefore an attractive alternative. To this end, base learner and super learner machine learning models were proposed in this study to predict CSGPoC. The decision tree (DT) was applied as the base learner, and the random forest (RF) and extreme gradient boosting (XGBoost) techniques were used as super learners. A database of 259 CSGPoC data samples was compiled, of which four-fifths were used for training and one-fifth for testing. The amounts of fly ash, ground-granulated blast-furnace slag (GGBS), Na2SiO3, NaOH, fine aggregate, gravel 4/10 mm, and gravel 10/20 mm, together with the water/solids ratio and NaOH molarity, were used as model inputs to estimate CSGPoC. Twelve performance evaluation metrics were determined to evaluate the reliability and performance of the DT, XGBoost, and RF models. The XGBoost model achieved the highest accuracy, with a mean absolute error (MAE) of 2.073, mean absolute percentage error (MAPE) of 5.547, Nash-Sutcliffe efficiency (NS) of 0.981, correlation coefficient (R) of 0.991, R2 of 0.982, root mean square error (RMSE) of 2.458, Willmott's index (WI) of 0.795, weighted mean absolute percentage error (WMAPE) of 0.046, Bias of 2.073, scatter index (SI) of 0.054, p of 0.027, mean relative error (MRE) of -0.014, and a20 of 0.983 for the training set, and an MAE of 2.06, MAPE of 6.553, NS of 0.985, R of 0.993, R2 of 0.986, RMSE of 2.307, WI of 0.818, WMAPE of 0.05, Bias of 2.06, SI of 0.056, p of 0.028, MRE of -0.015, and a20 of 0.949 for the testing set. 
When the trained models were applied to the testing set, R2 values of 0.8969, 0.9857, and 0.9424 were obtained for DT, XGBoost, and RF, respectively, showing the superiority of the XGBoost model in CSGPoC estimation. In conclusion, the XGBoost model predicts CSGPoC more accurately than the DT and RF models.
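The abstract reports many regression metrics without giving their formulas. The sketch below computes five of them: MAE, MAPE, RMSE, Nash-Sutcliffe efficiency (NS), and the a20 index. It assumes the common textbook definitions, and the paper's exact formulas may differ. The function name `regression_metrics` is hypothetical, and the code assumes neither measured nor predicted values are zero.

```python
import math
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Textbook definitions of a few error metrics (assumed, not taken from the paper).

    Assumes equal-length, non-empty inputs with no zeros, because MAPE divides
    by the measured values and a20 divides measured by predicted values.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        # MAPE expressed in percent
        "MAPE": 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
        "RMSE": math.sqrt(ss_res / n),
        # Nash-Sutcliffe efficiency: 1 - SS_res / SS_tot
        "NS": 1.0 - ss_res / ss_tot,
        # a20: share of samples whose measured/predicted ratio lies in [0.8, 1.2]
        "a20": sum(1 for t, p in zip(y_true, y_pred) if 0.8 <= t / p <= 1.2) / n,
    }
```

For example, `regression_metrics([10, 20, 30], [11, 18, 30])` gives MAE 1.0, RMSE √(5/3), NS 0.975, and a20 1.0.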
- Research Article
45
- 10.3389/fpubh.2021.812023
- Dec 10, 2021
- Frontiers in Public Health
Background: Bone cement leakage is a common complication of percutaneous vertebroplasty and can be life-threatening. The aim of this study was to develop a machine learning model for predicting the risk of cement leakage in patients with osteoporotic vertebral compression fractures undergoing percutaneous vertebroplasty. We also developed an online calculator for clinical application. Methods: This retrospective study included 385 patients with osteoporotic vertebral compression fractures who underwent surgery at the Department of Spine Surgery, Liuzhou People's Hospital, from June 2016 to June 2018. Using the patients' clinical characteristics, we applied six machine learning (ML) algorithms to build models predicting the risk of bone cement leakage: logistic regression (LR), gradient boosting machine (GBM), extreme gradient boosting (XGB), random forest (RF), decision tree (DT), and multilayer perceptron (MLP). We evaluated the models with ten-fold cross-validation and calculated the area under the curve (AUC) of each. The model with the highest AUC was selected to build the web calculator. Results: Multivariate logistic regression analysis of the 385 subjects showed that injection volume of bone cement, surgery time, and multiple vertebral fractures were independent predictors of bone cement leakage. A heatmap revealed the relative proportions of the 15 clinical variables.
For predicting bone cement leakage, the AUC of the six ML algorithms ranged from 0.633 to 0.898. The RF model performed best (AUC 0.898) and was used to build a web calculator (https://share.streamlit.io/liuwencai0/pvp_leakage/main/pvp_leakage) that estimates the risk of bone cement leakage for each patient undergoing vertebroplasty. Conclusion: Our ML model achieved good prediction of bone cement leakage. The web calculator based on the RF model can help orthopedists make more individualized and rational clinical decisions.
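The model-selection step described in this abstract runs ten-fold cross-validation and keeps the model with the highest AUC. It can be sketched in plain Python. This is an illustrative sketch, not the authors' code. The helper names (`auc`, `stratified_kfold`, `cross_validated_auc`, `select_best`) are hypothetical. The folds are stratified by outcome, which is an assumption because the abstract does not say. Each candidate model is a hypothetical `fit_predict(X_train, y_train, X_test)` callable returning risk scores, standing in for LR, GBM, XGB, RF, DT, or MLP.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: fit on the training fold, return risk scores for the test fold.
FitPredict = Callable[[list, list, list], List[float]]


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for s_pos in pos:
        for s_neg in neg:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def stratified_kfold(y: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Deal each class's shuffled indices round-robin into k folds (assumed stratification)."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validated_auc(fit_predict: FitPredict, X: list, y: list, k: int = 10) -> float:
    """Mean AUC over k folds; each fold is held out once while the rest train the model."""
    fold_aucs = []
    for test_idx in stratified_kfold(y, k):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        scores = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                             [X[i] for i in test_idx])
        fold_aucs.append(auc([y[i] for i in test_idx], scores))
    return mean(fold_aucs)


def select_best(models: Dict[str, FitPredict], X: list, y: list,
                k: int = 10) -> Tuple[str, Dict[str, float]]:
    """Return the model name with the highest mean cross-validated AUC, plus all means."""
    results = {name: cross_validated_auc(fn, X, y, k) for name, fn in models.items()}
    return max(results, key=results.get), results
```

The pairwise AUC equals the Mann-Whitney estimate. Its cost is quadratic in the number of patients, which is acceptable for a cohort of a few hundred. Stratifying the folds ensures every held-out fold contains both outcomes, provided each class has at least `k` members. Without that, AUC would be undefined for some folds.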
- Research Article
7
- 10.1186/s12877-022-03631-1
- Nov 28, 2022
- BMC Geriatrics
Background: Femoral neck fracture and lacunar cerebral infarction (LCI) are among the most common diseases in the elderly. When LCI patients undergo traumas such as surgery, their postoperative recovery is often poor. Moreover, few studies have explored the relationship between LCI and femoral neck fracture in the elderly. This study therefore developed a machine learning (ML)-based model to predict LCI before surgery in elderly patients with a femoral neck fracture. Methods: Professional medical staff retrospectively collected data on 161 patients with unilateral femoral neck fracture who underwent surgery at the Second Affiliated Hospital of Wenzhou Medical University between January 1, 2015, and January 1, 2020. Patients were divided into an LCI group and a non-LCI group, with LCI diagnosed on cranial CT images. Preoperative clinical characteristics and laboratory data were collected for all patients. Univariate and multivariate logistic regression analysis selected the model features. These were age, white blood cell count (WBC), prealbumin, aspartate aminotransferase (AST), total protein, globulin, serum creatinine (Scr), blood urea nitrogen (BUN)/Scr, lactate dehydrogenase (LDH), serum sodium, and fibrinogen. Five ML algorithms were trained on these preoperative clinical and laboratory data to predict LCI in patients with a femoral neck fracture. The algorithms were logistic regression (LR), gradient boosting machine (GBM), extreme gradient boosting (XGBoost), random forest (RF), and decision tree (DT). Model performance was assessed using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and accuracy. Results: The AUROC of the five ML models ranged from 0.76 to 0.95.
The RF model performed best in predicting preoperative LCI in femoral neck fracture patients, with an AUROC of 0.95, sensitivity of 1.00, specificity of 0.81, and accuracy of 0.90 in the validation sets. The top four variables in the RF model, in descending order of importance, were prealbumin, fibrinogen, globulin, and Scr. Conclusion: Five ML models were developed and validated to predict preoperative LCI in patients with femoral neck fracture. The RF model provided excellent predictive value, with an AUROC of 0.95. This model can help clinicians conduct multidisciplinary perioperative management for patients with femoral neck fractures and accelerate their postoperative recovery.
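Sensitivity, specificity, and accuracy are all read from a confusion matrix at some decision threshold. The sketch below assumes predicted probabilities are labelled positive at a hypothetical threshold of 0.5, which the abstract does not state. The function name `threshold_metrics` is also hypothetical.

```python
from typing import Sequence


def threshold_metrics(y_true: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> dict:
    """Sensitivity, specificity, and accuracy after labelling scores >= threshold as positive.

    The 0.5 default is an assumption for illustration, not the study's threshold.
    """
    if len(y_true) != len(scores) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / len(pairs),
    }
```

Read this way, the reported sensitivity of 1.00 with specificity of 0.81 means no LCI case in the validation data was missed, while 19% of non-LCI patients were flagged as positive.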