Prediction and Investigation of the Injury Severity of Drivers Involved in Speeding-Related Crashes Using Machine Learning Models

Abstract

Speeding is the major cause of road traffic crashes and deaths in India. Other driver faults include driving under the influence, using mobile phones while driving and driving on the wrong side of the road. This study therefore attempts to predict and investigate driver injury severity (DIS) in speeding-related crashes. A total of 793 police-reported single-vehicle and two-vehicle crash records from Imphal City, India, collected between 2011 and 2020, were analysed and modelled. For DIS prediction, eleven supervised machine learning (ML) models were implemented using 5-fold and 10-fold cross-validation (FCV) and trained at train ratio (TR) values of 0.5, 0.6, 0.7 and 0.8 in each FCV. The top ML model for DIS prediction was selected based on the best combination of recall, accuracy, F1 score, area under the curve (AUC) and precision. Feature importance analysis (FIA) was conducted to determine the most impactful factors in DIS prediction. In 5-FCV, the gradient boosting tree (GBT), stochastic gradient descent, decision tree and lasso-LARS models were the top-performing ML models at TR = 0.5, 0.6, 0.7 and 0.8, respectively. In 10-FCV, the best-performing models were light GBM (TR = 0.5 and 0.7), GBT (TR = 0.6) and lasso-LARS (TR = 0.8). The FIA results indicated that vehicle type (two-wheeler), nature of crash (head-on collision) and time of crash (12 PM–6 PM and 6 AM–12 PM) were the most impactful variables for DIS prediction in Imphal speeding-related crashes. These ML models can be employed in hilly areas for the accurate prediction of DIS. The study results can help transportation planners design road safety measures and strategies to lessen DIS in speeding-related crashes.
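The model-selection step in this abstract rests on k-fold cross-validation and on threshold metrics such as recall, accuracy, F1 score and precision. The following is a minimal sketch of how those pieces are commonly computed, not the authors' pipeline. The function names, the binary encoding of DIS (1 for the class of interest) and the contiguous fold layout are assumptions made for illustration only.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class.

    Assumes a binary encoding of the outcome; the abstract does not
    state how DIS classes were encoded.
    """
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("empty input")
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def kfold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds.

    Fold sizes differ by at most one. Real pipelines usually shuffle
    or stratify first; that step is omitted here.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Fold sizes if all 793 records were split five ways (illustrative only).
fold_sizes = [len(fold) for fold in kfold_indices(793, 5)]
```

Each fold serves once as the held-out set while the remaining folds train the model, and the metrics are then averaged across folds.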

Similar Papers
  • Research Article
  • Cite Count Icon 1
  • 10.1680/jmuen.25.00005
Modelling of young and old driver injury severity in speeding-related road accidents
  • May 20, 2025
  • Proceedings of the Institution of Civil Engineers - Municipal Engineer
  • Neero Gumsar Sorum + 1 more

This study aimed to identify the best machine learning (ML) models for predicting the injury severity of young and old drivers involved in speeding-related road accidents. The best ML model was identified based on the optimal combination of accuracy, F1 score, and area-under-curve metrics. Feature importance analysis was employed to rank the significant factors affecting young and old driver injury severity (DIS) based on the best ML models, and their impacts on the DIS were compared. Police-reported accident data collected from Itanagar and Shillong during 2011–2020 were used in the present study. Twelve supervised ML models were implemented using 5-, 10-, and 15-fold cross-validations at each train ratio value (0.7 and 0.8). The results revealed that the Extra Trees model was the best ML model for the Itanagar young and old DIS prediction. However, it was not easy to identify the best overall ML model in Shillong. The vehicle type variable was the most important factor in predicting the injury severity of Itanagar and Shillong drivers. These findings would help transportation authorities formulate appropriate policies and adopt effective measures to reduce the speeding tendency of young and old drivers for road safety.

  • Research Article
  • Cite Count Icon 1
  • 10.1177/10547738241260947
Machine Learning Predicts Peripherally Inserted Central Catheters-Related Deep Vein Thrombosis Using Patient Features and Catheterization Technology Features.
  • Jul 1, 2024
  • Clinical nursing research
  • Yuan Sheng + 1 more

This study aims to use patient feature and catheterization technology feature variables to train machine learning (ML) models to predict peripherally inserted central catheter-related deep vein thrombosis (PICCs-DVT) and to analyze the importance of the two types of features to PICCs-DVT from the aspect of "input-output" correlation. To comprehensively and systematically summarize the variables used to describe patient features and catheterization technology features, this study combined 18 publications involving the two types of features in predicting PICCs-DVT. A total of 21 variables used to describe the two types of features were summarized, and feature values were extracted from the data of 1,065 PICCs patients from January 1, 2021 to August 31, 2022, to construct a data sample set. Then, 70% of the sample set was used for model training and hyperparameter optimization, and 30% of the sample set was used for PICCs-DVT prediction and feature importance analysis of three common ML classification models (i.e. support vector classifier [SVC], random forest [RF], and artificial neural network [ANN]). For prediction performance, this study selected four metrics: precision (P), recall (R), accuracy (ACC), and area under the curve (AUC). For feature importance analysis, this study chose a single-feature analysis method based on the "input-output" sensitivity principle: Permutation Importance. The mean performance of the three ML models on the test set was P = 0.92, R = 0.95, ACC = 0.88, and AUC = 0.81. Specifically, the RF model achieved P = 0.95, R = 0.96, ACC = 0.92, AUC = 0.86; the ANN model achieved P = 0.92, R = 0.95, ACC = 0.88, AUC = 0.81; the SVC model achieved P = 0.88, R = 0.94, ACC = 0.85, AUC = 0.77.
For feature importance analysis, Catheter-to-vein rate (RF: 91.55%, ANN: 82.25%, SVC: 87.71%), Zubrod-ECOG-WHO score (RF: 66.35%, ANN: 82.25%, SVC: 44.35%), and insertion attempt (RF: 44.35%, ANN: 37.65%, SVC: 65.80%) all occupy the top three in the ML models prediction task of PICCs-DVT, showing relatively consistent ranking results. The ML models show good performance in predicting PICCs-DVT and reveal a relatively consistent ranking of feature importance from the data. The important features revealed might help clinical medical staff to better understand and analyze the formation mechanism of PICCs-DVT from a data-driven perspective.
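The mean test-set performance quoted in this abstract can be reproduced from the three per-model figures it lists. The short check below is only an arithmetic illustration; the dictionary layout and the function name are assumptions, while the numbers are copied from the abstract.

```python
# Per-model test-set metrics as reported in the abstract.
reported = {
    "RF": {"P": 0.95, "R": 0.96, "ACC": 0.92, "AUC": 0.86},
    "ANN": {"P": 0.92, "R": 0.95, "ACC": 0.88, "AUC": 0.81},
    "SVC": {"P": 0.88, "R": 0.94, "ACC": 0.85, "AUC": 0.77},
}


def mean_metrics(per_model, digits=2):
    """Average each metric across models and round to `digits` places."""
    metric_names = next(iter(per_model.values())).keys()
    return {
        name: round(
            sum(scores[name] for scores in per_model.values()) / len(per_model),
            digits,
        )
        for name in metric_names
    }


means = mean_metrics(reported)
# means == {"P": 0.92, "R": 0.95, "ACC": 0.88, "AUC": 0.81}
```

The rounded means match the stated averages; they also happen to coincide with the ANN model's own figures.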

  • Research Article
  • 10.1186/s12874-025-02694-z
Comparison of machine learning methods versus traditional Cox regression for survival prediction in cancer using real-world data: a systematic literature review and meta-analysis
  • Oct 28, 2025
  • BMC Medical Research Methodology
  • Yinan Huang + 6 more

Background: Accurate prediction of survival in oncology can guide targeted interventions. The traditional regression-based Cox proportional hazards (CPH) model has statistical assumptions and may have limited predictive accuracy. With the capability to model large datasets, machine learning (ML) holds the potential to improve the prediction of time-to-event outcomes, such as cancer survival outcomes. The present study aimed to systematically summarize the use of ML models for cancer survival outcomes in observational studies and to compare the performance of ML models with CPH models. Methods: We systematically searched PubMed, MEDLINE (via EBSCO), and Embase for studies that evaluated ML models vs. CPH models for cancer survival outcomes. The use of ML algorithms was summarized, and either the area under the curve (AUC) or the concordance index (C-index) for the ML and CPH models were presented descriptively. Only studies that provided a measure of discrimination, i.e., AUC or C-index, and 95% confidence interval (CI) were included in the final meta-analysis. A random-effects model was used to compare the predictive performance in the pooled AUC or C-index estimates between ML and CPH models using R. The quality of the studies was evaluated using available checklists. Multiple sensitivity analyses were performed. Results: A total of 21 studies were included for systematic review and 7 for meta-analysis. Across the 21 articles, diverse ML models were used, including random survival forest (N=16, 76.19%), gradient boosting (N=5, 23.81%), and deep learning (N=8, 38.09%). In predicting cancer survival outcomes, ML models showed no superior performance over CPH regression. The standardized mean difference in AUC or C-index was 0.01 (95% CI: -0.01 to 0.03). Results from the sensitivity analyses confirmed the robustness of the main findings. Conclusions: ML models had similar performance compared with CPH models in predicting cancer survival outcomes.
Although this systematic review highlights the promising use of ML to improve the quality of care in oncology, findings from this review also suggest opportunities to improve ML reporting transparency. Future systematic reviews should focus on the comparative performance between specific ML models and CPH regression in time-to-event outcomes in specific types of cancer or other disease areas.
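The abstract says only that a random-effects model was fitted in R; it does not name the estimator. As a hedged illustration of what such pooling involves, the sketch below implements the DerSimonian-Laird method, one common choice, in plain Python. The function name and any inputs passed to it are hypothetical and are not data from the review.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool study effects with a DerSimonian-Laird random-effects model.

    effects: per-study effect estimates (e.g. differences in AUC).
    variances: their within-study variances.
    Returns (pooled effect, standard error, tau^2, 95% CI tuple).
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures spread around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2, (pooled - 1.96 * se, pooled + 1.96 * se)
```

When the studies agree closely, the between-study variance collapses to zero and the result equals the fixed-effect estimate; a confidence interval that spans zero, as in the abstract, indicates no detectable difference between the two model families.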

  • Research Article
  • Cite Count Icon 33
  • 10.1007/s00330-020-07083-2
Improved long-term prognostic value of coronary CT angiography-derived plaque measures and clinical parameters on adverse cardiac outcome using machine learning
  • Jul 28, 2020
  • European Radiology
  • Christian Tesche + 13 more

To evaluate the long-term prognostic value of coronary CT angiography (cCTA)-derived plaque measures and clinical parameters on major adverse cardiac events (MACE) using machine learning (ML). Datasets of 361 patients (61.9 ± 10.3 years, 65% male) with suspected coronary artery disease (CAD) who underwent cCTA were retrospectively analyzed. MACE was recorded. cCTA-derived adverse plaque features and conventional CT risk scores together with cardiovascular risk factors were provided to a ML model to predict MACE. A boosted ensemble algorithm (RUSBoost) utilizing decision trees as weak learners with repeated nested cross-validation to train and validate the model was used. Performance of the ML model was calculated using the area under the curve (AUC). MACE was observed in 31 patients (8.6%) after a median follow-up of 5.4 years. Discriminatory power was significantly higher for the ML model (AUC 0.96 [95%CI 0.93-0.98]) compared with conventional CT risk scores including Agatston calcium score (AUC 0.84 [95%CI 0.80-0.87]), segment involvement score (AUC 0.88 [95%CI 0.84-0.91]), and segment stenosis score (AUC 0.89 [95%CI 0.86-0.92], all p < 0.05). Similar results were shown for adverse plaque measures (AUCs 0.72-0.82, all p < 0.05) and clinical parameters including the Framingham risk score (AUCs 0.71-0.76, all p < 0.05). The ML model yielded significantly higher diagnostic performance compared with logistic regression analysis (AUC 0.96 vs. 0.92, p = 0.024). Integration of a ML model improves the long-term prediction of MACE when compared with conventional CT risk scores, adverse plaque measures, and clinical information. ML algorithms may improve the integration of patient's information to enhance risk stratification.
• A machine learning (ML) model portends high discriminatory power to predict major adverse cardiac events (MACE).
• ML-based risk stratification shows superior diagnostic performance for MACE prediction over coronary CT angiography (cCTA)-derived risk scores or clinical parameters alone.
• A ML model outperforms conventional logistic regression analysis for the prediction of MACE.
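Several abstracts on this page compare models by the area under the ROC curve. One way to read that number is as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes AUC directly from that definition; the function name and the toy scores are illustrative assumptions, not data from any of the studies.

```python
def auc_from_scores(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Labels are assumed to be 1 (event) or 0 (no event).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive-negative pairs are ordered correctly.
example_auc = auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
# example_auc == 0.75
```

This pairwise form is quadratic in the sample size, which is fine for small cohorts; rank-based implementations are preferable for large ones.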

  • Research Article
  • Cite Count Icon 13
  • 10.1007/s00261-021-03051-6
Predicting the stages of liver fibrosis with multiphase CT radiomics based on volumetric features.
  • Mar 22, 2021
  • Abdominal Radiology
  • Enming Cui + 6 more

To develop and externally validate a multiphase computed tomography (CT)-based machine learning (ML) model for staging liver fibrosis (LF) by using whole liver slices. The development dataset comprised 232 patients with pathological analysis for LF, and the test dataset comprised 100 patients from an independent outside institution. Feature extraction was performed based on the precontrast (PCP), arterial (AP), portal vein (PVP) phase, and three-phase CT images. CatBoost was utilized for ML model investigation by using the features with good reproducibility. The diagnostic performance of ML models based on each single- and three-phase CT image was compared with that of radiologists' interpretations, the aminotransferase-to-platelet ratio index, and the fibrosis index based on four factors (FIB-4) by using the receiver operating characteristic curve with the area under the curve (AUC) value. Although the ML model based on three-phase CT images (AUC = 0.65-0.80) achieved higher AUC values than those based on PCP (AUC = 0.56-0.69) and PVP (AUC = 0.51-0.74) in predicting various stages of LF, no significant difference was found. The best CT-based ML model (AUC = 0.65-0.80) outperformed the FIB-4 in differentiating advanced LF and cirrhosis and radiologists' interpretation (AUC = 0.50-0.76) in the diagnosis of significant and advanced LF. All PCP, PVP, and three-phase CT-based ML models can be acceptable for assessing LF, and the performance of the PCP-based ML model is comparable to that of the enhanced CT image-based ML model.

  • Dissertation
  • 10.31390/gradschool_theses.5785
Exploring Machine Learning in Deep Foundation and Soil Classification Application
  • Jun 5, 2023
  • Mohammad Moontakim Shoaib

The applicability of several Machine Learning (ML) models was explored in this research to predict the ultimate capacity and load-settlement behavior of axially loaded single-driven piles from Cone Penetration Test (CPT) data. Additionally, a common CPT-based soil behavior type (SBT) classification system was reproduced using those ML models. Eighty static pile load tests and corresponding CPT data close to those pile locations were collected from 34 sites in Louisiana for the deep foundation application. For the soil classification application, 70 CPT soundings were taken in 14 different parishes across Louisiana. Specifically, tree-based ML models such as Decision Tree (DT), Random Forest (RF) and Gradient Boosted Tree (GBT) were developed and compared in predicting ultimate pile capacity. It was found that the GBT model performed best among the tree-based models. This GBT model was further compared with four conventional direct pile-CPT methods based on several statistical criteria, and in this comparison, the GBT model outranked the conventional methods. To predict load-settlement behavior, an Artificial Neural Network (ANN) model was developed in addition to RF and GBT. A comparison was made between these ML models based on several statistical criteria. Furthermore, these ML models were graphically compared with two common load-transfer methods in predicting actual static pile load test curves. All the ML models performed satisfactorily in predicting the load-settlement behavior. Finally, a common CPT-based SBT classification system was replicated using RF and GBT models. Six different input settings were explored, and a total of 12 models were developed. The model types included basic CPT parameters such as corrected cone tip resistance, sleeve friction, pore water pressure parameters and effective overburden pressure, as well as normalized CPT parameters.
A comparison between all types of models was conducted based on several performance criteria, and it was found that GBT models with input settings comprising normalized parameters performed best among all the developed models. Hence, these findings support the use of ML models in predicting ultimate pile capacity and load-settlement behavior and replicating a CPT-based SBT classification system.

  • Research Article
  • Cite Count Icon 1
  • 10.1139/cjce-2023-0503
Modeling driver injury severity using machine learning algorithms
  • May 15, 2024
  • Canadian Journal of Civil Engineering
  • Neero Gumsar Sorum + 1 more

This study aimed to predict and analyze the driver injury severity (DIS) using 12 machine learning (ML) algorithms. Police reports of single- and two-vehicle accidents that occurred during 2011–2020 in two Indian cities (Itanagar and Imphal) were used in this study. The best-performing model to predict the DIS for Itanagar was Gradient Boosting Trees (GBT). The “Causes of Accident” variable showed the greatest impact on the DIS. In the case of Imphal, the best-performing models were GBT, Extra Trees, and Random Forest across all k-fold cross-validations for train ratios 0.70, 0.80, and 0.90, respectively. “Causes of Accident” and “Vehicle Type” showed the greatest impact on the DIS. These results reveal that the ML models can be applied in hilly areas to predict and identify the important factors that affect DIS. Transportation authorities can analyze road accident data using these models while implementing various road safety measures.

  • Research Article
  • Cite Count Icon 2
  • 10.1097/md.0000000000038513
Performance evaluation of ML models for preoperative prediction of HER2-low BC based on CE-CBBCT radiomic features: A prospective study
  • Jun 14, 2024
  • Medicine
  • Xianfei Chen + 3 more

To explore the value of machine learning (ML) models based on contrast-enhanced cone-beam breast computed tomography (CE-CBBCT) radiomics features for the preoperative prediction of human epidermal growth factor receptor 2 (HER2)-low expression breast cancer (BC). Fifty-six patients with HER2-negative invasive BC who underwent preoperative CE-CBBCT were prospectively analyzed. Patients were randomly divided into training and validation cohorts at approximately 7:3. A total of 1046 quantitative radiomic features were extracted from CE-CBBCT images and normalized using z-scores. The Pearson correlation coefficient and recursive feature elimination were used to identify the optimal features. Six ML models were constructed based on the selected features: linear discriminant analysis (LDA), random forest (RF), support vector machine (SVM), logistic regression (LR), AdaBoost (AB), and decision tree (DT). To evaluate the performance of these models, receiver operating characteristic curves and area under the curve (AUC) were used. Seven features were selected as the optimal features for constructing the ML models. In the training cohort, the AUC values for SVM, LDA, RF, LR, AB, and DT were 0.984, 0.981, 1.000, 0.970, 1.000, and 1.000, respectively. In the validation cohort, the AUC values for SVM, LDA, RF, LR, AB, and DT were 0.859, 0.880, 0.781, 0.880, 0.750, and 0.713, respectively. Among all ML models, the LDA and LR models demonstrated the best performance. The DeLong test showed no significant differences among the receiver operating characteristic curves of the ML models in the training cohort (P > .05); however, in the validation cohort, the differences between the AUC of LDA and those of RF, AB, and DT were statistically significant (P = .037, .003, .046), as were the differences between the AUC of LR and those of RF, AB, and DT (P = .023, .005, .030).
Nevertheless, no statistically significant differences were observed when compared to the other ML models. ML models based on CE-CBBCT radiomics features achieved excellent performance in the preoperative prediction of HER2-low BC and could potentially serve as an effective tool to assist in precise and personalized targeted therapy.
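The training and validation AUCs listed in this abstract can be compared side by side to see how much each model's score drops on held-out patients. The sketch below does that arithmetic. The AUC values are copied from the abstract; the function name is an assumption, and reading a large drop as a sign of overfitting is a common heuristic that the abstract itself does not state.

```python
# AUC values as reported in the abstract.
train_auc = {"SVM": 0.984, "LDA": 0.981, "RF": 1.000,
             "LR": 0.970, "AB": 1.000, "DT": 1.000}
val_auc = {"SVM": 0.859, "LDA": 0.880, "RF": 0.781,
           "LR": 0.880, "AB": 0.750, "DT": 0.713}


def auc_gaps(train, validation, digits=3):
    """Return (model, training AUC minus validation AUC), smallest gap first."""
    gaps = {name: round(train[name] - validation[name], digits) for name in train}
    return sorted(gaps.items(), key=lambda item: item[1])


ranked = auc_gaps(train_auc, val_auc)
smallest_gap_model = ranked[0][0]   # "LR"
largest_gap_model = ranked[-1][0]   # "DT"
```

The two models with the smallest drop, LR and LDA, are the same two that the abstract names as the best performers in the validation cohort.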

  • Research Article
  • 10.1182/blood-2024-211964
Systematic Review of Machine Learning Models for Myelodysplastic Syndrome Diagnosis
  • Nov 5, 2024
  • Blood
  • Karna Desai + 5 more


  • Conference Article
  • 10.37308/dfi49.2024970301
Develop Machine Learning Models to Establish the Load-Settlement Curve of Piles from Cone Penetration Test Data
  • Oct 6, 2024
  • Murad Abu-Farsakh

The evaluation of load-settlement behavior of piles is very crucial in meeting the serviceability criteria for pile analysis and design. The most reliable approach for estimating this behavior can be achieved by conducting pile load tests. However, due to the considerable expense and time requirement of such in-situ testing, the load-transfer methods have been used routinely in practice. In this paper, an alternative tree-based machine learning (ML) modeling is explored to predict the load-settlement behavior of axially loaded single piles from cone penetration test (CPT) data. Two variants of tree-based ML models, the random forest (RF) and gradient boosted tree (GBT), are developed in this study to estimate the load-settlement behavior of piles from CPT data (corrected cone tip resistance, qt, and sleeve friction, fs). A database of load-settlement curves of 64 static pile load tests and the corresponding CPT test data were compiled and used for the development of these ML models. The developed RF and GBT models are evaluated based on several statistical criteria. The load-settlement curves for six PLTs predicted using the developed RF and GBT models were compared with the measured data and the load-settlement curves predicted using the conventional load-transfer methods. The results demonstrated the great potential of tree-based ML (RF, GBT) models for predicting the load-settlement behavior of axially loaded piles from CPT data. The comparison clearly shows that the ML models outperformed the conventional load-transfer methods. Amongst the two ML models, the results show that the GBT model outperformed the RF model.

  • Research Article
  • Cite Count Icon 2
  • 10.3389/fendo.2024.1353023
Meta-analysis of machine learning models for the diagnosis of central precocious puberty based on clinical, hormonal (laboratory) and imaging data
  • Mar 25, 2024
  • Frontiers in Endocrinology
  • Yilin Chen + 2 more

Background: Central precocious puberty (CPP) is a common endocrine disorder in children, and its diagnosis primarily relies on the gonadotropin-releasing hormone (GnRH) stimulation test, which is expensive and time-consuming. With the widespread application of artificial intelligence in medicine, some studies have utilized clinical, hormonal (laboratory) and imaging data-based machine learning (ML) models to identify CPP. However, the results of these studies varied widely and were challenging to directly compare, mainly due to diverse ML methods. Therefore, the diagnostic value of clinical, hormonal (laboratory) and imaging data-based ML models for CPP remains elusive. The aim of this study was to investigate the diagnostic value of ML models based on clinical, hormonal (laboratory) and imaging data for CPP through a meta-analysis of existing studies. Methods: We conducted a comprehensive search for relevant English articles on clinical, hormonal (laboratory) and imaging data-based ML models for diagnosing CPP, covering the period from the database creation date to December 2023. Pooled sensitivity, specificity, positive likelihood ratio (LR+), negative likelihood ratio (LR-), summary receiver operating characteristic (SROC) curve, and area under the curve (AUC) were calculated to assess the diagnostic value of clinical, hormonal (laboratory) and imaging data-based ML models for diagnosing CPP. The I² test was employed to evaluate heterogeneity, and the source of heterogeneity was investigated through meta-regression analysis. Publication bias was assessed using the Deeks funnel plot asymmetry test. Results: Six studies met the eligibility criteria. The pooled sensitivity and specificity were 0.82 (95% confidence interval (CI) 0.62-0.93) and 0.85 (95% CI 0.80-0.90), respectively. The LR+ was 6.00, and the LR- was 0.21, indicating that clinical, hormonal (laboratory) and imaging data-based ML models exhibited an excellent ability to confirm or exclude CPP.
Additionally, the SROC curve showed that the AUC of the clinical, hormonal (laboratory) and imaging data-based ML models in the diagnosis of CPP was 0.90 (95% CI 0.87-0.92), demonstrating good diagnostic value for CPP. Conclusion: Based on the outcomes of our meta-analysis, clinical and imaging data-based ML models are excellent diagnostic tools with high sensitivity, specificity, and AUC in the diagnosis of CPP. Despite the geographical limitations of the study findings, future research endeavors will strive to address these issues to enhance their applicability and reliability, providing more precise guidance for the differentiation and treatment of CPP.
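Likelihood ratios are defined from sensitivity and specificity as LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch below applies those textbook definitions to the pooled point estimates quoted in the abstract. The function name is an assumption. Note that this naive calculation gives an LR+ of about 5.47 rather than the reported 6.00; pooled likelihood ratios in a meta-analysis are usually estimated by the model rather than derived from the pooled point estimates, which may explain the difference.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1], specificity in (0, 1)")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Pooled point estimates quoted in the abstract.
lr_pos, lr_neg = likelihood_ratios(0.82, 0.85)
# lr_pos is roughly 5.47 and lr_neg roughly 0.21
```

An LR+ well above 1 means a positive model output raises the odds of CPP, and an LR- well below 1 means a negative output lowers them.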

  • Research Article
  • 10.1177/03611981241236180
Developing Tree-Based Machine Learning Models for Estimating the Pile Setup Parameter for Clay Soils
  • Mar 21, 2024
  • Transportation Research Record: Journal of the Transportation Research Board
  • Mohammad Moontakim Shoaib + 1 more

Piles driven into cohesive soils usually experience increases in capacity with time, known as pile setup phenomenon. Several empirical methods have been developed to estimate the setup parameter (A), such as the well-known Skov and Denver equation. Parameter A is crucial in predicting pile setup behavior. In this study, tree-based machine learning (ML) models such as random forest (RF) and gradient boosted tree (GBT) were applied for better estimation of the setup parameter. A database consisting of setup data from 12 instrumented piles tested at different times, and corresponding cone penetration test (CPT) and soil boring data were collected. The soil properties (i.e., undrained shear strength, plasticity index, over consolidation ratio, and coefficient of consolidation) and CPT data (cone tip resistance, sleeve friction) of clayey soil layers at pile locations were utilized to develop the ML models. Three types of tree-based ML model were developed for predicting the setup parameter, A, using CPT and soil boring data. A comparison was made between the developed ML models based on soil properties, ML models based on CPT data, and an artificial neural network (ANN) model proposed in a previous study using the same dataset. Furthermore, the best performing ML models were compared with two nonlinear regression models recommended in a previous study using the same dataset for estimating the setup parameter. The results of this research clearly demonstrated the superior prediction capability of the tree-based ML models, particularly the GBT model over the ANN and the two nonlinear regression models in evaluating the pile setup parameter.
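The Skov and Denver relation mentioned in this abstract is commonly written as Q_t / Q_0 = 1 + A log10(t / t_0), where Q_0 is the pile capacity at a reference time t_0 and Q_t is the capacity at a later time t. The sketch below evaluates that relation and inverts it to back-calculate A from two capacity measurements. The function names and any numbers passed in are illustrative assumptions, not data from the study.

```python
import math


def capacity_at(q0, a, t, t0):
    """Pile capacity at time t from the Skov and Denver relation.

    q0: capacity at reference time t0; a: setup parameter A.
    t and t0 must use the same time unit and be positive.
    """
    if t <= 0 or t0 <= 0:
        raise ValueError("times must be positive")
    return q0 * (1.0 + a * math.log10(t / t0))


def setup_parameter(qt, q0, t, t0):
    """Back-calculate A from capacities measured at times t0 and t."""
    if t <= 0 or t0 <= 0 or t == t0:
        raise ValueError("times must be positive and distinct")
    return (qt / q0 - 1.0) / math.log10(t / t0)


# Hypothetical numbers: with A = 0.5, capacity grows by 50% per log cycle of time.
q_later = capacity_at(1000.0, 0.5, 10.0, 1.0)
a_back = setup_parameter(q_later, 1000.0, 10.0, 1.0)
```

In the study, the ML models estimate A from soil and CPT inputs; the relation above is what turns an estimated A into a capacity forecast over time.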

  • Research Article
  • 10.1177/08850666251390848
An Interpretable Machine Learning Model for Early Multitemporal Prediction of Onset of Acute Kidney Injury in Intensive Care Unit Patients with Severe Trauma.
  • Oct 29, 2025
  • Journal of intensive care medicine
  • Bingrui Gao + 3 more

Acute kidney injury (AKI), a leading cause of organ failure in critically ill patients, demands early identification of high-risk patients to improve outcomes. Yet comparative analyses of diagnostic and prognostic machine learning (ML) models across multiple post-admission timeframes are lacking. Using MIMIC-IV, we applied the Boruta algorithm for feature selection, then developed and compared six ML models to predict AKI risk at 0-24, 24-48, 48-72, 0-48, and 0-72 h post-ICU admission. Model performance was evaluated using the area under the curve (AUC) and the confusion matrix. Decision curve and calibration analyses assessed clinical applicability. We compared the ML models with the Sequential Organ Failure Assessment (SOFA) and SAPS II scores to evaluate their accuracy. Finally, Shapley Additive Explanations (SHAP) values were used to interpret and visualize key features of the optimal model. Our study involved 2092 trauma intensive care unit (ICU) patients. Using 17 features selected from 48 for trauma patients 24 h after ICU admission, all six ML models outperformed SOFA and SAPS II, and extreme gradient boosting (XGBoost) performed best, achieving an AUC of 0.948 (95% CI [0.929-0.966]) for AKI prediction within 24 h of admission, and AUCs of 0.941 ([0.892-0.917]) and 0.878 ([0.863-0.892]) for the 0-48 and 0-72 h periods, respectively. However, predictive accuracy was very limited at 24-48 h (AUC 0.602 [0.562-0.643]) and 48-72 h (AUC 0.490 [0.429-0.551]). Urine output per kilogram per hour at 6 and 12 h and age were the most important features identified through SHAP analysis. Our study found that ML models excel in diagnosing AKI risk in ICU trauma patients but have limited prognostic accuracy at 24-48 and 48-72 h post-admission. Further research is needed to improve this using time-series ML models with optimal windows.

  • Research Article
  • Cite Count Icon 9
  • 10.1177/03611981231170128
Exploring Tree-Based Machine Learning Models to Estimate the Ultimate Pile Capacity From Cone Penetration Test Data
  • May 18, 2023
  • Transportation Research Record: Journal of the Transportation Research Board
  • Mohammad Moontakim Shoaib + 1 more

Several approaches have been developed to estimate the ultimate capacity of piles, such as static and dynamic load tests, static analysis from soil borings, and directly utilizing in-situ test results. Recently, there has been increased interest in using the in-situ cone penetration test (CPT) to estimate pile capacity. Several analytical pile-CPT methods have been developed, which involve several correlation assumptions that can affect their accuracy. In this paper, three tree-based machine learning (ML) models, namely decision tree (DT), random forest (RF), and gradient boosted tree (GBT), are developed for estimating the ultimate capacity of piles from CPT data. A database that contains 80 pile load tests and associated CPT data collected in Louisiana was used to develop these ML models. The measured ultimate pile capacity (Qm) was determined using Davisson’s interpretation method from the load–settlement curve of each pile load test. Among the developed ML models, GBT was the most accurate. The estimation of ultimate pile capacity from the GBT model is compared with those obtained from the four best-performing direct pile-CPT methods (based on a previous study): the University of Florida (UF), probabilistic, European Regional Technical Committee 3 (ERTC3), and Laboratoire Central des Ponts et Chaussées (LCPC) methods. The GBT and pile-CPT methods were evaluated and ranked based on analysis of multiple statistical criteria. The results clearly showed that the GBT model outperforms the four direct pile-CPT methods for estimating the ultimate capacity of piles.

  • Research Article
  • Cite Count Icon 229
  • 10.1001/jamanetworkopen.2021.2240
Use of Machine Learning to Develop and Evaluate Models Using Preoperative and Intraoperative Data to Identify Risks of Postoperative Complications
  • Mar 30, 2021
  • JAMA network open
  • Bing Xue + 7 more

Postoperative complications can significantly impact perioperative care management and planning. To assess machine learning (ML) models for predicting postoperative complications using independent and combined preoperative and intraoperative data and their clinically meaningful model-agnostic interpretations. This retrospective cohort study assessed 111 888 operations performed on adults at a single academic medical center from June 1, 2012, to August 31, 2016, with a mean duration of follow-up based on the length of postoperative hospital stay less than 7 days. Data analysis was performed from February 1 to September 31, 2020. Outcomes included 5 postoperative complications: acute kidney injury (AKI), delirium, deep vein thrombosis (DVT), pulmonary embolism (PE), and pneumonia. Patient and clinical characteristics available preoperatively, intraoperatively, and a combination of both were used as inputs for 5 candidate ML models: logistic regression, support vector machine, random forest, gradient boosting tree (GBT), and deep neural network (DNN). Model performance was compared using the area under the receiver operating characteristic curve (AUROC). Model interpretations were generated using Shapley Additive Explanations by transforming model features into clinical variables and representing them as patient-specific visualizations. A total of 111 888 patients (mean [SD] age, 54.4 [16.8] years; 56 915 [50.9%] female; 82 533 [73.8%] White) were included in this study. The best-performing model for each complication combined the preoperative and intraoperative data with the following AUROCs: pneumonia (GBT), 0.905 (95% CI, 0.903-0.907); AKI (GBT), 0.848 (95% CI, 0.846-0.851); DVT (GBT), 0.881 (95% CI, 0.878-0.884); PE (DNN), 0.831 (95% CI, 0.824-0.839); and delirium (GBT), 0.762 (95% CI, 0.759-0.765). Performance of models that used only preoperative data or only intraoperative data was marginally lower than that of models that used combined data. 
When adding variables with missing data as input, AUROCs increased from 0.588 to 0.905 for pneumonia, 0.579 to 0.848 for AKI, 0.574 to 0.881 for DVT, 0.5 to 0.831 for PE, and 0.6 to 0.762 for delirium. The Shapley Additive Explanations analysis generated model-agnostic interpretation that illustrated significant clinical contributors associated with risks of postoperative complications. The ML models for predicting postoperative complications with model-agnostic interpretation offer opportunities for integrating risk predictions for clinical decision support. Such real-time clinical decision support can mitigate patient risks and help in anticipatory management for perioperative contingency planning.
