Assessment of flood susceptibility prediction based on optimized tree-based machine learning models

Abstract

Due to the complexity of the physical processes of floods, data-driven machine learning (ML) models offer a cost-efficient approach to flood modeling. The innovation of the current study lies in developing tree-based ML models, namely Rotation Forest (ROF), Alternating Decision Tree (ADTree), and Random Forest (RF), optimized with binary particle swarm optimization (BPSO), to estimate flood susceptibility in the Maneh and Samalqan watershed, Iran. To implement the models, 370 flood-prone locations in the study area were identified (2016–2019). In addition, 20 hydrogeological, topographical, geological, and environmental criteria affecting flood occurrence in the study area were extracted as predictors of flood susceptibility. Model performance was evaluated using the area under the curve (AUC) and several other statistical indicators. RF-BPSO (AUC = 0.935) achieved the highest accuracy, followed by ADTree-BPSO (AUC = 0.923) and ROF-BPSO (AUC = 0.904). The findings also show that flood susceptibility is higher in the center of the study area than elsewhere, owing to its lower elevation, gentler slopes, and proximity to rivers. The proposed ensemble framework can therefore also be used to produce flood susceptibility maps for flood management and prevention in other regions with similar geo-environmental characteristics.
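The abstract does not say exactly what BPSO optimizes in each model. One common use of binary PSO in susceptibility mapping is selecting a subset of the conditioning factors, so the sketch below assumes that reading: each particle is a 0/1 mask over the 20 criteria, and each velocity is squashed through a sigmoid to give the probability that a bit is 1 (the classic Kennedy and Eberhart binary PSO). In a real pipeline the fitness would be something like the cross-validated AUC of RF, ROF, or ADTree trained on the selected factors; here a hypothetical toy fitness (agreement with a made-up target mask) stands in so the example runs without any ML library. All parameter values (`w`, `c1`, `c2`, `v_max`) are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Tuple


def sigmoid(v: float) -> float:
    """Map a velocity to the probability that a bit is set to 1."""
    return 1.0 / (1.0 + math.exp(-v))


def bpso(
    fitness: Callable[[List[int]], float],
    n_bits: int,
    n_particles: int = 20,
    n_iter: int = 50,
    w: float = 0.7,      # inertia weight (illustrative value)
    c1: float = 1.5,     # pull towards each particle's own best
    c2: float = 1.5,     # pull towards the swarm's best
    v_max: float = 4.0,  # velocity clamp so the sigmoid does not saturate
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Maximize `fitness` over 0/1 vectors with sigmoid-transfer binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_bits)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_bits):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                # The bit becomes 1 with probability sigmoid(velocity).
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical stand-in for "CV AUC of a model on the selected factors":
# reward agreement with a made-up mask over 20 conditioning factors.
TARGET = [1, 0] * 10


def toy_fitness(mask: List[int]) -> float:
    return float(sum(m == t for m, t in zip(mask, TARGET)))


best_mask, best_score = bpso(toy_fitness, n_bits=20)
print(best_score, best_mask)
```

Swapping `toy_fitness` for a function that trains and scores a classifier on the masked columns turns this into a wrapper feature selector; the swarm logic stays the same.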

Similar Papers
  • Research Article
  • Cited by 18
  • 10.1108/ijhma-11-2022-0172
Predictability of Belgian residential real estate rents using tree-based ML models and IML techniques
  • Apr 13, 2023
  • International Journal of Housing Markets and Analysis
  • Ian Lenaers + 2 more

Purpose: The purpose is twofold. First, this study aims to establish that black-box tree-based machine learning (ML) models have better predictive performance than a standard linear regression (LR) hedonic model for rent prediction. Second, it shows the added value of analyzing tree-based ML models with interpretable machine learning (IML) techniques. Design/methodology/approach: Data on Belgian residential rental properties were collected. Two tree-based ML models, random forest regression and eXtreme gradient boosting regression, were applied to derive rent prediction models, and their predictive performance was compared with that of an LR model. The tree-based models were interpreted with SHapley Additive exPlanations (SHAP) feature importance (FI) plots and SHAP summary plots to identify the important factors in predicting rent. Findings: Results indicate that the tree-based models outperform the LR model for Belgian residential rent prediction. The SHAP FI plots agree that asking price, cadastral income, livable surface, number of bedrooms, number of bathrooms, and variables measuring proximity to points of interest are the dominant predictors. SHAP summary plots reveal the direction of the relationships between rent and its determinants; besides linear relationships, nonlinear relationships also emerge. Originality/value: Rent prediction using ML has been studied less than house price prediction, and analyzing prediction models with IML techniques is relatively new in real estate economics. To the best of the authors' knowledge, this study is the first to derive insights into the driving determinants of predicted rents from SHAP FI and SHAP summary plots.
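SHAP attributes each prediction to the input features using Shapley values. For a handful of features these can be computed exactly by enumerating every coalition, which makes the idea concrete. The sketch below does that with a single baseline row standing in for "feature absent"; it is a simplified illustration, not the `shap` library's tree explainer (which uses a tree-specific algorithm and a background data set), and the toy rent model and its inputs are hypothetical.

```python
from itertools import combinations
from math import factorial, isclose
from typing import Callable, List, Sequence


def shapley_values(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical rent model with an interaction term; features are
# (livable surface in m^2, bedrooms, bathrooms).
def toy_rent(z: Sequence[float]) -> float:
    surface, bedrooms, bathrooms = z
    return 300 + 8 * surface + 40 * bedrooms + 25 * bedrooms * bathrooms


x = [95.0, 3.0, 2.0]
baseline = [80.0, 2.0, 1.0]
phi = shapley_values(toy_rent, x, baseline)
# Efficiency property: attributions sum to f(x) - f(baseline).
assert isclose(sum(phi), toy_rent(x) - toy_rent(baseline))
print([round(p, 2) for p in phi])  # → [120.0, 77.5, 62.5]
```

The additive surface term gets exactly its own contribution (8 × 15), while the bedrooms × bathrooms interaction is split between the two features, which is the kind of nonlinear effect a SHAP summary plot can surface.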

  • Research Article
  • Cited by 8
  • 10.1016/j.ijdrr.2024.104955
A tropical cyclone risk prediction framework using flood susceptibility and tree-based machine learning models: County-level direct economic loss prediction in Guangdong Province
  • Nov 1, 2024
  • International Journal of Disaster Risk Reduction
  • Jian Yang + 6 more

  • Research Article
  • Cited by 3
  • 10.1177/03611981241236180
Developing Tree-Based Machine Learning Models for Estimating the Pile Setup Parameter for Clay Soils
  • Mar 21, 2024
  • Transportation Research Record: Journal of the Transportation Research Board
  • Mohammad Moontakim Shoaib + 1 more

Piles driven into cohesive soils usually gain capacity over time, a phenomenon known as pile setup. Several empirical methods have been developed to estimate the setup parameter (A), such as the well-known Skov and Denver equation. Parameter A is crucial for predicting pile setup behavior. In this study, tree-based machine learning (ML) models such as random forest (RF) and gradient boosted tree (GBT) were applied to estimate the setup parameter more accurately. A database was compiled of setup data from 12 instrumented piles tested at different times, together with the corresponding cone penetration test (CPT) and soil boring data. The soil properties (undrained shear strength, plasticity index, overconsolidation ratio, and coefficient of consolidation) and CPT data (cone tip resistance and sleeve friction) of the clayey soil layers at the pile locations were used to develop the ML models. Three types of tree-based ML model were developed to predict the setup parameter A from CPT and soil boring data. The ML models based on soil properties and those based on CPT data were compared with each other and with an artificial neural network (ANN) model proposed in a previous study using the same dataset. The best-performing ML models were further compared with two nonlinear regression models recommended in a previous study using the same dataset. The results demonstrate the superior predictive capability of the tree-based ML models, particularly the GBT model, over the ANN and the two nonlinear regression models in evaluating the pile setup parameter.
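The Skov and Denver relation mentioned above is commonly written as Q_t / Q_0 = 1 + A log10(t / t_0), where Q_0 is the capacity at a reference time t_0 after driving and Q_t is the capacity at a later time t. The sketch below back-calculates A from two capacity measurements and uses it to extrapolate. The capacities, times, and the choice t_0 = 1 day are hypothetical values for illustration, not data from the study.

```python
import math


def setup_parameter(q0: float, qt: float, t: float, t0: float = 1.0) -> float:
    """Back-calculate A from Qt/Q0 = 1 + A * log10(t / t0).

    q0 is the capacity measured at reference time t0 and qt the capacity at a
    later time t (both times in the same unit, e.g. days).
    """
    if t <= t0:
        raise ValueError("t must be later than the reference time t0")
    return (qt / q0 - 1.0) / math.log10(t / t0)


def capacity_at(q0: float, a: float, t: float, t0: float = 1.0) -> float:
    """Capacity predicted at time t by the Skov and Denver form."""
    return q0 * (1.0 + a * math.log10(t / t0))


# Hypothetical restrike data: 1000 kN at t0 = 1 day, 1500 kN at 100 days.
a = setup_parameter(q0=1000.0, qt=1500.0, t=100.0)
print(a)                                 # 0.25
print(capacity_at(1000.0, a, t=1000.0))  # 1750.0
```

The ML models in the study aim to predict A itself from soil and CPT inputs, so that a relation of this kind can be applied without a series of restrike tests.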

  • Research Article
  • Cited by 92
  • 10.1016/j.scs.2023.104744
Flood susceptibility prediction using tree-based machine learning models in the GBA
  • Jun 25, 2023
  • Sustainable Cities and Society
  • Hai-Min Lyu + 1 more

  • Conference Article
  • 10.37308/dfi49.2024970301
Develop Machine Learning Models to Establish the Load-Settlement Curve of Piles from Cone Penetration Test Data
  • Oct 6, 2024
  • Murad Abu-Farsakh

Evaluating the load-settlement behavior of piles is crucial for meeting serviceability criteria in pile analysis and design. The most reliable way to estimate this behavior is to conduct pile load tests. However, because such in-situ tests are expensive and time-consuming, load-transfer methods are routinely used in practice. This paper explores tree-based machine learning (ML) modeling as an alternative for predicting the load-settlement behavior of axially loaded single piles from cone penetration test (CPT) data. Two variants of tree-based ML models, random forest (RF) and gradient boosted tree (GBT), are developed to estimate load-settlement behavior from CPT data (corrected cone tip resistance, qt, and sleeve friction, fs). A database of load-settlement curves from 64 static pile load tests (PLTs) and the corresponding CPT data was compiled and used to develop these models. The RF and GBT models are evaluated using several statistical criteria. The load-settlement curves predicted by the RF and GBT models for six PLTs were compared with the measured data and with the curves predicted by conventional load-transfer methods. The results demonstrate the strong potential of tree-based ML models (RF and GBT) for predicting the load-settlement behavior of axially loaded piles from CPT data: both ML models outperformed the conventional load-transfer methods, and the GBT model outperformed the RF model.
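Gradient boosted trees fit each new tree to the residuals of the current ensemble (for squared loss, the residuals are the negative gradient) and add it scaled by a learning rate. The sketch below shows that loop with depth-1 trees (stumps) on a single input and hypothetical toy data; practical GBT implementations, and whatever settings the study used, add deeper trees, multiple features such as qt and fs, and regularization.

```python
from statistics import mean
from typing import Callable, List, Sequence


def fit_stump(xs: Sequence[float], residuals: Sequence[float]) -> Callable[[float], float]:
    """Best single-threshold split on one feature under squared error."""
    values = sorted(set(xs))
    if len(values) < 2:
        c = mean(residuals)
        return lambda x: c
    best = None
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


def gradient_boost(xs: Sequence[float], ys: Sequence[float],
                   n_rounds: int = 50, learning_rate: float = 0.1) -> Callable[[float], float]:
    """Least-squares gradient boosting with stumps; returns a predict function."""
    f0 = mean(ys)                      # initial constant model
    stumps: List[Callable[[float], float]] = []
    preds = [f0] * len(ys)
    for _ in range(n_rounds):
        # Negative gradient of 0.5 * (y - F)^2 is the ordinary residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + learning_rate * sum(s(x) for s in stumps)


# Hypothetical toy data with a step in the response.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 10, 10, 10, 10]
model = gradient_boost(xs, ys)
print(round(model(2), 3), round(model(7), 3))
```

With a learning rate of 0.1, each round removes 10% of the remaining step, so after 50 rounds the predictions sit close to 0 and 10; a smaller learning rate needs more rounds but usually generalizes better.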

  • Dissertation
  • 10.31390/gradschool_theses.5785
Exploring Machine Learning in Deep Foundation and Soil Classification Application
  • Apr 28, 2023
  • Mohammad Moontakim Shoaib

This research explored the applicability of several machine learning (ML) models for predicting the ultimate capacity and load-settlement behavior of axially loaded single driven piles from Cone Penetration Test (CPT) data, and for reproducing a common CPT-based soil behavior type (SBT) classification system. For the deep foundation application, eighty static pile load tests and CPT data close to the pile locations were collected from 34 sites in Louisiana. For the soil classification application, 70 CPT soundings were taken in 14 parishes across Louisiana. Tree-based ML models, namely Decision Tree (DT), Random Forest (RF), and Gradient Boosted Tree (GBT), were developed and compared for predicting ultimate pile capacity, and the GBT model performed best among them. This GBT model was then compared with four conventional direct pile-CPT methods using several statistical criteria and outranked the conventional methods. To predict load-settlement behavior, an Artificial Neural Network (ANN) model was developed in addition to RF and GBT, and these ML models were compared using several statistical criteria. They were also compared graphically with two common load-transfer methods in predicting measured static pile load test curves; all the ML models predicted the load-settlement behavior satisfactorily. Finally, a common CPT-based SBT classification system was replicated using RF and GBT models. Six input settings were explored, yielding 12 models in total. The input settings included basic CPT parameters, such as corrected cone tip resistance, sleeve friction, pore water pressure parameters, and effective overburden pressure, as well as normalized CPT parameters.
A comparison of all models on several performance criteria showed that GBT models with input settings comprising normalized parameters performed best. These findings support the use of ML models for predicting ultimate pile capacity and load-settlement behavior and for replicating a CPT-based SBT classification system.

  • Research Article
  • Cited by 13
  • 10.1186/s12967-024-05395-1
Tree-based ensemble machine learning models in the prediction of acute respiratory distress syndrome following cardiac surgery: a multicenter cohort study
  • Aug 15, 2024
  • Journal of Translational Medicine
  • Hang Zhang + 13 more

Background: Acute respiratory distress syndrome (ARDS) after cardiac surgery is a severe respiratory complication with high mortality and morbidity. Traditional clinical approaches may under-recognize this heterogeneous syndrome, potentially delaying diagnosis. This study aims to develop and externally validate seven machine learning (ML) models, trained on electronic health record data, for predicting ARDS after cardiac surgery. Methods: This multicenter, observational cohort study included patients who underwent cardiac surgery, with training and testing cohorts drawn from Nanjing First Hospital and a validation cohort from Shanghai General Hospital. ARDS was defined according to the Berlin definition. The number of important features was determined using the sliding-window sequential forward feature selection method (SWSFS). We developed a set of tree-based ML models: Decision Tree, GBDT, AdaBoost, XGBoost, LightGBM, Random Forest, and Deep Forest. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC) and the Brier score. The SHapley Additive exPlanations (SHAP) technique was used to interpret the ML models, and the ML models were compared with traditional scoring systems. Results: A total of 1996 patients who underwent cardiac surgery were included in the study. The top five important features identified by SWSFS were chronic obstructive pulmonary disease, preoperative albumin, central venous pressure_T4, cardiopulmonary bypass time, and left ventricular ejection fraction. Among the seven ML models, Deep Forest performed best, with an AUC of 0.882 and a Brier score of 0.809 in the validation cohort. The SHAP values illustrated the contribution of each of the 13 features to the model output and each feature's effect on individual predictions.
In addition, the ensemble ML models outperformed six traditional scoring systems. Conclusions: Our study identified 13 important features and provided multiple ML models to improve risk stratification for ARDS after cardiac surgery. These predictors and ML models may provide a basis for early diagnostic and preventive strategies in the perioperative management of ARDS.
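The two evaluation metrics named above are straightforward to compute. The sketch below uses the rank (Mann-Whitney) formulation of AUC, that is, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half, and the Brier score, the mean squared difference between the predicted probability and the 0/1 outcome (lower is better). The labels and probabilities are made-up values, and the pairwise loop is written for clarity rather than speed.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


y_true = [0, 0, 1, 1]
p_hat = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, p_hat))      # 0.75
print(brier_score(y_true, p_hat))  # about 0.158
```

AUC only depends on how cases are ordered, whereas the Brier score also penalizes poorly calibrated probabilities, which is why studies often report both.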

  • Research Article
  • Cited by 37
  • 10.1200/cci.20.00172
Computing the Hazard Ratios Associated With Explanatory Variables Using Machine Learning Models of Survival Data.
  • Dec 1, 2021
  • JCO Clinical Cancer Informatics
  • Sameer Sundrani + 1 more

The application of Cox proportional hazards (CoxPH) models to survival data and the derivation of hazard ratios (HRs) are well established. Although nonlinear, tree-based machine learning (ML) models have been developed and applied to survival analysis, no methodology exists for computing HRs associated with explanatory variables from such models. We describe a novel way to compute HRs from tree-based ML models using SHapley Additive exPlanation (SHAP) values, a locally accurate and consistent methodology for quantifying explanatory variables' contributions to predictions. Using three publicly available survival data sets of patients with colon, breast, or pan-cancer, we compared the performance of CoxPH with that of a state-of-the-art ML model, XGBoost. To compute the HR for an explanatory variable from the XGBoost model, the SHAP values were exponentiated and the ratio of their means over the two subgroups was calculated. CIs were computed by bootstrapping the training data and refitting the ML model 1,000 times. Across the three data sets, we systematically compared HRs for all explanatory variables. Open-source libraries in Python and R were used in the analyses. For the colon and breast cancer data sets, CoxPH and XGBoost performed comparably, and the computed HRs were consistent. In the pan-cancer data set, the two models agreed on most variables but gave opposite findings for two explanatory variables; subsequent Kaplan-Meier plots supported the XGBoost findings. Enabling the derivation of HRs from ML models can help identify risk factors in complex survival data sets and improve the prediction of clinical trial outcomes.
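The HR recipe described above (exponentiate the SHAP values, then take the ratio of the subgroup means) can be sketched as follows, assuming the model's SHAP values are on the log-hazard scale, as they are for a Cox-type objective. One simplification is flagged: the study's CIs come from refitting XGBoost on 1,000 bootstrap resamples of the training data, whereas this sketch only resamples a fixed set of SHAP values, which ignores model-fitting variability. All values are hypothetical.

```python
import math
import random
from statistics import mean
from typing import List, Sequence, Tuple


def shap_hazard_ratio(shap_values: Sequence[float], in_group: Sequence[bool]) -> float:
    """Mean exp(SHAP) inside the subgroup divided by the mean outside it."""
    exp_in = [math.exp(v) for v, g in zip(shap_values, in_group) if g]
    exp_out = [math.exp(v) for v, g in zip(shap_values, in_group) if not g]
    return mean(exp_in) / mean(exp_out)


def bootstrap_ci(shap_values: Sequence[float], in_group: Sequence[bool],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile CI from resampling rows of fixed SHAP values (simplification)."""
    rng = random.Random(seed)
    n = len(shap_values)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        groups = [in_group[i] for i in idx]
        if all(groups) or not any(groups):
            continue  # a resample needs both subgroups to form a ratio
        stats.append(shap_hazard_ratio([shap_values[i] for i in idx], groups))
    stats.sort()
    k = int(alpha / 2 * n_boot)
    return stats[k], stats[n_boot - 1 - k]


# Hypothetical per-patient SHAP values (log-hazard units) for one binary variable.
vals = [0.4, 0.5, 0.6, -0.2, -0.1, 0.0]
grp = [True, True, True, False, False, False]
hr = shap_hazard_ratio(vals, grp)
lo, hi = bootstrap_ci(vals, grp)
print(f"HR = {hr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Replacing the inner resampling with "refit the model on a bootstrap sample, recompute SHAP, recompute the ratio" would match the procedure the abstract describes, at the cost of 1,000 model fits.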

  • Research Article
  • 10.1182/blood-2024-211964
Systematic Review of Machine Learning Models for Myelodysplastic Syndrome Diagnosis
  • Nov 5, 2024
  • Blood
  • Karna Desai + 5 more

  • Research Article
  • Cited by 7
  • 10.1016/j.imed.2024.01.001
Machine learning predicts long-term mortality after acute myocardial infarction using systolic time intervals and routinely collected clinical data
  • Jun 17, 2024
  • Intelligent Medicine
  • Bijan Roudini + 3 more

  • Research Article
  • Cited by 2
  • 10.1097/md.0000000000038513
Performance evaluation of ML models for preoperative prediction of HER2-low BC based on CE-CBBCT radiomic features: A prospective study
  • Jun 14, 2024
  • Medicine
  • Xianfei Chen + 3 more

This study explored the value of machine learning (ML) models based on contrast-enhanced cone-beam breast computed tomography (CE-CBBCT) radiomics features for the preoperative prediction of human epidermal growth factor receptor 2 (HER2)-low breast cancer (BC). Fifty-six patients with HER2-negative invasive BC who underwent preoperative CE-CBBCT were prospectively analyzed. Patients were randomly divided into training and validation cohorts at a ratio of approximately 7:3. A total of 1046 quantitative radiomic features were extracted from the CE-CBBCT images and normalized using z-scores. The Pearson correlation coefficient and recursive feature elimination were used to identify the optimal features. Six ML models were constructed from the selected features: linear discriminant analysis (LDA), random forest (RF), support vector machine (SVM), logistic regression (LR), AdaBoost (AB), and decision tree (DT). Model performance was evaluated using receiver operating characteristic (ROC) curves and the area under the curve (AUC). Seven features were selected as optimal for constructing the ML models. In the training cohort, the AUC values for SVM, LDA, RF, LR, AB, and DT were 0.984, 0.981, 1.000, 0.970, 1.000, and 1.000, respectively. In the validation cohort, the corresponding AUC values were 0.859, 0.880, 0.781, 0.880, 0.750, and 0.713. Among all ML models, LDA and LR performed best. The DeLong test showed no significant differences among the ROC curves of the ML models in the training cohort (P > .05); in the validation cohort, however, the AUC of LDA differed significantly from those of RF, AB, and DT (P = .037, .003, and .046, respectively), as did the AUC of LR (P = .023, .005, and .030, respectively).
No statistically significant differences were observed in the remaining comparisons with the other ML models. ML models based on CE-CBBCT radiomics features achieved excellent performance in the preoperative prediction of HER2-low BC and could potentially serve as an effective tool to assist precise, personalized targeted therapy.
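The preprocessing steps named above, z-score normalization followed by a Pearson-correlation screen, might look like the sketch below. The 0.9 threshold and the greedy keep-first rule are assumptions (the abstract specifies neither), the subsequent recursive feature elimination step is not shown, and the feature columns are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List


def zscore(column: List[float]) -> List[float]:
    """Standardize to mean 0 and (population) standard deviation 1."""
    mu, sd = mean(column), pstdev(column)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in column]


def pearson(a: List[float], b: List[float]) -> float:
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # treat constant columns as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(features: Dict[str, List[float]], threshold: float = 0.9) -> List[str]:
    """Keep a feature only if |r| <= threshold against every feature kept so far."""
    kept: List[str] = []
    for name, col in features.items():
        if all(abs(pearson(col, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


raw = {
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [2.0, 4.0, 6.0, 8.1],   # nearly a rescaled copy of f1
    "f3": [4.0, 1.0, 3.0, 2.0],
}
normalized = {name: zscore(col) for name, col in raw.items()}
print(drop_correlated(normalized))  # ['f1', 'f3']
```

Because Pearson's r is unchanged by z-scoring, the screen gives the same result on raw or normalized columns; normalization matters for the downstream models (LDA, SVM, LR) rather than for this step.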

  • Research Article
  • Cited by 227
  • 10.3390/info11060332
Evaluation of Tree-Based Ensemble Machine Learning Models in Predicting Stock Price Direction of Movement
  • Jun 20, 2020
  • Information
  • Ernest Kwame Ampomah + 2 more

Forecasting the direction and trend of stock prices is an important task that helps investors make prudent financial decisions. Investing in the stock market carries substantial risk, and minimizing prediction error reduces that risk. Machine learning (ML) models typically perform better than statistical and econometric models, and ensemble ML models have been shown in the literature to outperform single ML models. In this work, we compare the effectiveness of tree-based ensemble ML models (Random Forest (RF), XGBoost Classifier (XG), Bagging Classifier (BC), AdaBoost Classifier (Ada), Extra Trees Classifier (ET), and Voting Classifier (VC)) in forecasting the direction of stock price movement. Eight stock datasets from three stock exchanges (NYSE, NASDAQ, and NSE) were randomly collected and used for the study. Each dataset is split into a training set and a test set. Ten-fold cross-validation accuracy is used to evaluate the ML models on the training set, and accuracy, precision, recall, F1-score, specificity, and area under the receiver operating characteristic curve (AUC-ROC) are used on the test set. Kendall's W test of concordance is used to rank the performance of the tree-based ML algorithms. On the training set, the AdaBoost model performed better than the other models. On the test set, the accuracy, precision, F1-score, and AUC metrics produced statistically significant rankings of the models, and the Extra Trees classifier outperformed the other models in all of these rankings.
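Kendall's W measures how consistently several "raters" (here, evaluation metrics) rank the same set of models, from 0 (no agreement) to 1 (identical rankings). The sketch below implements the basic formula W = 12S / (m^2 (n^3 - n)) for m raters and n models, where S is the sum of squared deviations of each model's rank sum from the mean rank sum. It assigns average ranks to ties but omits the tie-correction term, and the score table is hypothetical.

```python
from typing import List, Sequence


def rank_desc(scores: Sequence[float]) -> List[float]:
    """Rank 1 = highest score; tied scores share the average of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def kendalls_w(score_table: Sequence[Sequence[float]]) -> float:
    """score_table[r][i] is rater r's score (e.g. one metric) for model i."""
    m, n = len(score_table), len(score_table[0])
    rank_rows = [rank_desc(row) for row in score_table]
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))  # no tie correction


# Hypothetical test-set scores for (RF, ET, Ada) under three metrics.
table = [
    [0.81, 0.86, 0.79],  # accuracy
    [0.80, 0.88, 0.78],  # precision
    [0.83, 0.85, 0.80],  # F1-score
]
print(kendalls_w(table))  # 1.0: every metric ranks ET > RF > Ada
```

A significance test for W typically uses the statistic m(n - 1)W, which is approximately chi-squared with n - 1 degrees of freedom for moderately large samples.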

  • Research Article
  • Cited by 4
  • 10.3390/futuretransp2040052
Analysis of the Performance of Machine Learning Models in Predicting the Severity Level of Large-Truck Crashes
  • Nov 16, 2022
  • Future Transportation
  • Jinli Liu + 3 more

Large-truck crashes often result in substantial economic and social costs. Accurately predicting the severity level of a reported truck crash can help rescue teams and emergency medical services take appropriate action and provide proper medical care, thereby reducing these costs. This study investigates modeling issues in using machine learning methods to predict the severity level of large-truck crashes. To this end, six representative machine learning (ML) methods were selected: four classification-tree-based ML models, namely Extreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), Random Forest (RF), and the Gradient Boosting Decision Tree (GBDT), and two non-tree-based ML models, namely Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN). The accuracy of the six methods was compared, and the effects of data balancing on prediction performance were tested using three resampling techniques: undersampling, oversampling, and mixed sampling. The results indicate that better prediction performance was obtained with a dataset whose distribution resembles that of the original sample population than with balanced datasets. The tree-based ML models outperformed the non-tree-based ML models, and the GBDT model performed best among all six models.
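The three resampling schemes can share one routine that resizes every class to a target count: undersampling targets the smallest class size, oversampling the largest, and mixed sampling something in between. The sketch below assumes plain random resampling, and its "mix" target (the mean class size) is an assumed interpretation, since the abstract does not define it; the crash records are hypothetical.

```python
import random
from collections import Counter
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def resample(X: Sequence, y: Sequence, target: Callable[[List[int]], int],
             seed: int = 0) -> Tuple[list, list]:
    """Resample every class to target(class_sizes) rows.

    Classes above the target are randomly undersampled without replacement;
    classes below it are randomly oversampled with replacement.
    """
    rng = random.Random(seed)
    by_class: Dict[object, list] = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n = target([len(rows) for rows in by_class.values()])
    X_out: list = []
    y_out: list = []
    for label, rows in by_class.items():
        if len(rows) >= n:
            picked = rng.sample(rows, n)
        else:
            picked = rows + [rng.choice(rows) for _ in range(n - len(rows))]
        X_out.extend(picked)
        y_out.extend([label] * n)
    return X_out, y_out


def mix_target(sizes: List[int]) -> int:
    # Assumed meaning of "mixed sampling": meet at the mean class size.
    return round(mean(sizes))


X = list(range(8))                  # hypothetical crash records
y = ["minor"] * 6 + ["severe"] * 2
for name, tgt in [("under", min), ("over", max), ("mix", mix_target)]:
    _, y_new = resample(X, y, tgt)
    print(name, Counter(y_new))
```

Resampling should be applied only to the training split; evaluating on a test set that keeps the original class distribution is consistent with the study's finding that the unbalanced distribution gave better performance.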

  • Preprint Article
  • Cited by 1
  • 10.1101/2024.04.15.24305875
Inhospital Mortality, Readmission, and Prolonged Length of Stay Risk Prediction Leveraging Historical Electronic Health Records
  • Apr 16, 2024
  • medRxiv
  • Rajeev Bopche + 5 more

Objective: This study investigated the ability of historical patient records maintained by hospitals to predict impending adverse outcomes such as mortality, readmission, and prolonged length of stay (PLOS). Methods: Using a de-identified dataset from a tertiary care university hospital, we developed an eXplainable Artificial Intelligence (XAI) framework that combines tree-based and traditional machine learning (ML) models with model interpretations and statistical analysis of the predictors of mortality, readmission, and PLOS. Results: Our framework demonstrated high predictive performance, with an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.9625 and an Area Under the Precision-Recall Curve (AUPRC) of 0.8575 for 30-day mortality at discharge, and an AUROC of 0.9545 and AUPRC of 0.8419 at admission. For readmission and PLOS risk, the highest AUROCs achieved were 0.8198 and 0.9797, respectively. The tree-based ML models consistently outperformed the traditional ML models in all four prediction tasks. The key predictors were age, derived temporal features, routine laboratory tests, and diagnostic and procedural codes. Conclusion: The study underscores the potential of leveraging medical history for enhanced predictive analytics in hospitals. We present an accurate and intuitive framework for early warning models that can be readily implemented in current and developing digital health platforms to predict adverse outcomes.
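AUPRC is often estimated as average precision: walk down the predictions in order of decreasing score and sum the precision at each threshold, weighted by the increase in recall at that threshold. The sketch below is that step-wise estimator (one of several AUPRC estimators; trapezoidal interpolation gives somewhat different numbers), with tied scores treated as a single threshold and made-up inputs.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC: sum over thresholds of (recall increase) * precision."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive case")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Treat all cases sharing this score as one threshold.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
        i = j
    return ap


print(average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # about 0.833
```

Unlike AUROC, whose chance level is 0.5, the chance level of AUPRC equals the positive-class prevalence, so it is more informative for rare outcomes such as 30-day mortality.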

  • Research Article
  • Cited by 1
  • 10.29271/jcpsp.2025.08.1007
Predicting Extracorporeal Shock Wave Lithotripsy Outcomes Using Machine Learning and the Triple-/Quadruple-D Scores.
  • Aug 1, 2025
  • Journal of the College of Physicians and Surgeons--Pakistan : JCPSP
  • Mucahit Gelmis + 5 more

Objective: To evaluate the predictive performance of the triple-D and quadruple-D scores integrated with machine learning (ML) models in determining stone-free outcomes after extracorporeal shock wave lithotripsy (ESWL), to compare the performance of the ML models, and to identify key predictors of ESWL success. Study Design: An observational study. Place and Duration of the Study: Department of Urology, Gaziosmanpasa Training and Research Hospital, Istanbul, Turkiye, from October 2020 to November 2024. Methodology: A total of 309 patients who underwent ESWL were analysed. The patients were categorised into stone-free and non-stone-free groups based on post-treatment imaging. Clinical parameters, including the quadruple-D score (stone volume, density, skin-to-stone distance [SSD], and location), were recorded. Three ML models (random forest [RF], logistic regression [LR], and neural network [NN]) were trained on 80% of the dataset and tested on the remaining 20%. Model performance was assessed using accuracy, area under the curve (AUC), precision, recall, and F1 score. Results: The quadruple-D score (AUC: 0.724) demonstrated greater predictive power than the triple-D score (AUC: 0.700). Among the ML models, RF achieved the highest accuracy (82.9%, AUC: 0.91), followed by NN (80.9%, AUC: 0.87) and LR (79.6%, AUC: 0.85). Significant predictors of ESWL success were stone density, volume, SSD, and the quadruple-D score, whereas age and body mass index (BMI) were not significant. Conclusion: Integrating the quadruple-D score with ML models, particularly RF, enhances the prediction of ESWL outcomes. Combining clinical expertise with computational intelligence can refine patient selection and optimise treatment strategies. However, prospective studies are needed to validate these findings. Key Words: Extracorporeal shock wave lithotripsy, Quadruple-D score, Machine learning, Random forest, Stone-free prediction.
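The 80/20 protocol and the classification metrics listed above can be sketched as follows. The plain shuffled split, the seed, treating "stone-free" as the positive class (label 1), and the example predictions are assumptions for illustration; a stratified split would keep the stone-free proportion equal in both parts.

```python
import random
from typing import Dict, Sequence, Tuple


def holdout_split(rows: Sequence, test_frac: float = 0.2, seed: int = 0) -> Tuple[list, list]:
    """Shuffle and split into (train, test); test gets round(test_frac * n) rows."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


train, test = holdout_split(range(309))  # 309 patients, as in the study
print(len(train), len(test))             # 247 62
# Hypothetical labels (1 = stone-free) and model predictions.
print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

With only about 62 test patients, each misclassified case moves accuracy by roughly 1.6 percentage points, which is worth keeping in mind when comparing 82.9%, 80.9%, and 79.6%.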
