Hyperparameter-tuned Light Gradient Boosting Machine model for predicting breaking wave height

ABSTRACT This study proposes a new model that uses the Light Gradient Boosting Machine (LightGBM) to predict breaking wave height from input wave parameters. To determine the optimal hyperparameters, Optuna is employed in 100 independent runs with 10-fold cross-validation. Additionally, SHapley Additive exPlanations (SHAP) analysis is applied to investigate the behavior of the model. The results show that the LightGBM model optimized with Optuna performs excellently in estimating breaker height. The root mean square error of the model is 1.861 cm for the training dataset and 3.518 cm for the testing dataset. The coefficients of determination are also high, at 0.998 and 0.992 for the training and testing datasets, respectively. This accuracy is remarkably higher than that of previous breaking wave height models. In addition, the SHAP analysis highlights that deep-water wave height and water depth have the greatest impact on breaker height prediction. The results demonstrate that combining Optuna and LightGBM enhances the robustness and generalization of the model for predicting breaking wave height.
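
The two accuracy measures quoted in the abstract, root mean square error (RMSE) and the coefficient of determination (R²), can be computed as in the sketch below. This is a minimal illustration using their standard definitions; the function names and the sample values are hypothetical and are not data or code from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the targets."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical breaker heights in cm (illustrative values only).
measured = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]

print(rmse(measured, predicted), r_squared(measured, predicted))
```

For these made-up values the RMSE is about 0.71 cm and R² is 0.9. In a cross-validated setup, the same functions would be applied to each held-out fold and the scores averaged.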

Similar Papers
  • Research Article
  • Cite Count: 17
  • 10.1016/j.ocemod.2023.102177
Prediction of breaking wave height by using artificial neural network-based approach
  • Feb 6, 2023
  • Ocean Modelling
  • Nga Thanh Duong + 3 more

  • Research Article
  • Cite Count: 47
  • 10.1016/j.cmpb.2022.107038
Extreme gradient boosting model to assess risk of central cervical lymph node metastasis in patients with papillary thyroid carcinoma: Individual prediction using SHapley Additive exPlanations
  • Jul 23, 2022
  • Computer Methods and Programs in Biomedicine
  • Ying Zou + 8 more

  • Research Article
  • Cite Count: 5
  • 10.3390/jmse10010050
Estimation of Wave-Breaking Index by Learning Nonlinear Relation Using Multilayer Neural Network
  • Jan 3, 2022
  • Journal of Marine Science and Engineering
  • Miyoung Yun + 2 more

Estimating wave-breaking indexes such as wave height and water depth is essential to understanding the location and scale of the breaking wave. Therefore, numerous wave-flume laboratory experiments have been conducted to develop empirical wave-breaking formulas. However, the nonlinearity between the parameters has not been fully incorporated into the empirical equations. Thus, this study proposes a multilayer neural network utilizing the nonlinear activation function and backpropagation to extract nonlinear relationships. Existing laboratory experiment data for the monochromatic regular wave are used to train the proposed network. Specifically, the bottom slope, deep-water wave height, and wave period are used as the input values, from which the network simultaneously estimates the breaking-wave height and wave-breaking location. Typical empirical equations employ deep-water wave height and length as input variables to predict the breaking-wave height and water depth. The newly proposed model directly utilizes breaking-wave height and water depth without nondimensionalization. Thus, the applicability can be significantly improved. The estimated wave-breaking index is statistically verified using the bias, root-mean-square errors, and Pearson correlation coefficient. The proposed model performs better than existing breaking-wave-index formulas and is robustly applicable across laboratory experiment conditions, such as wave condition, bottom slope, and experimental scale.

  • Research Article
  • Cite Count: 58
  • 10.3390/su15021408
Random Forest Algorithm for the Strength Prediction of Geopolymer Stabilized Clayey Soil
  • Jan 11, 2023
  • Sustainability
  • Husein Ali Zeini + 5 more

Unconfined compressive strength (UCS) can be used to assess the applicability of geopolymer binders as ecologically friendly materials for geotechnical projects. Furthermore, soft computing technologies are necessary since experimental research is often challenging, expensive, and time-consuming. This article discusses the feasibility and performance of predicting UCS using a Random Forest (RF) algorithm. The alkali activator studied was sodium hydroxide solution, and the geopolymer source materials considered were ground-granulated blast-furnace slag and fly ash. A database of 283 clayey soil samples stabilized with geopolymer was used to determine the UCS. The database was split into two sections for the development of the RF model: the training data set (80%) and the testing data set (20%). Several measures, including the coefficient of determination (R2), mean absolute error (MAE), and root mean square error (RMSE), were used to assess the effectiveness of the RF model. The statistical findings of this study demonstrated that the RF is a reliable model for predicting the UCS value of geopolymer-stabilized clayey soil. Furthermore, with RMSE = 0.9815 and R2 = 0.9757 for the testing set, the RF approach provided excellent results for predicting unknown data within the ranges of the examined parameters. Finally, SHapley Additive exPlanations (SHAP) analysis was implemented to identify the most influential inputs and to quantify the effect of the input variables on the UCS.

  • Research Article
  • 10.3390/rs18010040
Refined Leaf Area Index Retrieval in Yellow River Delta Coastal Wetlands: UAV-Borne Hyperspectral and LiDAR Data Fusion and SHAP–Correlation-Integrated Machine Learning
  • Dec 23, 2025
  • Remote Sensing
  • Chenqiang Shan + 9 more

The leaf area index (LAI) serves as a critical parameter for assessing wetland ecosystem functions, and accurate LAI retrieval holds substantial significance for wetland conservation and ecological monitoring. To address the spatial constraints of traditional ground-based measurements and the limited accuracy of single-source remote sensing data, this study utilized unmanned aerial vehicle (UAV)-borne hyperspectral and LiDAR sensors to acquire high-quality multi-source remote sensing data of coastal wetlands in the Yellow River Delta. Three machine learning algorithms, namely random forest (RF), Extreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost), were employed for LAI retrieval modeling. A total of 38 vegetation indices (VIs) and 12 point cloud features (PCFs) were extracted from hyperspectral imagery and LiDAR point cloud data, respectively. Pearson correlation analysis and the Shapley Additive Explanations (SHAP) method were integrated to identify and select the most informative VIs and PCFs. The performance of LAI retrieval models built on single-source features (VIs or PCFs) or multi-source feature fusion was evaluated using the coefficient of determination (R2) and root mean square error (RMSE). The main findings are as follows: (1) Multi-source feature fusion significantly improved LAI retrieval accuracy, with the RF model achieving the highest performance (R2 = 0.968, RMSE = 0.125). (2) LiDAR-derived structural metrics and hyperspectral-derived vegetation indices were identified as critical factors for accurate LAI retrieval. (3) The feature selection method integrating mean absolute SHAP values (|SHAP| values) with Pearson correlation analysis enhanced model robustness. (4) The intertidal zone exhibited pronounced spatial heterogeneity in the vegetation LAI distribution.

  • Research Article
  • Cite Count: 1
  • 10.54021/seesv5n2-017
Prediction and interpretation of limit pressure of clayey soils using ensemble machine learning methods and shapely additive explanations
  • Jul 9, 2024
  • STUDIES IN ENGINEERING AND EXACT SCIENCES
  • Kamel Goudjil + 4 more

The pressuremeter test (PMT), a valuable geotechnical in situ test, is used to design foundations of varying depths (shallow, semi-deep, and deep). It assesses a soil's bearing capacity and settlement through two key parameters: limit pressure and pressuremeter modulus. However, the high cost and time demands of PMTs limit their widespread use. This study addresses this challenge by exploring the effectiveness of ensemble machine learning algorithms. To this end, we employ two methods, Extreme Gradient Boosting (XGBoost) and Random Forest, to predict the limit pressure of soil. An experimental database was used to train and validate the models. The effectiveness of these methods is evaluated using three statistical metrics: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Coefficient of Determination (R²). The performance metrics show that the developed XGBoost and Random Forest models are viable alternatives to the pressuremeter test for estimating limit pressure. Both models achieved high R² values (around 0.99 for training and 0.90 for testing) and low RMSE values of 3.23 and 4.13, respectively, for the testing set. These results demonstrate the effectiveness of using machine learning in the geotechnical field. To further understand the influence of individual factors on the predictions, we utilize the SHapley Additive exPlanations (SHAP) method. This technique analyzes the contribution of each input variable (feature) to the model's predictions of limit pressure. By quantifying the importance of these features, SHAP provides valuable insights into which soil properties most significantly affect the foundation design parameters.
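
The idea of attributing a prediction to individual input features can be illustrated with the Shapley value definition that SHAP builds on. The sketch below enumerates every feature subset exactly. The toy model, the instance, and the all-zero baseline used to represent "absent" features are assumptions made for illustration; they are not the models or data of the study, and this is not the algorithm a SHAP library would run on tree ensembles.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function over feature indices 0..n-1."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                gain = value_fn(coalition | {i}) - value_fn(coalition)
                phi[i] += weight * gain
    return phi

# Toy stand-in for a trained model (hypothetical).
def toy_model(x0, x1):
    return 2.0 * x0 + 3.0 * x1 + x0 * x1

instance = (1.0, 2.0)  # the sample being explained
baseline = (0.0, 0.0)  # reference input for "absent" features (an assumption)

def value_fn(present):
    """Model output when only the features in `present` take the instance's values."""
    inputs = [instance[j] if j in present else baseline[j] for j in range(2)]
    return toy_model(*inputs)

phi = shapley_values(value_fn, 2)
print(phi)
```

Here the attributions are 3 for the first feature and 7 for the second. They sum to 10, the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is shared equally between the two features. Brute-force enumeration grows exponentially with the number of features, so it is practical only for small examples like this one.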

  • Research Article
  • Cite Count: 33
  • 10.1016/j.jobe.2023.106363
Machine learning approach for predicting concrete compressive, splitting tensile, and flexural strength with waste foundry sand
  • Mar 26, 2023
  • Journal of Building Engineering
  • Vikas Mehta

  • Research Article
  • 10.61838/kman.jayps.4986
Explainable AI Forecast of Psychological Distress in Adolescents Based on Family Conflict, School Pressure, and Emotion Regulation Capacity
  • Jan 1, 2025
  • Journal of Adolescent and Youth Psychological Studies
  • Lars Becker + 1 more

Objective: The objective of this study was to develop and interpret an explainable artificial intelligence model for forecasting psychological distress in adolescents by quantifying the joint and individual contributions of family conflict, school pressure, and emotion regulation capacity. Methods and Materials: This cross-sectional study was conducted among 1,142 secondary school students aged 13–18 years in Germany using multi-stage cluster sampling. Participants completed validated self-report measures of psychological distress, family conflict, school pressure, and emotion regulation capacity. Data were analyzed using an explainable gradient boosting machine learning framework with five-fold cross-validation. Model performance was evaluated using root mean square error, mean absolute error, and coefficient of determination. Feature contributions and interaction effects were examined using Shapley Additive Explanations and partial dependence analyses to ensure full interpretability of predictions. Findings: The explainable model demonstrated strong predictive accuracy, accounting for 69% of the variance in adolescent psychological distress on the test dataset (R² = 0.69, RMSE = 3.58, MAE = 2.71). Feature attribution analysis revealed that school pressure was the most influential predictor (36.2% relative contribution), followed by emotion regulation capacity (31.1%) and family conflict (24.7%), while demographic variables showed minimal impact. Interaction analyses indicated that high emotion regulation capacity substantially attenuated the negative effects of elevated school pressure and family conflict on psychological distress. Conclusion: Adolescent psychological distress is primarily shaped by the combined influence of academic stress, family dynamics, and emotional self-regulation. 
Explainable artificial intelligence provides a powerful and transparent framework for identifying individualized risk profiles and informing targeted mental health interventions in educational and clinical settings.

  • Research Article
  • Cite Count: 2
  • 10.1038/s41598-025-96005-7
An interpretable deep learning model for the accurate prediction of mean fragmentation size in blasting operations
  • Apr 3, 2025
  • Scientific Reports
  • Baoqian Huan + 4 more

Fragmentation size is an important indicator for evaluating blasting effectiveness. To address the limitations of conventional blasting fragmentation size prediction methods in terms of prediction accuracy and applicability, this study proposes an NRBO-CNN-LSSVM model for predicting mean fragmentation size, which integrates Convolutional Neural Networks (CNN), Least Squares Support Vector Machines (LSSVM), and the Newton-Raphson Optimizer (NRBO). The study is based on a database containing 105 samples derived from both previous research and field collection. Additionally, several machine learning prediction models, including CNN-LSSVM, CNN, LSSVM, Support Vector Machine (SVM), and Support Vector Regression (SVR), are developed for comparative analysis. The results showed that the NRBO-CNN-LSSVM model achieved remarkable prediction accuracy on the training dataset, with a coefficient of determination (R2) as high as 0.9717 and a root mean square error (RMSE) as low as 0.0285. On the test set, the model maintained high prediction accuracy, with an R2 value of 0.9105 and an RMSE of 0.0403. SHapley Additive exPlanations (SHAP) analysis revealed that the modulus of elasticity (E) was a key variable influencing the prediction of mean fragmentation size. Partial Dependence Plots (PDP) analysis further disclosed a significant positive correlation between the modulus of elasticity (E) and mean fragmentation size. In contrast, a distinct negative correlation was observed between the powder factor (Pf) and mean fragmentation size. To enhance the convenience of the model in practical applications, we developed an interactive Graphical User Interface (GUI), allowing users to input relevant variables and obtain instant prediction results.

  • Research Article
  • Cite Count: 7
  • 10.1016/j.ijhydene.2024.10.254
Predicting interfacial tension in brine-hydrogen/cushion gas systems under subsurface conditions: Implications for hydrogen geo-storage
  • Oct 23, 2024
  • International Journal of Hydrogen Energy
  • Mostafa Hosseini + 1 more

  • Research Article
  • Cite Count: 1
  • 10.2196/70068
Predictive Models Using Machine Learning to Identify Fetal Growth Restriction in Patients With Preeclampsia: Development and Evaluation Study
  • May 27, 2025
  • Journal of Medical Internet Research
  • Qing Hua + 6 more

Background: Fetal growth restriction (FGR) is a common complication of preeclampsia. FGR in patients with preeclampsia increases the risk of neonatal-perinatal mortality and morbidity. However, previous prediction methods for FGR are class-biased or clinically unexplainable, which makes it difficult to apply to clinical practice, leading to a relative delay in intervention and a lack of effective treatments. Objective: The study aims to develop an auxiliary diagnostic model based on machine learning (ML) to predict the occurrence of FGR in patients with preeclampsia. Methods: This study used a retrospective case-control approach to analyze 38 features, including the basic medical history and peripheral blood laboratory test results of pregnant patients with preeclampsia, either complicated or not complicated by FGR. ML models were constructed to evaluate the predictive value of maternal parameter changes on preeclampsia combined with FGR. Multiple algorithms were tested, including logistic regression, light gradient boosting, random forest (RF), extreme gradient boosting, multilayer perceptron, naive Bayes, and support vector machine. The model performance was identified by the area under the curve (AUC) and other evaluation indexes. The Shapley additive explanations (SHAP) method was adopted to rank the feature importance and explain the final model for clinical application. Results: The RF model performed best in discriminative ability among the 7 ML models. After reducing features according to importance rank, an explainable final RF model was established with 9 features, including urinary protein quantification, gestational week of delivery, umbilical artery systolic-to-diastolic ratio, amniotic fluid index, triglyceride, D-dimer, weight, height, and maximum systolic pressure. The model could accurately predict FGR for 513 patients with preeclampsia (149 with FGR and 364 without FGR) in the training and testing dataset (AUC 0.83, SD 0.03) using 5-fold cross-validation, which was closely validated for 103 patients with preeclampsia (n=45 with FGR and n=58 without FGR) in an external dataset (AUC 0.82, SD 0.048). On the whole, urinary protein quantification, umbilical artery systolic-to-diastolic ratio, and gestational week of delivery exhibited the highest contributions to the model performance (c=0.45, 0.34, and 0.33) based on SHAP analysis. For specific individual patients, SHAP results reveal the protective and risk factors to develop FGR for interpreting the model’s clinical significance. Finally, the model has been translated into a convenient web page tool to facilitate its use in clinical settings. Conclusions: The study successfully developed a model that accurately predicts FGR development in patients with preeclampsia. The SHAP method captures highly relevant risk factors for model interpretation, alleviating concerns about the “black box” problem of ML techniques.
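
The area under the ROC curve (AUC) reported above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that pairwise definition. The function name, labels, and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))

# Hypothetical labels (1 = condition present) and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = roc_auc(labels, scores)
print(auc)
```

In this example 8 of the 9 positive-negative pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic in the number of samples; it is written this way for clarity rather than speed.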

  • Research Article
  • 10.59490/imdc.2024.888
Prediction of main engine power of oil tankers using artificial intelligence algorithms
  • May 23, 2024
  • International Marine Design Conference
  • Darin Majnarić + 3 more

In preliminary ship design, the accurate determination of a vessel’s main engine power is one of the most critical aspects, alongside service speed, main particulars, and cargo capacity. However, this task can be quite intricate because it depends on a very large number of influencing factors. In the research presented in this paper, a dataset of 357 oil tankers was gathered, and genetic programming (GP) was applied to it to obtain mathematical equations (MEs) that can estimate the ship’s main engine power with high accuracy. The highest estimation accuracy of the MEs is achieved by tuning the GP hyperparameter values through the random hyperparameter search (RHS) method. The initial dataset was divided into train and test datasets in a 70:30 ratio. The train dataset was used to train GP in a 5-fold cross-validation process, after which the obtained MEs were evaluated on the test dataset. To evaluate the GP training and testing process, several evaluation metrics were used: the coefficient of determination (R2), mean absolute error (MAE), root mean square error (RMSE), and the length of the obtained MEs. The investigation showed that GP generated MEs that can estimate ship main engine power with high accuracy.
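
The data handling described above, a 70:30 split followed by 5-fold cross-validation on the training portion, can be sketched in plain Python. The helper names, the random seed, and the rounding of the split size are assumptions made for illustration; the abstract does not state them, and no genetic programming is performed here.

```python
import random

def train_test_split_indices(n_samples, train_fraction, seed=0):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)  # rounding rule is an assumption
    return indices[:n_train], indices[n_train:]

def k_fold_indices(indices, k):
    """Yield (training, validation) index lists; fold sizes differ by at most one."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

# 357 samples, as in the abstract; indices stand in for the actual records.
train_idx, test_idx = train_test_split_indices(357, 0.7)
splits = list(k_fold_indices(train_idx, 5))

print(len(train_idx), len(test_idx), len(splits))
```

With this rounding rule the split yields 250 training and 107 test samples, and each of the five folds holds out 50 training samples for validation. The test indices are never touched during cross-validation, which mirrors the abstract's description of evaluating the final equations on the test dataset only after training.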

  • Research Article
  • Cite Count: 3
  • 10.3934/publichealth.2024034
Discriminating insulin resistance in middle-aged nondiabetic women using machine learning approaches.
  • Jan 1, 2024
  • AIMS public health
  • Zailing Xing + 2 more

We employed machine learning algorithms to discriminate insulin resistance (IR) in middle-aged nondiabetic women. The data was from the National Health and Nutrition Examination Survey (2007-2018). The study subjects were 2084 nondiabetic women aged 45-64. The analysis included 48 predictors. We randomly divided the data into training (n = 1667) and testing (n = 417) datasets. Four machine learning techniques were employed to discriminate IR: extreme gradient boosting (XGBoosting), random forest (RF), gradient boosting machine (GBM), and decision tree (DT). The area under the curve (AUC) of receiver operating characteristic (ROC), accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score were compared as performance metrics to select the optimal technique. The XGBoosting algorithm achieved a relatively high AUC of 0.93 in the training dataset and 0.86 in the testing dataset to discriminate IR using 48 predictors and was followed by the RF, GBM, and DT models. After selecting the top five predictors to build models, the XGBoost algorithm with the AUC of 0.90 (training dataset) and 0.86 (testing dataset) remained the optimal prediction model. The SHapley Additive exPlanations (SHAP) values revealed the associations between the five predictors and IR, namely BMI (strongly positive impact on IR), fasting glucose (strongly positive), HDL-C (medium negative), triglycerides (medium positive), and glycohemoglobin (medium positive). The threshold values for identifying IR were 29 kg/m2, 100 mg/dL, 54.5 mg/dL, 89 mg/dL, and 5.6% for BMI, glucose, HDL-C, triglycerides, and glycohemoglobin, respectively. The XGBoosting algorithm demonstrated superior performance metrics for discriminating IR in middle-aged nondiabetic women, with BMI, glucose, HDL-C, glycohemoglobin, and triglycerides as the top five predictors.
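
The threshold-based metrics compared in this abstract (accuracy, sensitivity, specificity, positive and negative predictive value, and F1 score) all follow from the four cells of a binary confusion matrix. A minimal sketch of those standard definitions follows; the function name and the counts are hypothetical and are not results from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    ppv = tp / (tp + fp)          # positive predictive value (precision)
    npv = tn / (tn + fn)          # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }

# Hypothetical counts for 100 test cases.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike AUC, each of these values depends on the probability cutoff used to turn model scores into class labels, so they are usually reported together with that cutoff.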

  • Research Article
  • Cite Count: 71
  • 10.1016/j.jobe.2020.101851
Soft computing techniques: Systematic multiscale models to predict the compressive strength of HVFA concrete based on mix proportions and curing times
  • Sep 28, 2020
  • Journal of Building Engineering
  • Ahmed Mohammed + 4 more

  • Research Article
  • Cite Count: 12
  • 10.1016/j.ecoinf.2022.101959
Inclusion of fractal dimension in four machine learning algorithms improves the prediction accuracy of mean weight diameter of soil
  • Dec 17, 2022
  • Ecological Informatics
  • Abhradip Sarkar + 5 more
