Multi-output machine learning for predicting the mechanical properties of BFRC

Similar Papers
  • Research Article
  • Cited by 9
  • 10.1002/suco.202300298
Data‐driven approach for investigating and predicting of compressive strength of fly ash–slag geopolymer concrete
  • Jun 20, 2023
  • Structural Concrete
  • Van Quan Tran

Fly ash–slag geopolymer concrete is an intangible material that does not use conventional Portland cement, thereby reducing CO2 emissions into the environment and supporting sustainable development. However, compared with conventional concrete, the compressive strength of fly ash–slag geopolymer concrete depends in a complex way on many factors, which makes a data‐driven approach a suitable choice for investigating and predicting it. This study introduces 11 easily accessible machine learning models from open‐source Python libraries: support vector machine, random forest (RF), gradient boosting (GB), AdaBoost, decision trees, light GB machine, extreme GB (XGB), K‐nearest neighbors, multivariable regression, Gaussian process regression, and CatBoost (CatB). Based on a dataset of 158 samples with 14 inputs and 1 output variable (compressive strength), the performance of the 11 models was evaluated with 4 criteria (coefficient of determination, root mean square error, mean absolute error, and mean absolute percentage error) combined with 10 repeats of 10‐fold cross‐validation. On these four criteria, the four best-performing models for compressive strength on the testing dataset are, in descending order, CatB > XGB > RF > GB. Global Shapley (SHAP) values based on CatB and XGB indicate three groups of factors with decreasing influence on the compressive strength of geopolymer concrete: group I (slag, molarity, coarse aggregate, curing temperature, and alkaline activator/binder) > group II (Na2SiO3 content, NaOH content, fine aggregate, fly ash content, curing period) > group III (extra water added, NaOH/Na2SiO3, superplasticizer content, rest period). The group III factors have an insignificant influence on the compressive strength of geopolymer concrete.
The greater the slag content in the slag–fly ash mixture, the greater the compressive strength of geopolymer concrete. The optimum NaOH molarity for designing the compressive strength of geopolymer concrete is about 14–16 M. SHAP values and partial dependence plots (PDP) indicate that optimal alkaline activator/binder values exist for achieving high compressive strength. The compressive strength increases with curing temperature between 20 and 100°C. The PDP values show that compressive strength tends to increase as coarse aggregate content rises from about 750 to 1250 kg/m3.
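
The evaluation protocol in this abstract (four error criteria combined with 10 repeats of 10‐fold cross‐validation on 158 samples) can be illustrated with a minimal pure-Python sketch. The function names `regression_metrics` and `repeated_kfold` and the seed are assumptions made for illustration; the study's own code and library calls are not given here.

```python
import math
import random


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, MAE and MAPE (percent) for two equal-length sequences.

    Assumes y_true is not constant (for R^2) and has no zeros (for MAPE).
    """
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "mape": 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n,
    }


def repeated_kfold(n_samples, n_splits=10, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs: n_repeats reshuffled rounds of k-fold."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for k in range(n_splits):
            test_idx = order[k::n_splits]  # every n_splits-th item, offset k
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield train_idx, test_idx


# 158 samples, as in the abstract: 10 repeats x 10 folds = 100 train/test splits.
folds = list(repeated_kfold(158))
```

In practice a model would be fitted on each `train_idx` and scored on each `test_idx` with `regression_metrics`, and the 100 scores averaged per criterion.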

  • Research Article
  • Cited by 2
  • 10.1038/s41598-025-99094-6
Machine learning-based quantification and separation of emissions and meteorological effects on PM2.5 in Greater Bangkok
  • Apr 28, 2025
  • Scientific Reports
  • Nishit Aman + 7 more

This study presents the first-ever application of machine learning (ML)-based meteorological normalization and Shapley additive explanations (SHAP) analysis to quantify, separate, and understand the effect of meteorology on PM2.5 over Greater Bangkok (GBK). Six ML models, namely random forest (RF), adaptive boosting (ADB), gradient boosting (GB), extreme gradient boosting (XGB), light gradient boosting machine (LGBM), and cat boosting (CB), were used with meteorological factors, fire activity, land use, and socio-economic data as predictor variables. The LGBM outperformed the other models, achieving ρ = 0.9 (0.95), MBE = 0 (−0.01), MAE = 5.5 (3.3) μg m−3, and RMSE = 8.7 (4.9) μg m−3 for hourly (daily) PM2.5 prediction. LGBM was used for spatiotemporal PM2.5 estimation, and meteorological normalization was applied to calculate PM2.5_emis (emission-related PM2.5) and PM2.5_met (meteorology-related PM2.5). Diurnal variation reveals higher PM2.5 levels in the morning (08–10 LT) due to increased traffic emissions and thermal inversion, and a decrease in PM2.5 as the day progresses due to decreased emissions and inversion dissipation. Monthly variation suggests higher PM2.5 in winter (December and January) due to emissions and stagnant meteorological conditions. Negative PM2.5_met values during November, March, and April show that meteorology improves air quality, while positive values from December to February indicate that stagnant winter conditions worsen it. During winter, PM2.5_emis and PM2.5 showed an increasing trend in 15.6% and 67.8% of the area, respectively, while the area with decreasing trends fell from 23.2 to 1.9%. In summer, the percentage of areas with an increasing trend rose from 18.7 to 34.6%, and decreasing areas fell from 12.6 to 6.5%. The increase in PM2.5 despite decreasing emissions over a larger area indicates limited effectiveness of mitigation measures. Winter exhibits greater PM2.5 variability due to episodic increases from changing meteorological conditions.
In Bangkok and nearby areas, higher variability is mainly driven by meteorology, with more consistent emissions in Bangkok compared to rural areas affected by agricultural burning. PM2.5 and PM2.5_emis showed stronger persistence in winter than in summer, with weaker effects in Bangkok. Hurst exponent averages were 0.75, 0.76, and 0.72 for PM2.5 and 0.79, 0.8, and 0.73 for PM2.5_emis in dry, winter, and summer seasons, respectively. SHAP analysis suggested relative humidity, planetary boundary layer height, v wind, temperature, u wind, global radiation, and aerosol optical depth as the key variables affecting PM2.5 with mean absolute SHAP values of 5.29, 4.79, 4.29, 3.68, 2.37, 2.22, and 2.03, respectively. Based on these findings, some policy recommendations have been proposed.

  • Research Article
  • 10.1038/s41598-025-24107-3
Evaluating the mechanical behavior of plastic waste modified asphalt using optimized machine learning approaches
  • Nov 10, 2025
  • Scientific Reports
  • Tariq Alqubaysi + 6 more

The growing environmental challenges associated with plastic waste disposal and the need for sustainable pavement construction practices have prompted significant research interest in incorporating recycled plastics into asphalt mixtures. However, accurately predicting the performance characteristics of plastic-modified asphalt mixtures, particularly Marshall Stability (MS) and Marshall Flow (MF), remains a critical yet challenging task due to complex nonlinear relationships between mixture constituents. This study addresses this issue by developing reliable predictive models using machine learning techniques including Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Extreme Gradient Boosting (XGB), and Light Gradient Boosting Machine (LGBM), further optimized through Particle Swarm Optimization (PSO). A comprehensive dataset comprising 210 samples of plastic-modified asphalt mixtures was utilized, incorporating inputs such as plastic content and size, bitumen content, maximum aggregate size, mixing temperature, and compaction effort (number of blows), to predict MS and MF as outputs. Results showed that the PSO-optimized XGB model achieved the highest accuracy, yielding R2 values of 0.82 for MS and 0.83 for MF. Model interpretability was enhanced using advanced techniques such as SHapley Additive exPlanations (SHAP), Partial Dependence Plots (PDP), Individual Conditional Expectation (ICE) plots, and Taylor diagrams, quantitatively highlighting optimal plastic particle sizes (2.5–4 mm), bitumen content (5.3–5.5%) and plastic content (20–30%). These findings provide actionable insights that support safer and longer-lasting pavements, promote the sustainable reuse of waste plastics, and enable cost-effective mix design strategies for modern asphalt construction.

  • Research Article
  • 10.1186/s12889-025-23242-w
Application of machine learning algorithms to model predictors of informed contraceptive choice among reproductive age women in six high fertility rate sub Sahara Africa countries
  • May 29, 2025
  • BMC Public Health
  • Mequannent Sharew Melaku + 4 more

Introduction: Informed contraceptive choice is declared when a woman selects a method of contraception after receiving comprehensive information on available alternatives, side effects, and management if adverse effects occur. Access to contraceptive information is a fundamental right, crucial for reducing fertility, unintended pregnancies, and related complications. Despite efforts to reduce fertility, the Sub-Saharan Africa region still accounts for over half of global births due to low contraceptive use, high discontinuation rates, and unmet needs, often linked to uninformed contraceptive choice. While studies on informed contraceptive choice using classical regression analysis are available, the diverse nature of the factors has not been systematically analyzed using machine learning algorithms. Hence, this study aimed to apply machine learning algorithms to model predictors of informed contraceptive choice among reproductive-age women in six high fertility rate Sub-Saharan African countries. Methods: This study used 11,706 weighted women aggregated from 6 high fertility rate countries in Sub-Saharan Africa (Mali, Angola, Burundi, Nigeria, Gambia, and Burkina Faso), collected using stratified sampling techniques. Data cleaning, weighting, and descriptive statistical analyses were conducted using STATA version 17 and Excel 2019, while machine learning analysis was performed using Python 3.12. Random Forest, eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), Naïve Bayes, Decision Tree, Logistic Regression, and Adaptive Boosting (AdaBoost) were employed to predict informed contraceptive choice and to identify its predictors. Shapley Additive Explanations (SHAP) was used to assess the link between predictors and informed contraceptive choice.
Accuracy and area under the curve (AUC), along with precision, recall, and F1 score, were used to evaluate the performance of the predictive models. Results: About 58% of women received an informed choice of contraceptive methods, ranging from 29% in Burundi to 77% in Burkina Faso. Moreover, the highest spatial clustering of informed choice of contraceptive methods was observed in Burkina Faso, while the lowest clustering was found in Angola. The LGBM model achieved an accuracy of 73%, an AUC of 0.80, a precision of 71, and a recall of 77. The SHAP analysis revealed that health facility visits within 12 months, religion, source of contraceptive, exposure to family planning messages, mobile ownership, education, wealth index, number of under-five children, residence, and total lifetime partners were the top ten predictors of informed contraceptive choice. Conclusion: Nearly six out of ten women received informed contraceptive choice; the magnitude is highest in Burkina Faso and lowest in Mali. Moreover, the highest spatial clustering of informed contraceptive choice was observed in Burkina Faso, while the lowest clustering was found in Angola. The LGBM classifier outperformed the other machine learning algorithms, achieving 73% accuracy and an AUC of 0.80. Key factors influencing informed contraceptive choice were health facility visits, religion, contraceptive source, family planning messages, mobile ownership, education, wealth, residence, and lifetime partners. To enhance informed contraceptive choice, governments and policymakers should strengthen family planning education, expand healthcare services, and ensure equitable access to contraceptive information. Digital health solutions, especially mobile-based platforms, can also bridge information gaps. Integrating counseling into routine healthcare, training providers, and expanding mass media campaigns can enhance awareness. Engaging communities can help overcome social and religious barriers.
Continuous monitoring and data-driven policy adjustments are essential for responsive interventions that address the evolving reproductive health needs in sub-Saharan Africa. Finally, we recommend that future research validate these findings using external data sources.

  • Research Article
  • Cited by 2
  • 10.1177/20552076241272739
Interpretable prediction of acute respiratory infection disease among under-five children in Ethiopia using ensemble machine learning and Shapley additive explanations (SHAP).
  • Jan 1, 2024
  • Digital Health
  • Zinabu Bekele Tadese + 6 more

Although the prevalence of childhood illnesses has significantly decreased, acute respiratory infections continue to be the leading cause of death and disease among children in low- and middle-income countries. Seven percent of children under five experienced symptoms in the two weeks preceding the Ethiopian demographic and health survey. Hence, this study aimed to identify interpretable predictors of acute respiratory infection among under-five children in Ethiopia using machine learning analysis techniques. Secondary data analysis was performed using 2016 Ethiopian demographic and health survey data. Data were extracted using STATA and imported into Jupyter Notebook for further analysis. The presence of acute respiratory infection in a child under the age of 5 was the outcome variable, categorized as yes or no. Five ensemble boosting machine learning algorithms, namely adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), Gradient Boost, CatBoost, and light gradient-boosting machine (LightGBM), were employed on a total sample of 10,641 children under the age of 5. The Shapley additive explanations technique was used to identify the important features and the effect of each feature driving the prediction. After model optimization, the XGBoost model achieved an accuracy of 79.3%, an F1 score of 78.4%, a recall of 78.3%, a precision of 81.7%, and an area under the receiver operating characteristic curve of 86.1%. Child age (month), history of diarrhea, number of living children, duration of breastfeeding, and mother's occupation were the top predictors of acute respiratory infection among children under the age of 5 in Ethiopia. The XGBoost classifier was the best predictive model with improved performance, and predictors of acute respiratory infection were identified with the help of Shapley additive explanations.
The findings of this study can help policymakers and stakeholders understand the decision-making process for acute respiratory infection prevention among under-five children in Ethiopia.
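
Shapley additive explanations attribute a single prediction to its input features. As a hedged illustration of the underlying definition only (not the SHAP library, which uses faster approximations and background datasets, and not this study's models), the sketch below computes exact Shapley values by enumerating feature coalitions. The function `toy_model`, its coefficients, and the all-zero baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, with absent features set to baseline."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their real value, the rest the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(features):
    # Hypothetical model: two additive terms plus one interaction term.
    a, b, c = features
    return 2.0 * a + 1.0 * b + a * c


x = [1.0, 3.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the prediction at `x` and the prediction at the baseline, and the interaction term `a * c` is split equally between the two features involved. Enumeration costs grow exponentially with the number of features, which is why it is only practical for toy cases like this one.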

  • Research Article
  • 10.3389/fpubh.2025.1628493
Predicting fall risk among older adults in Chinese communities with advanced machine learning techniques: a retrospective study
  • Sep 1, 2025
  • Frontiers in Public Health
  • Aihong Liu + 3 more

Background: This study aims to develop an advanced machine learning model to predict fall risk among community-dwelling older adults and to offer actionable advice for early fall prevention. Methods: Between October and December 2022, 977 older adults from the Hannan District of Wuhan were recruited. Data were collected using structured questionnaires. The sample was randomly split into training (732 participants) and testing (245 participants) sets at a 3:1 ratio. The primary outcome was the occurrence of a fall. Five machine learning models, namely Random Forest (RF), Gradient Boosted Decision Tree (GBDT), Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (XGBoost), and Categorical Features Gradient Boosting (CatBoost), were evaluated against a Logistic Regression (LR) model. Model performance was assessed using AUC, accuracy, precision, sensitivity, specificity, and F1 score. Results: Among the 977 older adults, 195 experienced falls (20.0%). ROC curve analysis showed that the AUC values of LR, RF, LGBM, GBDT, XGBoost, and CatBoost were 0.8390, 0.8632, 0.8614, 0.8544, 0.8705, and 0.8719, respectively. CatBoost had the highest AUC, indicating the best predictive performance. SHapley Additive exPlanations (SHAP) analysis identified key features influencing the CatBoost model: history of falls, comorbidities, polypharmacy, sleep disorders, ADL, TUG results, frailty status, and use of assistive devices. Conclusion: The fall risk prediction model for community-dwelling older adults, developed with CatBoost, showed excellent performance and can aid in early clinical assessment and fall prevention.
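
The models above are compared by AUC. A minimal sketch of what that number measures, assuming binary labels coded 0/1: the AUC equals the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The function name `roc_auc` and the four-sample data are illustrative and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Illustrative data: two non-fallers (0) and two fallers (1) with model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)  # 3 of the 4 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the sample size; for large datasets a rank-based computation gives the same value more cheaply.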

  • Research Article
  • Cited by 35
  • 10.3390/rs15102659
Assessment of Wildfire Susceptibility and Wildfire Threats to Ecological Environment and Urban Development Based on GIS and Multi-Source Data: A Case Study of Guilin, China
  • May 19, 2023
  • Remote Sensing
  • Weiting Yue + 6 more

The frequent occurrence and spread of wildfires pose a serious threat to the ecological environment and urban development. Therefore, assessing regional wildfire susceptibility is crucial for the early prevention of wildfires and formulation of disaster management decisions. However, current research on wildfire susceptibility primarily focuses on improving the accuracy of models, while lacking in-depth study of the causes and mechanisms of wildfires, as well as the impact and losses they cause to the ecological environment and urban development. This situation not only increases the uncertainty of model predictions but also greatly reduces the specificity and practical significance of the models. We propose a comprehensive evaluation framework to analyze the spatial distribution of wildfire susceptibility and the effects of influencing factors, while assessing the risks of wildfire damage to the local ecological environment and urban development. In this study, we used wildfire information from the period 2013–2022 and data from 17 susceptibility factors in the city of Guilin as the basis, and utilized eight machine learning algorithms, namely logistic regression (LR), artificial neural network (ANN), K-nearest neighbor (KNN), support vector regression (SVR), random forest (RF), gradient boosting decision tree (GBDT), light gradient boosting machine (LGBM), and eXtreme gradient boosting (XGBoost), to assess wildfire susceptibility. By evaluating multiple indicators, we obtained the optimal model and used the Shapley Additive Explanations (SHAP) method to explain the effects of the factors and the decision-making mechanism of the model. In addition, we collected and calculated corresponding indicators, with the Remote Sensing Ecological Index (RSEI) representing ecological vulnerability and the Night-Time Lights Index (NTLI) representing urban development vulnerability. 
The coupling results of the two represent the comprehensive vulnerability of the ecology and city. Finally, by integrating wildfire susceptibility and vulnerability information, we assessed the risk of wildfire disasters in Guilin to reveal the overall distribution characteristics of wildfire disaster risk in Guilin. The results show that the AUC values of the eight models range from 0.809 to 0.927, with accuracy values ranging from 0.735 to 0.863 and RMSE values ranging from 0.327 to 0.423. Taking into account all the performance indicators, the XGBoost model provides the best results, with AUC, accuracy, and RMSE values of 0.927, 0.863, and 0.327, respectively. This indicates that the XGBoost model has the best predictive performance. The high-susceptibility areas are located in the central, northeast, south, and southwest regions of the study area. The factors of temperature, soil type, land use, distance to roads, and slope have the most significant impact on wildfire susceptibility. Based on the results of the ecological vulnerability and urban development vulnerability assessments, potential wildfire risk areas can be identified and assessed comprehensively and reasonably. The research results of this article not only can improve the specificity and practical significance of wildfire prediction models but also provide important reference for the prevention and response of wildfires.

  • Research Article
  • Cited by 8
  • 10.1016/j.engstruct.2023.116236
Data-driven models for predicting tensile load capacity and failure mode of grouted splice sleeve connection
  • May 18, 2023
  • Engineering Structures
  • Gao Ma + 3 more

  • Research Article
  • Cited by 4
  • 10.3390/diagnostics13172735
Machine Learning Models for Prediction of Severe Pneumocystis carinii Pneumonia after Kidney Transplantation: A Single-Center Retrospective Study.
  • Aug 23, 2023
  • Diagnostics
  • Yiting Liu + 7 more

The objective of this study was to formulate and validate a prognostic model for postoperative severe Pneumocystis carinii pneumonia (SPCP) in kidney transplant recipients utilizing machine learning algorithms, and to compare the performance of various models. Clinical manifestations and laboratory test results upon admission were gathered as variables for 88 patients who experienced PCP following kidney transplantation. The most discriminative variables were identified, and subsequently, Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), K-Nearest Neighbor (KNN), Light Gradient Boosting Machine (LGBM), and eXtreme Gradient Boosting (XGB) models were constructed. Finally, the models' predictive capabilities were assessed through ROC curves, sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), and F1-scores. The Shapley additive explanations (SHAP) algorithm was employed to elucidate the contributions of the most effective model's variables. Lasso regression identified five features: hemoglobin (Hb), procalcitonin (PCT), C-reactive protein (CRP), progressive dyspnea, and albumin (ALB). Six machine learning models were developed using these variables after evaluating their correlation and multicollinearity. In the validation cohort, the RF model demonstrated the highest AUC (0.920 (0.810-1.000)), F1-score (0.8), accuracy (0.885), sensitivity (0.818), PPV (0.667), and NPV (0.913) among the six models, while the XGB and KNN models exhibited the highest specificity (0.909). Notably, CRP exerted a significant influence on the models, as revealed by SHAP and feature importance rankings. Machine learning algorithms offer a viable approach for constructing prognostic models to predict the development of severe disease following PCP in kidney transplant recipients, with potential practical applications.
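
Sensitivity, specificity, PPV, NPV, accuracy, and F1-score all derive from the four cells of a binary confusion matrix. The sketch below shows those standard definitions; the function name `binary_metrics` and the example counts are hypothetical and are not the study's validation cohort.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (each class is present and predicted).
    """
    sensitivity = tp / (tp + fn)  # recall: share of true cases detected
    specificity = tn / (tn + fp)  # share of non-cases correctly ruled out
    ppv = tp / (tp + fp)          # precision: share of positive calls that are right
    npv = tn / (tn + fn)          # share of negative calls that are right
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=8, fp=2, tn=18, fn=2)
```

With small cohorts, a change of one or two patients in any cell moves these ratios noticeably, so they are best read together with their confidence intervals.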

  • Research Article
  • Cited by 26
  • 10.1016/j.conbuildmat.2023.131604
Data-driven shear strength predictions of recycled aggregate concrete beams with /without shear reinforcement by applying machine learning approaches
  • May 10, 2023
  • Construction and Building Materials
  • Thushara Jayasinghe + 7 more

  • Research Article
  • 10.1038/s41598-025-13926-z
Comparative performance evaluation of machine learning models for predicting the ultimate bearing capacity of shallow foundations on granular soils
  • Oct 21, 2025
  • Scientific Reports
  • Jalal Shah + 4 more

Accurate estimation of the ultimate bearing capacity (UBC) of shallow foundations is critical for safe and economical geotechnical design. Traditional approaches depend heavily on extensive and costly field and laboratory investigations, while numerical simulations, though effective, are computationally intensive and time-consuming. To address these limitations, this study investigates the application of machine learning (ML) models for efficient and reliable prediction of the UBC of shallow foundations. Although numerous studies have explored individual ML techniques for this purpose, comprehensive and consistent comparisons of widely used models under identical conditions remain limited. This research evaluates six ML algorithms: k-Nearest Neighbors (kNN), Artificial Neural Network (NN), Random Forest (RF), Extreme Gradient Boosting (xGBoost), Adaptive Boosting (AdaBoost), and Stochastic Gradient Descent (SGD), using a dataset of 169 experimental results collected from the literature. The input features include foundation width (B), depth (D), length-to-width ratio (L/B), soil unit weight (γ), and angle of internal friction (φ). Model performance was assessed using multiple evaluation metrics: coefficient of determination (R²), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Squared Error (RMSE), Mean Squared Error (MSE), and objective function (OBJ). To enhance model interpretability, SHapley Additive Explanations (SHAP) and Partial Dependence Plots (PDPs) were employed to analyze feature importance and input-output relationships, highlighting the influence of both soil properties and foundation geometry on predicted bearing capacity. Among the evaluated models, AdaBoost demonstrated the best overall performance, achieving R² values of 0.939 and 0.881 on the training and testing sets, respectively.
Based on the cumulative ranking of the models across all evaluation metrics, the models were ranked in the following order of performance: AdaBoost > kNN > RF > xGBoost > NN > SGD. While the results are promising, a key limitation is the use of single-layer soil data, which restricts applicability to more complex, multilayered soil profiles. Future studies should incorporate multilayer datasets and account for spatial variability to enhance the generalizability and robustness of predictive models.

  • Research Article
  • Cited by 3
  • 10.1038/s41598-025-10990-3
Hydraulic Performance Modeling of Inclined Double Cutoff Walls Beneath Hydraulic Structures Using Optimized Ensemble Machine Learning.
  • Jul 29, 2025
  • Scientific Reports
  • Mohamed Kamel Elshaarawy + 2 more

This study investigates the effectiveness of inclined double cutoff walls installed beneath hydraulic structures by employing five machine learning models: Random Forest (RF), Adaptive Boosting (AdaBoost), eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Categorical Boosting (CatBoost). A comprehensive dataset of 630 samples was gathered from previous studies, including key input variables such as the relative distance between the cutoff wall and the structure's apron width (L/B), the inclination angle ratio between downstream and upstream cutoffs (θ2/θ1), the depth ratio of downstream to upstream cutoff walls (d2/d1), and the relative downstream cutoff depth to the permeable layer depth (d2/D). The outputs considered were the relative uplift force (U/Uo), the relative exit hydraulic gradient (iR/iRo), and the relative seepage discharge per unit structure length (q/qo). The dataset was split with a 70:30 ratio for training and testing. Hyperparameter optimization was conducted using Bayesian Optimization (BO) coupled with five-fold cross-validation to enhance model performance. The CatBoost model outperformed the other models, consistently yielding R2 values above 0.95, 0.93, and 0.97 for U/Uo, iR/iRo, and q/qo, respectively, along with RMSE scores below 0.022, 0.089, and 0.019 for the same variables. A feature importance analysis was conducted using SHapley Additive exPlanations (SHAP) and Partial Dependence Plots (PDPs). The analysis revealed that L/B was the most influential predictor for U/Uo and iR/iRo, while d2/D played a crucial role in determining q/qo. Moreover, PDPs illustrated a positive linear relationship between L/B and U/Uo, a V-shaped impact of d2/d1 on iR/iRo and q/qo, and complex nonlinear interactions for θ2/θ1 across all target variables.
Furthermore, an interactive Graphical User Interface (GUI) was developed, enabling engineers to efficiently predict output variables and apply model insights in practical scenarios.

  • Research Article
  • Cited by 4
  • 10.1371/journal.pone.0300201
Machine learning-based models to predict the conversion of normal blood pressure to hypertension within 5-year follow-up.
  • Mar 14, 2024
  • PLOS ONE
  • Aref Andishgar + 6 more

Factors contributing to the development of hypertension exhibit significant variations across countries and regions. Our objective was to predict individuals at risk of developing hypertension within a 5-year period in a rural Middle Eastern area. This longitudinal study utilized data from the Fasa Adults Cohort Study (FACS). The study initially included 10,118 participants aged 35-70 years in rural districts of Fasa, Iran, with a follow-up of 3,000 participants after 5 years using random sampling. A total of 160 variables were included in the machine learning (ML) models, and feature scaling and one-hot encoding were employed for data processing. Ten supervised ML algorithms were utilized, namely logistic regression (LR), support vector machine (SVM), random forest (RF), Gaussian naive Bayes (GNB), linear discriminant analysis (LDA), k-nearest neighbors (KNN), gradient boosting machine (GBM), extreme gradient boosting (XGB), cat boost (CAT), and light gradient boosting machine (LGBM). Hyperparameter tuning was performed using various combinations of hyperparameters to identify the optimal model. Synthetic Minority Over-sampling Technique (SMOTE) was used to balance the training data, and feature selection was conducted using SHapley Additive exPlanations (SHAP). Out of 2,288 participants who met the criteria, 251 individuals (10.9%) were diagnosed with new hypertension. The LGBM model (determined to be the optimal model) with the top 30 features achieved an AUC of 0.67, an F1-score of 0.23, and an AUC-PR of 0.26. The top three predictors of hypertension were baseline systolic blood pressure (SBP), gender, and waist-to-hip ratio (WHR), with AUCs of 0.66, 0.58, and 0.63, respectively. Hematuria in urine tests and family history of hypertension ranked fourth and fifth. ML models have the potential to be valuable decision-making tools in evaluating the need for early lifestyle modification or medical intervention in individuals at risk of developing hypertension.

  • Research Article
  • Cited by 4
  • 10.1016/j.istruc.2024.106193
Data driven models for capacity prediction of CFS lipped channel flexural members
  • Mar 22, 2024
  • Structures
  • V.M Sreedevi + 8 more

  • Research Article
  • Cited by 2
  • 10.1016/j.mtcomm.2024.108173
Data-driven shear strength prediction of steel reinforced concrete composite shear wall
  • Jan 23, 2024
  • Materials Today Communications
  • Peng Huang + 2 more
