Developing an Explainable Machine Learning Model Using an Augmented Concept Activation Vector
Machine learning models use high-dimensional feature spaces to map their inputs to the corresponding class labels. However, these features often do not have a one-to-one correspondence with physical concepts understandable by humans, which hinders the ability to provide a meaningful explanation for the decisions made by these models. We propose a method for measuring the correlation between high-level concepts and the decisions made by a machine learning model. Our method can isolate the impact of a given high-level concept and accurately measure it quantitatively. Additionally, this study aims to determine the prevalence of frequent patterns in machine learning models, which often occur in imbalanced datasets. We have successfully applied the proposed method to fundus images and managed to quantitatively measure the impact of radiomic patterns on the model’s decisions.
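The correlation measurement described above can be illustrated with a TCAV-style score (Kim et al.'s concept activation vector approach, which this abstract builds on). The sketch below is illustrative, not the paper's implementation: it assumes you already have, for each example, the gradient of the class logit with respect to a hidden-layer activation, plus a learned concept direction (CAV).

```python
# Illustrative sketch (not the paper's method): quantify a concept's influence
# by checking how often the class logit's gradient at a hidden layer points
# along a learned concept activation vector (CAV).

def concept_sensitivity(gradient, cav):
    """Directional derivative of the class logit along the concept direction."""
    dot = sum(g * c for g, c in zip(gradient, cav))
    norm = sum(c * c for c in cav) ** 0.5
    return dot / norm

def tcav_score(gradients, cav):
    """Fraction of examples whose class logit increases along the concept."""
    return sum(1 for g in gradients if concept_sensitivity(g, cav) > 0) / len(gradients)
```

A score near 1 means the concept consistently pushes the model toward the class; near 0 means it consistently pushes away.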
- Research Article
26
- 10.2196/44081
- May 31, 2023
- Journal of Medical Internet Research
Background: Low birthweight (LBW) is a leading cause of neonatal mortality in the United States and a major causative factor of adverse health effects in newborns. Identifying high-risk patients early in prenatal care is crucial to preventing adverse outcomes. Previous studies have proposed various machine learning (ML) models for the LBW prediction task, but they were limited by small and imbalanced data sets. Some authors attempted to address this through different data rebalancing methods. However, most of their reported performances did not reflect the models' actual performance in real-life scenarios. To date, few studies have successfully benchmarked the performance of ML models in maternal health; thus, it is critical to establish benchmarks to advance ML use and subsequently improve birth outcomes. Objective: This study aimed to establish several key benchmarking ML models to predict LBW and to systematically apply different rebalancing optimization methods to a large-scale, extremely imbalanced all-payer hospital record data set that connects mother and baby data at a state level in the United States. We also performed feature importance analysis to identify the features contributing most to the LBW classification task, which can aid targeted intervention. Methods: Our large data set consisted of 266,687 birth records across 6 years, of which 8.63% (n=23,019) were labeled as LBW. To set up benchmarking ML models to predict LBW, we applied 7 classic ML models (logistic regression, naive Bayes, random forest, extreme gradient boosting, adaptive boosting, multilayer perceptron, and sequential artificial neural network) combined with 4 different data rebalancing methods: random undersampling, random oversampling, synthetic minority oversampling technique, and weight rebalancing.
Owing to ethical considerations, in addition to standard ML evaluation metrics we primarily used recall (the proportion of actual LBW cases correctly predicted) to evaluate model performance, as false negatives in this health care setting could be fatal. We further analyzed feature importance to explore the degree to which each feature contributed to the predictions of our best-performing models. Results: Extreme gradient boosting achieved the highest recall score (0.70) using the weight rebalancing method. Our results showed that the various data rebalancing methods substantially improved prediction performance for the LBW group. From the feature importance analysis, maternal race, age, payment source, the sum of predelivery emergency department and inpatient hospitalizations, predelivery disease profile, and different social vulnerability index components were important risk factors associated with LBW. Conclusions: Our findings establish useful ML benchmarks for improving birth outcomes in the maternal health domain. They are informative for identifying the minority class (ie, LBW) in an extremely imbalanced data set, which may guide the development of personalized LBW early prevention, clinical interventions, and statewide maternal and infant health policy changes.
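The two ideas this abstract leans on, recall as the primary metric and weight rebalancing, are both simple to state in code. The sketch below is a minimal stdlib illustration, not the study's pipeline; inverse-frequency weighting is one common form of weight rebalancing (the study's exact scheme is not given in the abstract).

```python
from collections import Counter

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positive cases (e.g., LBW births) the model catches."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

def class_weights(y):
    """Inverse-frequency weights, one common form of weight rebalancing:
    w_c = n / (k * n_c), so an 8.63% minority class gets ~5.8x weight."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * counts[c]) for c in counts}
```

Such per-class weights are typically passed to a loss function (or, e.g., XGBoost's `scale_pos_weight`) so errors on the rare class cost more.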
- Research Article
- 10.1080/20964471.2025.2518763
- Jun 29, 2025
- Big Earth Data
Despite the widespread use of machine learning (ML) models for geospatial applications, adaptations to imbalanced multitemporal land cover (LC) datasets remain underexplored. For over two decades, studies have predominantly trained ML models on a single interval of LC data to model changes, with detriments of imbalanced training datasets managed through manual manipulations. Therefore, this study proposes and implements an ML-spatial sample weighting (ML-SSW) approach to leverage available multitemporal LC data while adjusting sample influence to reflect recency of change occurrence and class-level spatial pattern measures to enable data-driven LC change modeling. Random Forest (RF), Neural Network (NN), and Extreme Gradient Boosting Machine (XGB) models are trained under the ML-SSW strategy on three study areas located in British Columbia, Canada. The RF-SSW, NN-SSW, and XGB-SSW models forecasted more realistic changes across multiple timesteps with fewer errors than baseline configurations. The presented methodology provides a step toward establishing spatialized cost-sensitive learning strategies and extending classical ML models to multitemporal LC datasets.
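The abstract does not give the ML-SSW weighting formula, but the idea of weighting samples by recency of change can be sketched with an assumed exponential decay; both the functional form and the `half_life` parameter below are illustrative assumptions, not values from the cited study.

```python
def recency_weights(years_since_change, half_life=5.0):
    """Assumed recency weighting (NOT the paper's ML-SSW formula): a sample
    observed `half_life` years ago counts half as much as a current one."""
    return [0.5 ** (t / half_life) for t in years_since_change]
```

In a cost-sensitive setup, these weights would be passed as per-sample weights to the RF/NN/XGB training call, alongside class-level spatial pattern adjustments.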
- Research Article
11
- 10.1016/j.rineng.2024.103233
- Oct 24, 2024
- Results in Engineering
Data augmentation using SMOTE technique: Application for prediction of burst pressure of hydrocarbons pipeline using supervised machine learning models
- Research Article
2
- 10.1038/s41598-022-20012-1
- Sep 30, 2022
- Scientific Reports
Deep neural networks (DNNs) have shown success in image classification, with high accuracy in recognition of everyday objects. Performance of DNNs has traditionally been measured assuming human accuracy is perfect. In specific problem domains, however, human accuracy is less than perfect, and a comparison between humans and machine learning (ML) models can be performed. In recognising everyday objects, humans have the advantage of a lifetime of experience, whereas DNN models are trained only with a limited image dataset. We compared the performance of human learners and two DNN models on an image dataset novel to both, i.e. histological images. We thus aim to eliminate the advantage of prior experience that humans have over DNN models in image classification. Ten classes of tissues were randomly selected from the undergraduate first-year histology curriculum of a Medical School in North India. Two machine learning (ML) models were developed based on the VGG16 (VML) and Inception V2 (IML) DNNs, using transfer learning, to produce a 10-class classifier. One thousand (1000) images belonging to the ten classes (i.e. 100 images from each class) were split into training (700) and validation (300) sets. After training, the VML and IML models achieved 85.67% and 89% accuracy on the validation set, respectively. The training set was also circulated to medical students (MS) of the college for a week. An online quiz, consisting of a random selection of 100 images from the validation set, was conducted on students (after obtaining informed consent) who volunteered for the study. Sixty-six students participated in the quiz, providing 6557 responses. In addition, we prepared a set of 10 images which belonged to different classes of tissue not present in the training set (i.e. out-of-training-scope or OTS images). A second quiz was conducted on medical students with OTS images, and the ML models were also run on these OTS images.
The overall accuracy of MS in the first quiz was 55.14%. The two ML models were also run on the first quiz questionnaire, producing accuracy between 91% and 93%. The ML models outscored more than 80% of the medical students. Analysis of the confusion matrices of both ML models and all medical students showed dissimilar error profiles. However, when comparing the subset of students who achieved accuracy similar to the ML models, the error profiles were also similar. Recognition of 'stomach' proved difficult for both humans and ML models. In 4 images in the first quiz set, both the VML model and medical students produced highly equivocal responses. Within these images, a pattern of bias was uncovered: the tendency of medical students to misclassify 'liver' tissue. The 'stomach' class proved most difficult for both MS and VML, producing 34.84% of all errors of MS and 41.17% of all errors of the VML model; however, the IML model committed most errors in recognising the 'skin' class (27.5% of all errors). Analysis of the convolution layers of the DNN outlined features in the original image which might have led to misclassification by the VML model. On OTS images, however, the medical students produced a better overall score than both ML models, i.e. they successfully recognised patterns of similarity between tissues and could generalise their training to a novel dataset. Our findings suggest that within the scope of training, ML models perform better than 80% of medical students, with a distinct error profile. However, students whose accuracy approaches that of the ML models tend to replicate the error profile of the ML models. This suggests a degree of similarity between how machines and humans extract features from an image. If asked to recognise images outside the scope of training, humans perform better at recognising patterns and likeness between tissues.
This suggests that ‘training’ is not the same as ‘learning’, and humans can extend their pattern-based learning to different domains outside of the training set.
- Research Article
1
- 10.3390/a18100599
- Sep 25, 2025
- Algorithms
Phishing emails remain a significant concern and a growing cybersecurity threat in online communication. They often bypass traditional filters due to their increasing sophistication. This study presents a comparative evaluation of machine learning (ML) models and transformer-based large language models (LLMs) for phishing email detection, with embedded URL analysis. This study assessed ML training and LLM fine-tuning on both balanced and imbalanced datasets. We evaluated multiple ML models, including Random Forest, Logistic Regression, Support Vector Machine, Naïve Bayes, Gradient Boosting, Decision Tree, and K-Nearest Neighbors, alongside the transformer-based LLMs DistilBERT, ALBERT, BERT-Tiny, ELECTRA, MiniLM, and RoBERTa. To further enhance realism, phishing emails generated by LLMs were included in the evaluation. Across all configurations, both the ML models and the fine-tuned LLMs demonstrated robust performance. Random Forest achieved over 98% accuracy in both email detection and URL classification. DistilBERT scored almost as high on both emails and URLs. Balancing the dataset led to slight accuracy gains in ML models but minor decreases in LLMs, likely due to their sensitivity to majority class reductions during training. Overall, LLMs are highly effective at capturing complex language patterns, while traditional ML models remain efficient and require low computational resources. Combining both approaches through a hybrid or ensemble method could enhance phishing detection effectiveness.
- Research Article
43
- 10.3390/app112110004
- Oct 26, 2021
- Applied Sciences
The problem of imbalanced datasets is a significant concern when creating reliable credit card fraud (CCF) detection systems. In this work, we study and evaluate recent advances in machine learning (ML) algorithms and deep reinforcement learning (DRL) used for CCF detection systems, including fraud and non-fraud labels. Two resampling approaches, SMOTE and ADASYN, are used to resample the imbalanced CCF dataset. ML algorithms are then applied to this balanced dataset to establish CCF detection systems. Next, DRL is employed to create detection systems based on the imbalanced CCF dataset. Diverse classification metrics are reported to thoroughly evaluate the performance of these ML and DRL models. Through empirical experiments, we assess the reliability of the ML models under the two resampling approaches and of the DRL models for CCF detection. When SMOTE and ADASYN are used to resample the original CCF dataset before the training/test split, the ML models show very high outcomes of above 99% accuracy. However, when these techniques are employed to resample only the training portion of the CCF dataset, the ML models show lower results, particularly logistic regression with 1.81% precision and a 3.55% F1 score when using ADASYN. Our work reveals that the DRL model is ineffective and achieves low performance, with only 34.8% accuracy.
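The gap this abstract reports (above 99% accuracy when resampling before the split, far lower after) is the classic data-leakage failure mode. The stdlib sketch below uses plain random oversampling as a stand-in for SMOTE/ADASYN to make the correct ordering concrete; it is illustrative, not the study's code.

```python
import random

def random_oversample(X, y, minority=1, seed=0):
    """Duplicate minority examples until classes balance (stdlib stand-in
    for SMOTE/ADASYN, which synthesize new minority points instead)."""
    rng = random.Random(seed)
    min_idx = [i for i, lab in enumerate(y) if lab == minority]
    maj_idx = [i for i, lab in enumerate(y) if lab != minority]
    extra = [rng.choice(min_idx) for _ in range(len(maj_idx) - len(min_idx))]
    keep = maj_idx + min_idx + extra
    return [X[i] for i in keep], [y[i] for i in keep]

# Correct order: split FIRST, then resample only the training portion.
# Resampling the whole dataset before the split copies (or synthesizes
# neighbors of) minority records into the test set, which is how the
# "before split" setup reaches inflated >99% accuracy.
```

With SMOTE/ADASYN the leak is subtler but just as real: synthetic test points are interpolated from training points, so the test set is no longer independent.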
- Research Article
7
- 10.1200/cci.23.00264
- Apr 1, 2024
- JCO clinical cancer informatics
Adverse effects of chemotherapy often require hospital admissions or treatment management. Identifying factors contributing to unplanned hospital utilization may improve health care quality and patients' well-being. This study aimed to assess whether patient-reported outcome measures (PROMs) improve the performance of machine learning (ML) models predicting hospital admissions, triage events (contacting a helpline or attending hospital), and changes to chemotherapy. Clinical trial data were used, containing responses to three PROMs (European Organisation for Research and Treatment of Cancer Core Quality of Life Questionnaire [QLQ-C30], EuroQol Five-Dimensional Visual Analogue Scale [EQ-5D], and Functional Assessment of Cancer Therapy-General [FACT-G]) and clinical information on 508 participants undergoing chemotherapy. Six feature sets (with the following variables: [1] all available; [2] clinical; [3] PROMs; [4] clinical and QLQ-C30; [5] clinical and EQ-5D; [6] clinical and FACT-G) were applied in six ML models (logistic regression [LR], decision tree, adaptive boosting, random forest [RF], support vector machines [SVMs], and neural network) to predict admissions, triage events, and chemotherapy changes. A comprehensive analysis of the predictive performance of the six ML models for each feature set, under three different methods for handling class imbalance, indicated that PROMs improved predictions of all outcomes. RF and SVMs had the highest performance for predicting admissions and changes to chemotherapy in the balanced data sets, and LR in the imbalanced data set. Balancing the data led to the best performance compared with the imbalanced data set or a data set in which only the training set was balanced. These results endorse the view that ML can be applied to PROM data to predict hospital utilization and chemotherapy management. If further explored, this study may contribute to health care planning and treatment personalization.
Rigorous comparison of model performance under different imbalanced-data handling methods demonstrates best practice in ML research.
- Research Article
- 10.31645/jisrc.25.23.2.9
- Jan 1, 2025
- Journal of Independent Studies and Research Computing
Emotion recognition from textual data has become increasingly vital in domains such as sentiment-aware systems, conversational agents, and mental health analysis. Despite significant progress, accurately detecting emotions from text remains a challenging task due to the lack of prosodic and visual cues, contextual ambiguity, and imbalanced datasets. This study presents a comprehensive evaluation of traditional Machine Learning (ML) and advanced Deep Learning (DL) models on four diverse emotion-labeled datasets: DailyDialog, ISEAR, Emotion-Stimulus, and CrowdFlower. Various feature extraction techniques—TF-IDF and Count Vectorizer for ML models, and semantic embeddings (Word2Vec and GloVe) for DL models—were employed to assess their impact on model performance. The models compared include Logistic Regression, Random Forest, Stochastic Gradient Descent, and Multinomial Naïve Bayes for ML, and LSTM, BiLSTM, and CNN for DL. Evaluation metrics such as accuracy, precision, recall, F1-score, and MCC were used for performance assessment. Results reveal that DL models, particularly CNN and BiLSTM, outperform ML models in terms of accuracy and contextual understanding, especially on structured datasets. Conversely, Logistic Regression with TF-IDF demonstrates robustness on noisy and imbalanced data. Word2Vec embeddings consistently enhance DL model performance, highlighting the importance of contextual semantics. This work underscores the significance of dataset characteristics, model architecture, and feature representation in achieving effective emotion classification. Future directions include integrating transformer-based models, addressing class imbalance, and exploring multimodal emotion recognition to improve generalization and real-world applicability.
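The TF-IDF representation that the abstract credits for Logistic Regression's robustness on noisy, imbalanced text can be stated compactly. This is a minimal stdlib sketch (whitespace tokens, unsmoothed log idf); real pipelines would use a library vectorizer such as scikit-learn's `TfidfVectorizer`, whose smoothing and normalization differ.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Minimal TF-IDF: term frequency within a document, scaled down by how
    many documents contain the term. Words in every document get weight 0."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()                      # document frequency per word
    for toks in tokenized:
        df.update(set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks)
        vectors.append({w: (c / total) * math.log(n / df[w]) for w, c in tf.items()})
    return vectors
```

The idf term is what makes the scheme useful for emotion text: frequent function words shared by all classes are suppressed, while rarer emotion-bearing words dominate the vector.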
- Preprint Article
- 10.5194/egusphere-egu23-11636
- May 15, 2023
In recent years, Machine Learning (ML) models have proven useful in solving problems in a wide variety of fields such as medicine, economics, manufacturing, transportation, energy, and education. With increased interest in ML models and advances in sensor technologies, ML models are now widely applied in the civil engineering domain as well. ML models enable analysis of large amounts of data, automation, and improved decision-making, and provide more accurate predictions. While several state-of-the-art reviews have been conducted in individual sub-domains of civil engineering (e.g., geotechnical engineering, structural engineering) or on specific application problems (e.g., structural damage detection, water quality evaluation), little effort has been devoted to a comprehensive review of ML models applied across civil engineering that compares them between sub-domains. A systematic but domain-specific literature review framework should be employed to effectively classify and compare the models. To that end, this study proposes a novel review approach based on the hierarchical classification tree “D-A-M-I-E (Domain-Application problem-ML models-Input data-Example case)”. The “D-A-M-I-E” classification tree classifies ML studies in civil engineering based on (1) the domain of civil engineering, (2) the application problem, (3) the applied ML models, and (4) the data used in the problem. Moreover, the data used for the ML models in each application example are examined based on the specific characteristics of the domain and the application problem. For a comprehensive review, five domains (structural engineering, geotechnical engineering, water engineering, transportation engineering, and energy engineering) are considered, and the ML application problem is divided into five problem types (prediction, classification, detection, generation, optimization). Based on the “D-A-M-I-E” classification tree, about 300 ML studies in civil engineering are reviewed.
For each domain, analysis and comparison of the following questions have been conducted: (1) which problems are mainly solved with ML models, (2) which ML models are mainly applied in each domain and problem, (3) how advanced the ML models are, and (4) what kinds of data are used and what data processing is performed for the application of ML models. This paper also assesses the expansion and applicability of the proposed methodology to other areas (e.g., Earth system modeling, climate science). Furthermore, based on the identification of research gaps of ML models in each domain, this paper provides future directions for ML in civil engineering based on approaches to handling data (e.g., collection, handling, storage, and transmission) and aims to facilitate the application of ML models in other fields.
- Research Article
24
- 10.1175/jcli-d-21-0113.1
- Jun 8, 2021
- Journal of Climate
In this study, four machine learning (ML) models (gradient boosted decision tree (GBDT), light gradient boosting machine (LightGBM), categorical boosting (CatBoost), and extreme gradient boosting (XGBoost)) are used to perform seasonal forecasts of non-monsoonal winter precipitation over the Eurasian continent (30-60°N, 30-105°E) (NWPE). The seasonal forecast results are compared with those from a traditional linear regression (LR) model and two dynamic models. The ML and LR models are trained using data for the period 1979-2010, and these empirical models are then used to perform seasonal forecasts of the NWPE for 2011-2018. Our results show that the four ML models have reasonable seasonal forecast skill for the NWPE and clearly outperform the LR model. The ML models and the dynamic models have skillful forecasts for the NWPE over different regions. The ensemble mean of the forecasts including the ML models and dynamic models shows higher forecast skill for the NWPE than the ensemble mean of the dynamic-only models. The forecast skill of the ML models mainly benefits from a skillful forecast of the third empirical orthogonal function (EOF) mode (EOF3) of the NWPE, which has a good and consistent prediction among the ML models. Our results also illustrate that Arctic sea ice in the previous autumn is the most important predictor in the ML models for forecasting the NWPE. This study suggests that ML models may be useful tools to help improve seasonal forecasts of the NWPE.
- Research Article
- 10.12122/j.issn.1673-4254.2026.01.15
- Jan 20, 2026
- Nan fang yi ke da xue xue bao = Journal of Southern Medical University
To improve the accuracy of machine learning models for preoperative prediction of high-intensity focused ultrasound (HIFU) ablation efficacy for uterine fibroids by correcting class imbalance in small sample datasets using undersampling methods. Clinical and imaging data were collected from 140 patients with uterine fibroids undergoing HIFU treatment at Foshan Women and Children Hospital, including 104 with high ablation rates and 36 with low ablation rates. Radiomic features were extracted from MRI T2-weighted images (T2WI) of the patients, and machine learning models were constructed to predict HIFU treatment outcomes. Four machine learning algorithms, namely k-Nearest Neighbors (KNN), Random Forest (RF), Support Vector Machine (SVM), and Multilayer Perceptron (MLP), were coupled with 7 undersampling methods, namely Random Undersampling (RUS), Repeated Edited Nearest Neighbors (RENN), All k-Nearest Neighbors (AllKNN), NearMiss-3 (NM), Condensed Nearest Neighbor (CNN), Neighborhood Cleaning Rule (NCR), and Instance Hardness Threshold (IHT), to handle class imbalance in the datasets. The 28 resulting prediction models were evaluated using 5-fold cross-validation for area under the receiver operating characteristic curve (AUC), accuracy, recall, and specificity. The best combinations of undersampling methods and machine learning models, CNN-RF, NM-SVM, CNN-KNN, and NM-MLP, had AUCs of 0.772 (95% CI: 0.566-0.942), 0.797 (95% CI: 0.600-0.950), 0.822 (95% CI: 0.635-0.964), and 0.822 (95% CI: 0.632-0.960), respectively. The AUCs of the machine learning models increased significantly after coupling with undersampling methods, with the MLP model showing the most pronounced improvement. The recall rates of the 4 combined models also improved significantly (by 0.389 for CNN-RF, 0.836 for NM-SVM, 0.532 for CNN-KNN, and 0.372 for NM-MLP).
The use of undersampling methods can effectively correct class imbalance in small sample datasets to improve the accuracy of machine learning models for predicting the efficacy of HIFU ablation for uterine fibroids.
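Of the 7 undersampling methods evaluated above, random undersampling (RUS) is the simplest and makes the mechanism easy to see. The stdlib sketch below is illustrative only; the other methods (RENN, NearMiss, IHT, etc.) differ in that they choose *which* majority samples to discard based on neighborhood structure rather than at random.

```python
import random

def random_undersample(X, y, minority=1, seed=0):
    """RUS: discard majority-class samples at random until class counts
    match (e.g., 104 high-ablation cases reduced to match 36 low-ablation).
    Order is preserved via sorting the kept indices."""
    rng = random.Random(seed)
    min_idx = [i for i, lab in enumerate(y) if lab == minority]
    maj_idx = [i for i, lab in enumerate(y) if lab != minority]
    keep = sorted(min_idx + rng.sample(maj_idx, len(min_idx)))
    return [X[i] for i in keep], [y[i] for i in keep]
```

As with oversampling, undersampling belongs inside each cross-validation fold's training split, never on the full dataset before evaluation.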
- Research Article
24
- 10.2196/47833
- Nov 20, 2023
- JMIR Medical Informatics
Machine learning (ML) models give patients with diabetes mellitus (DM) more options for properly managing blood glucose (BG) levels. However, given the numerous types of ML algorithms, choosing an appropriate model is vitally important. In a systematic review and network meta-analysis, this study aimed to comprehensively assess the performance of ML models in predicting BG levels. In addition, we assessed ML models used to detect and predict adverse BG (hypoglycemia) events by calculating pooled estimates of sensitivity and specificity. PubMed, Embase, Web of Science, and Institute of Electrical and Electronics Engineers Explore databases were systematically searched for studies on predicting BG levels and predicting or detecting adverse BG events using ML models, from inception to November 2022. Studies that assessed the performance of different ML models in predicting or detecting BG levels or adverse BG events of patients with DM were included. Studies with no derivation or performance metrics of ML models were excluded. The Quality Assessment of Diagnostic Accuracy Studies tool was applied to assess the quality of included studies. Primary outcomes were the relative ranking of ML models for predicting BG levels in different prediction horizons (PHs) and pooled estimates of the sensitivity and specificity of ML models in detecting or predicting adverse BG events. In total, 46 eligible studies were included for meta-analysis. Regarding ML models for predicting BG levels, the means of the absolute root mean square error (RMSE) in a PH of 15, 30, 45, and 60 minutes were 18.88 (SD 19.71), 21.40 (SD 12.56), 21.27 (SD 5.17), and 30.01 (SD 7.23) mg/dL, respectively. The neural network model (NNM) showed the highest relative performance in different PHs.
Furthermore, the pooled estimates of the positive likelihood ratio and the negative likelihood ratio of ML models were 8.3 (95% CI 5.7-12.0) and 0.31 (95% CI 0.22-0.44), respectively, for predicting hypoglycemia and 2.4 (95% CI 1.6-3.7) and 0.37 (95% CI 0.29-0.46), respectively, for detecting hypoglycemia. Statistically significant high heterogeneity was detected in all subgroups, with different sources of heterogeneity. For predicting precise BG levels, the RMSE increases with a rise in the PH, and the NNM shows the highest relative performance among all the ML models. Meanwhile, current ML models have sufficient ability to predict adverse BG events, while their ability to detect adverse BG events needs to be enhanced. PROSPERO CRD42022375250; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=375250.
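The likelihood ratios reported above are derived from sensitivity and specificity by the standard diagnostic-test formulas, which the abstract assumes but does not state. The sketch below shows the relationship (the study's pooled estimates also involve meta-analytic weighting across studies, which this omits).

```python
def likelihood_ratios(sensitivity, specificity):
    """Standard diagnostic formulas:
    LR+ = sensitivity / (1 - specificity)   (how much a positive result
                                             raises the odds of disease)
    LR- = (1 - sensitivity) / specificity   (how much a negative result
                                             lowers them)"""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg
```

By these formulas, the reported LR+ of 8.3 for predicting hypoglycemia indicates a strongly informative positive result, while the LR+ of 2.4 for detecting it indicates a much weaker one, consistent with the authors' conclusion that detection ability needs to be enhanced.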
- Research Article
6
- 10.13031/jnrae.15647
- Jan 1, 2023
- Journal of Natural Resources and Agricultural Ecosystems
Highlights: Machine Learning (ML) models are identified, reviewed, and analyzed for HAB predictions. Data preprocessing is vital for efficient ML model development. ML models for toxin production and monitoring are limited. Abstract: Harmful algal blooms (HABs) are detrimental to livestock, humans, pets, the environment, and the global economy, which calls for a robust approach to their management. While process-based models can inform practitioners about HAB enabling conditions, they have inherent limitations in accurately predicting harmful algal blooms. To address these limitations, Machine Learning (ML) models can potentially leverage large volumes of IoT data to aid in near real-time predictions. ML models have evolved as efficient tools for understanding patterns and relationships between water quality parameters and HAB expansion. This review describes ML models currently used for predicting and forecasting HABs in freshwater ecosystems and presents model structures and their application for predicting algal parameters and related toxins. The review revealed that regression trees, random forest, Artificial Neural Network (ANN), Support Vector Regression (SVR), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) are the most frequently used models for HABs monitoring. This review shows ML models' prowess in identifying significant variables influencing algal growth, HAB drivers, and multistep HAB prediction. Hybrid models also improve the prediction of algal-related parameters through improved optimization techniques and variable selection algorithms. While ML models often focus on algal biomass prediction, few studies apply ML models for toxin monitoring and prediction. This limitation can be associated with a lack of high-frequency toxin datasets for model development, and exploring this domain is encouraged.
This review serves as a guide for policymakers and researchers to implement ML models for HAB prediction and reveals the potential of ML models for decision support and early prediction for HAB management. Keywords: Cyanobacteria, Freshwater, Harmful algal blooms, Machine learning, Water quality.
- Preprint Article
- 10.5194/egusphere-egu22-8321
- Mar 28, 2022
The consequences of ever-increasing human interference with freshwater systems, e.g., through land-use and climate changes, are already felt in many regions of the world, for example as shifts in freshwater availability and in the partitioning between green (evapotranspiration) and blue (runoff) water fluxes. In this study, we have developed a machine learning (ML) model for the prediction of green-blue water flux partitioning (WFP) under different climate, land-use, and other landscape and hydrological catchment conditions around the world. ML models have shown relatively high predictive performance compared to more traditional modelling methods for several tasks in geosciences. However, ML is also rightly criticized for providing theory-free “black-box” models that may fail in predictions under forthcoming non-stationary conditions. We here address the ML model interpretability gap using Shapley values, an explainable artificial intelligence technique. We also assess ML model predictability using a dissimilarity index (DI). For ML model training and testing, we use different parts of a total database compiled for 3482 hydrological catchments with available data for daily runoff over at least 25 years. The target variable of the ML model is the blue-water partitioning ratio between average runoff and average precipitation (and the complementary, water-balance-determined green water partitioning ratio) for each catchment. The predictor variables are hydro-climatic, land-cover/use, and other catchment indices derived from precipitation and temperature time series, land cover maps, and topography data.
As a basis for the ML modelling, we also investigate and quantify (through data averaging over moving sub-periods of different time lengths) a minimum temporal aggregation scale for water flux averaging (referred to as the flux equilibration time, T_eq) required to reach a stable temporal average runoff (and evapotranspiration) fraction of precipitation in each catchment; for 99% of catchments, T_eq is found to be ≤2 years, with longer T_eq emerging for catchments estimated to have a higher ratio R_gw/R_avg, i.e., a higher groundwater flow contribution (R_gw) to total average runoff (R_avg). The cubist model used for the ML modelling yields a Kling-Gupta efficiency of 0.86, while the Shapley values analysis indicates mean annual precipitation and temperature as the most important variables in determining the WFP, followed by the average slope in each catchment. A DI threshold is further used to label new data points as inside or outside the ML model's area of applicability (AoA). Comparison between test data points outside and inside the AoA reveals which catchment characteristics are mostly responsible for the ML model's loss of predictability. Predictability is lower for catchments with: larger T_eq and R_gw/R_avg; higher phase lag between peak precipitation and peak temperature over the year; lower forest and agricultural land fractions; and an aridity index much higher or much lower than 1 (implying major water or energy limitation, respectively). Identifying such predictability limits is crucial for understanding, and facilitating user awareness of, the applicability and forecasting ability of such data-driven ML modelling under different prevailing and changing future hydro-climatic, land-use, and groundwater conditions.
- Research Article
24
- 10.1007/s10999-023-09675-4
- Aug 30, 2023
- International Journal of Mechanics and Materials in Design
This study focuses on using various machine learning (ML) models to evaluate the shear behavior of ultra-high-performance concrete (UHPC) beams reinforced with glass fiber-reinforced polymer (GFRP) bars. The main objective of the study is to predict the shear strength of UHPC beams reinforced with GFRP bars using ML models. We use four different ML models: support vector machine (SVM), artificial neural network (ANN), random forest (RF), and extreme gradient boosting (XGBoost). The experimental database used in the study is acquired from various literature sources and comprises 54 test observations with 11 input features. These input features are parameters related to the composition, geometry, and properties of the UHPC beams and GFRP bars. To ensure the ML models' generalizability and scalability, random search methods are utilized to tune the hyperparameters of the algorithms. This tuning process helps improve the performance of the models when predicting shear strength. The study uses the ACI318M-14 and Eurocode 2 building code standards to predict the shear capacity of GFRP bar-reinforced UHPC I-shaped beams. The ML models' predictions are compared to the results obtained from these building code standards. According to the findings, the XGBoost model demonstrates the highest predictive test performance among the investigated ML models. The study employs SHAP (SHapley Additive exPlanations) analysis to assess the significance of each input parameter in the ML models' predictive capabilities. A Taylor diagram is used to statistically compare the accuracy of the ML models. This study concludes that ML models, particularly XGBoost, can effectively predict the shear capacity of GFRP bar-reinforced UHPC I-shaped beams.
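The SHAP analysis used above rests on the Shapley value from cooperative game theory: each feature's importance is its average marginal contribution over all orderings in which features could be added. The sketch below computes this exactly, which is feasible only for a handful of features; the SHAP library approximates it efficiently for real models. `value_fn` here is any function mapping a feature subset to a model output.

```python
import math
from itertools import permutations

def shapley_values(features, value_fn):
    """Exact Shapley attribution: average each feature's marginal
    contribution to value_fn over all feature orderings."""
    contrib = {f: 0.0 for f in features}
    for order in permutations(features):
        coalition = set()
        prev = value_fn(frozenset(coalition))
        for f in order:
            coalition.add(f)
            cur = value_fn(frozenset(coalition))
            contrib[f] += cur - prev     # marginal contribution of f
            prev = cur
    n_orders = math.factorial(len(features))
    return {f: c / n_orders for f, c in contrib.items()}
```

For a purely additive model the Shapley value of each feature equals its coefficient; interactions between features are what make real attributions (and the exact computation's cost) non-trivial.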