Over-sampling for data augmentation in data-driven models for the shear strength prediction of RC membranes

Luis Alberto Bedriñana, Jostin Gabriel Landeo, Julio Cesar Sucasaca, Christian Málaga-Chuquitaype

doi:10.1016/j.istruc.2024.105870

Abstract

Complex reinforced concrete (RC) structures are generally assessed as a group of individual membrane elements subjected to combined in-plane stresses; however, accurately predicting the shear strength of such elements remains a complex task. In addition, the limited availability of experimental data on RC panels, whose statistical distribution is also skewed towards lower strength values, limits the development of data-driven models. Thus, it is crucial to explore data augmentation techniques with a view to supporting the development of more accurate and generalizable predictive models in structural engineering. This paper evaluates over-sampling techniques for data augmentation and their use in the creation of an explainable, data-driven model for the shear strength prediction of RC panels. A dataset of 195 experimental tests of RC panels under different loading conditions is initially collected. Five over-sampling techniques are implemented to extend the original dataset and to reduce the imbalance. Three ensemble models (Random Forest, AdaBoost, and XGBoost) are trained with each of the generated datasets. All the over-sampling techniques produced predictive models that performed better than those trained on the original dataset; in particular, applying Random Over-Sampling (ROS) significantly improved the performance metrics of the model (by around 39% for some metrics) compared to the model trained on the original dataset, without any overfitting issues. This strategy enabled the development of an accurate XGBoost model (R² = 0.97 for the testing set). The explainability of the final predictive model (the XGBoost model obtained from ROS) is evaluated using SHAP (SHapley Additive exPlanations) analysis. The proposed predictive model outperformed traditional mechanics-based models (an improvement of approximately 27% over SMCS and 33% over MCFT for some performance metrics) and showed a more controlled error distribution over the range of variables. The proposed model was also more accurate (mean prediction ratio of 0.98) than sophisticated finite element analysis (mean prediction ratio of 0.84) for six specimens of the original dataset.
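As a rough illustration of the over-sampling idea described above, the following sketch applies random over-sampling to a regression target. The abstract does not describe the exact procedure, so this is only one plausible reading: it assumes the continuous shear strength is split into equal-width bins and that samples from the sparser bins are duplicated at random until every non-empty bin matches the most populated one. The function name `random_oversample`, the parameter `n_bins`, and the synthetic data are all hypothetical; only the sample count of 195 comes from the abstract.

```python
import numpy as np


def random_oversample(X, y, n_bins=5, seed=0):
    """Duplicate samples from sparse target bins (hypothetical helper).

    Assumption: the target is discretised into equal-width bins; this
    binning scheme is illustrative and not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Equal-width bin edges; only interior edges are passed to digitize,
    # so bin ids fall in 0 .. n_bins - 1.
    edges = np.linspace(y.min(), y.max(), n_bins + 1)
    bin_ids = np.digitize(y, edges[1:-1])
    counts = np.bincount(bin_ids, minlength=n_bins)
    target = counts.max()

    # Keep every original sample, then append random duplicates
    # (drawn with replacement) for each under-represented bin.
    idx_parts = [np.arange(len(y))]
    for b in range(n_bins):
        members = np.flatnonzero(bin_ids == b)
        deficit = target - len(members)
        if len(members) > 0 and deficit > 0:
            idx_parts.append(rng.choice(members, size=deficit, replace=True))

    idx = np.concatenate(idx_parts)
    return X[idx], y[idx]


# Synthetic stand-in data skewed towards low target values
# (NOT the experimental dataset; features and targets are random).
demo_rng = np.random.default_rng(42)
y_demo = demo_rng.exponential(scale=3.0, size=195)
X_demo = demo_rng.normal(size=(195, 4))

X_res, y_res = random_oversample(X_demo, y_demo, n_bins=5, seed=0)
print(len(y_demo), "->", len(y_res))
```

In a workflow of this kind, the duplicates would typically be generated from the training split only, so that copies of a test specimen cannot appear in the training data; whether the study proceeded this way is not stated in the abstract.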

Full Text