Application of multi-algorithm ensemble methods in high-dimensional and small-sample data of geotechnical engineering: A case study of swelling pressure of expansive soils

Chao Li, Lei Wang, Jie Li, Yang Chen

doi:10.1016/j.jrmge.2023.10.015

Abstract

Geotechnical engineering data are usually small-sample and high-dimensional, which poses considerable challenges for predictive modeling. This paper uses a typical high-dimensional, small-sample swelling pressure (Ps) dataset to explore whether multi-algorithm hybrid ensemble and dimensionality reduction methods can mitigate the uncertainty of soil parameter prediction. A base learner pool is constructed from six machine learning (ML) algorithms, and four ensemble methods, Stacking (SG), Blending (BG), Voting regression (VR), and Feature weight linear stacking (FWL), are used to combine them. Furthermore, permutation importance is used for feature dimensionality reduction to mitigate the impact of weakly correlated variables on predictive modeling. The results show that the proposed methods outperform traditional prediction models and base ML models. FWL is more suitable for modeling with small-sample datasets, and dimensionality reduction simplifies the data structure and reduces the adverse impact of the small-sample effect, which offers guidance on feature selection for predictive modeling. Based on the ensemble methods, the five primary factors affecting Ps, ranked by feature importance, are the maximum dry density (31.145%), clay fraction (15.876%), swell percent (15.289%), plasticity index (14%), and optimum moisture content (13.69%). The influence of the input parameters on Ps is also investigated, and the results are in line with the findings of the existing literature.
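To make the two central ideas of the abstract concrete, the following is a minimal sketch of a voting-style ensemble (averaging base learner predictions) combined with permutation importance expressed as percentages. It is an illustration under stated assumptions, not the authors' implementation: the toy data, the two fixed linear "base learners", and the function names `voting_predict` and `permutation_importance` are all hypothetical, and the paper's six ML algorithms and Ps dataset are not reproduced here.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def voting_predict(models, X):
    """Voting regression: average the predictions of all base learners."""
    return [sum(m(row) for m in models) / len(models) for row in X]


def permutation_importance(models, X, y, n_repeats=20, seed=0):
    """Share (in %) of the error increase caused by shuffling each feature."""
    rng = random.Random(seed)
    baseline = mse(y, voting_predict(models, X))
    raw = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving all other features untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, voting_predict(models, X_perm)) - baseline)
        # Clip at zero: a feature cannot have negative importance here.
        raw.append(max(sum(increases) / n_repeats, 0.0))
    total = sum(raw)
    return [100.0 * r / total if total > 0 else 0.0 for r in raw]


# Hypothetical toy data: 30 samples, 3 features; the third is pure noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(30)]
y = [3.0 * a + 1.0 * b for a, b, _ in X]

# Hypothetical pre-fitted base learners (stand-ins for trained ML models).
base_learners = [
    lambda r: 3.0 * r[0] + 1.0 * r[1],
    lambda r: 2.8 * r[0] + 1.2 * r[1],
]

importances = permutation_importance(base_learners, X, y)
for name, value in zip(["feature_0", "feature_1", "feature_2"], importances):
    print(f"{name}: {value:.2f}%")
```

Features whose importance share is close to zero, such as the noise feature in this toy setup, are the kind of weakly correlated variables that a dimensionality reduction step would drop before refitting the ensemble.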

Full Text