The Sodium Adsorption Ratio (SAR) is a widely used variable in water quality research, particularly in agricultural and environmental studies. In many cases, however, the key variables required to calculate SAR, namely Na+, Mg2+, and Ca2+, are not available. Consequently, the ability to estimate SAR from a limited number of water quality variables becomes critically important. This study implemented Multilayer Perceptron Neural Network (MLPNN), Support Vector Regression (SVR), and K-Nearest Neighbors (KNN) models at level 0 for prediction, together with the Boruta algorithm for variable selection. A stacked ensemble learning model at level 1 was then used to enhance prediction accuracy. The modeling procedure was applied to discharge and water quality data from the Zarrin-Gol River in northern Iran. The Boruta variable selection results showed that SAR can be predicted effectively from a limited number of water quality variables, even without the principal variables. Further investigation of input combinations for the level-0 models showed that 4, 3, and 1 input variables yielded the best predictions for the MLPNN, KNN, and SVR models, respectively. Among the level-0 models, MLPNN was the most accurate, with RMSE = 0.54, MBE = 0.26, MAE = 0.44, R = 0.84, IA = 0.67, and KGE = 0.79. The stacked ensemble at level 1 significantly improved SAR prediction relative to the level-0 models. The Ensemble-NN model performed best within the range of the recorded data, with RMSE = 0.53, MBE = 0.29, MAE = 0.41, R = 0.87, IA = 0.70, and KGE = 0.82. Residual analysis further confirmed the superior predictive capability of the level-1 models over the level-0 models. A generalized logistic probability distribution function was used to estimate the extreme values.
The Ensemble-KNN model best predicted the extreme values, with RMSE = 0.69, MBE = −0.61, MAE = 0.61, R = 0.61, IA = 0.26, and KGE = 0.37. These findings underscore the substantial improvements that stacked ensemble methods bring to SAR modeling across the total data, the extreme values, and the model residuals.
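The motivation above rests on the standard SAR definition, which requires all three cations. As a minimal sketch (not the authors' code), SAR can be computed from concentrations in meq/L as Na+ / sqrt((Ca2+ + Mg2+) / 2). The helper names, the approximate equivalent weights used for the mg/L to meq/L conversion, and the sample values are illustrative assumptions.

```python
import math

# Approximate equivalent weights (g per equivalent) = molar mass / ionic charge.
EQUIVALENT_WEIGHT = {"Na": 22.99, "Ca": 40.08 / 2, "Mg": 24.305 / 2}


def mg_per_l_to_meq_per_l(ion: str, mg_per_l: float) -> float:
    """Convert a concentration in mg/L to meq/L for Na, Ca, or Mg (hypothetical helper)."""
    return mg_per_l / EQUIVALENT_WEIGHT[ion]


def sodium_adsorption_ratio(na: float, ca: float, mg: float) -> float:
    """SAR = Na+ / sqrt((Ca2+ + Mg2+) / 2), with all inputs in meq/L."""
    if ca + mg <= 0:
        raise ValueError("Ca2+ + Mg2+ must be positive to compute SAR")
    return na / math.sqrt((ca + mg) / 2.0)


# Hypothetical sample in meq/L (not data from the Zarrin-Gol River).
print(sodium_adsorption_ratio(na=4.0, ca=3.0, mg=5.0))
```

When any of the three cation concentrations is missing, this direct calculation is impossible, which is the gap the data-driven models in the study are meant to fill.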
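The six goodness-of-fit statistics reported above can be computed along the lines of the sketch below. The formulas are common textbook definitions assumed here, not taken from the paper. MBE is taken as mean(predicted − observed), IA as Willmott's index of agreement, and KGE as the 2009 Kling-Gupta efficiency. The authors may use other sign conventions or variants, so this code is not guaranteed to reproduce their numbers. The `evaluate` function name and the sample data are hypothetical.

```python
import math
from statistics import fmean


def evaluate(observed, predicted):
    """Return RMSE, MBE, MAE, R, IA and KGE for paired observations and predictions."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(observed)
    o_bar, p_bar = fmean(observed), fmean(predicted)
    err = [p - o for o, p in zip(observed, predicted)]  # assumed convention: predicted - observed

    rmse = math.sqrt(sum(e * e for e in err) / n)
    mbe = sum(err) / n  # positive => over-prediction on average (under this convention)
    mae = sum(abs(e) for e in err) / n

    # Pearson correlation computed from population (1/n) moments.
    s_o = math.sqrt(sum((o - o_bar) ** 2 for o in observed) / n)
    s_p = math.sqrt(sum((p - p_bar) ** 2 for p in predicted) / n)
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(observed, predicted)) / n
    r = cov / (s_o * s_p)

    # Willmott's index of agreement (d).
    denom = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(observed, predicted))
    ia = 1.0 - sum(e * e for e in err) / denom

    # Kling-Gupta efficiency: correlation, variability ratio (alpha), bias ratio (beta).
    alpha = s_p / s_o
    beta = p_bar / o_bar
    kge = 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

    return {"RMSE": rmse, "MBE": mbe, "MAE": mae, "R": r, "IA": ia, "KGE": kge}


# Hypothetical SAR values, for illustration only.
obs = [1.0, 2.0, 3.0]
pred = [2.0, 3.0, 4.0]
print({k: round(v, 3) for k, v in evaluate(obs, pred).items()})
```

In this toy case R is essentially 1 while KGE drops to about 0.5, because KGE also penalizes the constant bias that R ignores. Under the MBE convention assumed here, a negative value such as the one reported for the Ensemble-KNN extreme-value predictions would indicate average under-prediction; if MBE is instead defined as observed minus predicted, the interpretation flips.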