Performance Evaluation of Homogeneous and Heterogeneous Ensemble Models for Groundwater Salinity Predictions: a Regional-Scale Comparison Study

Alvin Lal, Bithin Datta

doi:10.1007/s11270-020-04693-w

Abstract

Accurate prediction of salinity concentration in an aquifer in response to fluctuating groundwater pumping patterns is an essential component of any coastal groundwater planning and management framework. Data-driven prediction models have proved efficient in predicting groundwater salinity levels in coastal aquifers. Ensemble prediction models are known to be more accurate and more robust than standalone prediction models. This study compares the performance of homogeneous and heterogeneous ensemble models for groundwater salinity prediction. A homogeneous ensemble model is composed of several standalone models of the same type (i.e. it employs one machine learning tool), whereas a heterogeneous ensemble model is composed of several standalone models of different types (i.e. it employs multiple machine learning tools). Specifically, homogeneous and heterogeneous ensemble models of various standalone machine learning tools such as artificial neural network (ANN), genetic programming (GP), support vector regression (SVR), and Gaussian process regression (GPR) are developed to predict groundwater salinity concentrations in a small Pacific island coastal aquifer system. Standalone and ensemble prediction models are trained and validated using identical pumping and resulting salinity concentration datasets obtained by solving a numerical 3D transient density-dependent coastal aquifer flow and transport model. After validation, the ensemble models are used to predict salinity concentration at selected monitoring wells in the modelled aquifer under variable groundwater pumping conditions. The prediction capabilities of the developed ensemble models are quantified using standard statistical procedures.
The performance evaluation results suggested that the predictive capabilities of the developed standalone prediction models (ANN, GP, SVR, and GPR) were comparable with those of the numerical groundwater variable density-dependent flow and salt transport model. However, GPR standalone models had better prediction capabilities than the other standalone models. Also, SVR and GPR standalone models were more efficient (i.e. required less computational training time) than the other standalone models. Among the ensemble models, the homogeneous GPR ensemble model was found to be superior to the other homogeneous and heterogeneous ensemble models. The homogeneous GPR ensemble model was also favoured in terms of efficiency. Overall, based on the limited performance evaluation results, the homogeneous GPR ensemble model was considered to be the best prediction model when compared with all the standalone models, the other homogeneous ensemble models, and the heterogeneous ensemble model. Therefore, it can be utilised as a reliable groundwater salinity prediction tool and as an approximate simulator in coupled simulation-optimization models needed for prescribing optimal groundwater management strategies.
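
The distinction between homogeneous and heterogeneous ensembles described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study. The two learners (a least-squares line and a k-nearest-neighbour regressor) are simple stand-ins for ANN, GP, SVR, and GPR; the synthetic pumping-salinity data, bootstrap resampling of the training set, unweighted averaging of member predictions, and RMSE as the error measure are all assumptions made for illustration.

```python
import math
import random
from statistics import mean


# Hypothetical stand-in learners (NOT the ANN/GP/SVR/GPR models of the study).
def fit_linear(xs, ys):
    """Fit an ordinary least-squares line and return a predict function."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def fit_knn(xs, ys, k=3):
    """Return a k-nearest-neighbour regressor as a predict function."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


def bootstrap(xs, ys, rng):
    """Resample the training set with replacement (assumed diversity source)."""
    idx = [rng.randrange(len(xs)) for _ in xs]
    return [xs[i] for i in idx], [ys[i] for i in idx]


def ensemble(members):
    """Combine member models by unweighted averaging of their predictions."""
    return lambda x: mean(m(x) for m in members)


def rmse(model, xs, ys):
    """Root mean square error of a model on (xs, ys)."""
    return math.sqrt(mean((model(x) - y) ** 2 for x, y in zip(xs, ys)))


rng = random.Random(0)

# Synthetic data only: one pumping rate mapped to one salinity value.
train_x = [rng.uniform(0, 10) for _ in range(60)]
train_y = [2.0 * x + 0.3 * x ** 2 + rng.gauss(0, 1) for x in train_x]
test_x = [i * 0.5 for i in range(21)]
test_y = [2.0 * x + 0.3 * x ** 2 for x in test_x]

# Homogeneous: several members of ONE type, each fitted to a different resample.
homogeneous = ensemble(
    [fit_knn(*bootstrap(train_x, train_y, rng)) for _ in range(5)]
)

# Heterogeneous: members of DIFFERENT types fitted to the same data.
heterogeneous = ensemble(
    [fit_linear(train_x, train_y), fit_knn(train_x, train_y)]
)

print("homogeneous RMSE:  ", rmse(homogeneous, test_x, test_y))
print("heterogeneous RMSE:", rmse(heterogeneous, test_x, test_y))
```

The error values printed by this sketch depend entirely on the synthetic data and stand-in learners, so they say nothing about the relative merit of the ensembles evaluated in this study.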

Full Text