In this study, we designed a machine learning-based parallel global searching method within the Bayesian inversion framework for efficient identification of dense non-aqueous phase liquid (DNAPL) source characteristics and contaminant transport parameters in groundwater. A swarm intelligence organized hybrid-kernel extreme learning machine (SIO-HKELM) was proposed to approximate, with high accuracy, the forward and inverse input-output correlations of the DNAPL transport numerical simulation model. An adaptive inverse-HKELM was established to provide a preliminary estimate of the source characteristics and contaminant transport parameters, which was used to correct the prior information and generate high-quality starting points for the parallel search. A locally accurate forward-HKELM surrogate of the numerical model was embedded in the searching system to avoid repetitive CPU-demanding likelihood evaluations. A sensitivity-based dynamic particle swarm optimization (SD-PSO) algorithm incorporating the Metropolis criterion (MC) was developed to improve the search ergodicity and achieve precise inversion of all the unknown variables, whose sensitivities to the likelihood function vary drastically. Results showed that the generalization capability and robustness of SIO-HKELM were superior to those of traditional machine learning methods, including KELM and support vector regression (SVR), and that it approximated the forward and inverse input-output mappings of the numerical model with testing determination coefficients of 0.9944 and 0.6440, respectively. With the high-quality prior information and starting points generated by the adaptive inverse-HKELM feed approach, the uncertainty in the inversion outputs was reduced, and the searching process rapidly converged to reasonable posterior distributions in around 60 iterations.
Compared with the widely used multichain Markov chain Monte Carlo (MCMC) approach, the parallel searching lines generated by SD-PSO-MC adequately covered the searching space, and the "equifinality" effect was restrained more effectively, with the relative errors of all the point estimates reduced to less than 8%. Therefore, the real source information reflected by the statistical characteristics of the SD-PSO-MC inversion outputs was more precise than that obtained using the multichain MCMC approach.
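
As an illustration of the Metropolis criterion referred to above, the following is a minimal sketch of the standard acceptance rule applied to log-likelihood values. It is not the SD-PSO-MC implementation used in this study: the function name `metropolis_accept`, the `temperature` scaling factor, and the `rng` argument are assumptions introduced only for this example. A candidate with a higher likelihood is always accepted, while a worse candidate is accepted with a probability equal to the (tempered) likelihood ratio, which is what allows a search line to escape local optima.

```python
import math
import random


def metropolis_accept(log_like_current, log_like_candidate,
                      temperature=1.0, rng=random):
    """Decide whether a candidate state replaces the current one.

    Hypothetical helper: `temperature` (> 0) is an assumed scaling factor;
    temperature = 1.0 gives the plain Metropolis rule for a symmetric proposal.
    """
    # A candidate that is at least as likely as the current state is always kept.
    if log_like_candidate >= log_like_current:
        return True
    # Otherwise accept with probability exp(delta / temperature), delta < 0.
    accept_prob = math.exp((log_like_candidate - log_like_current) / temperature)
    return rng.random() < accept_prob


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demonstration is repeatable
    trials = 10_000
    # Candidate is one log-likelihood unit worse than the current state.
    accepted = sum(metropolis_accept(0.0, -1.0, rng=rng) for _ in range(trials))
    print(f"empirical acceptance rate: {accepted / trials:.3f}")
    print(f"theoretical rate exp(-1):  {math.exp(-1.0):.3f}")
```

In a particle-swarm setting, a rule of this kind would typically be applied each time a particle proposes a new position, with the log-likelihood supplied by the surrogate model rather than the full numerical simulation.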