An Optimization Framework with Dimensionality Reduction Using Markov Chain Monte Carlo and Genetic Algorithms for Groundwater Potential Assessment

Zitao Wang, Chao Yue, Jianping Wang

doi:10.1016/j.asoc.2024.111991

Abstract

Limited samples and high-dimensional feature spaces often hinder the accuracy of machine learning (ML) models in regional groundwater potential assessment (GPA). This study proposes a novel framework, GPA with Dimensionality Optimization (GPADO), that optimizes feature dimensionality reduction to enhance prediction performance. Taking the Jianghan Basin as an example, data were gathered on nine continuous variables and five categorical variables that influence groundwater potential in the region; one-hot encoding of the categorical variables expanded the feature set to 37. Three scenarios were devised to assess prediction outcomes under different dimensionality reduction approaches. Comparative analysis revealed that a hybrid method, which reduces both the continuous and the categorical variables, yielded the highest validation set accuracy. Consequently, a genetic algorithm and Markov Chain Monte Carlo (MCMC) methods were employed to determine the optimal solution and the uncertainties associated with four unknown parameters: the dimensionality reduction method chosen for the continuous variables, the method chosen for the categorical variables, and the number of dimensions retained for each. Results indicated that using singular value decomposition (SVD) to reduce the categorical variables to three dimensions, coupled with principal component analysis (PCA) to reduce the continuous variables to three dimensions, produced the highest model validation accuracy within the GPADO framework, 0.834. This optimal configuration was then used for automated ML training, resulting in a final validation set accuracy of 0.851 and a test set accuracy of 0.836. The resulting model provided a more precise spatial distribution of groundwater potential and demonstrated the GPADO framework's effectiveness in improving GPA accuracy, particularly in data-scarce regions. The GPADO framework offers a valuable approach for enhancing GPA studies.
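To make the four-parameter search concrete, the following is a minimal sketch of a genetic algorithm over (continuous method, continuous dimensions, categorical method, categorical dimensions). It is not the authors' implementation. The candidate method lists, the dimension bounds (28 one-hot columns is inferred as 37 minus 9), the GA settings, and `toy_fitness` are all assumptions introduced for illustration. In a real run the fitness function would reduce the features, train a model, and return its validation accuracy.

```python
import random

# Hypothetical search space. The abstract names only PCA and SVD; the other
# entries and the dimension bounds are illustrative assumptions.
CONT_METHODS = ["pca", "svd", "ica"]
CAT_METHODS = ["svd", "pca", "mca"]
MAX_CONT_DIMS = 9   # nine continuous variables
MAX_CAT_DIMS = 28   # assumed: 37 total features minus 9 continuous ones


def random_individual(rng):
    """Draw one candidate: (cont_method, cont_dims, cat_method, cat_dims)."""
    return (
        rng.choice(CONT_METHODS),
        rng.randint(1, MAX_CONT_DIMS),
        rng.choice(CAT_METHODS),
        rng.randint(1, MAX_CAT_DIMS),
    )


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def mutate(ind, rng, rate=0.2):
    """Resample each gene from its own range with probability `rate`."""
    fresh = random_individual(rng)
    return tuple(f if rng.random() < rate else g for g, f in zip(ind, fresh))


def genetic_search(fitness, pop_size=30, generations=40, elite=4, seed=0):
    """Maximize `fitness` with truncation selection and elitism."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # keep the fitter half as parents
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = ranked[:elite] + children  # elites survive unchanged
    return max(population, key=fitness)


def toy_fitness(ind):
    """Stand-in objective, NOT real data: shaped to peak at the configuration
    reported in the abstract, purely so the example has something to find."""
    cont_method, cont_dims, cat_method, cat_dims = ind
    score = -abs(cont_dims - 3) - abs(cat_dims - 3)
    score += (cont_method == "pca") + (cat_method == "svd")
    return score


best = genetic_search(toy_fitness)
print(best)
```

Because the elite individuals are carried over unchanged, the best fitness in the population never decreases from one generation to the next. Replacing `toy_fitness` with a function that evaluates a trained model is the only change needed to apply the same loop to real data, although each evaluation then becomes far more expensive.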

Full Text