Accurately assessing geothermal potential is a significant global challenge, and reservoir temperature prediction models are a key part of that assessment. Machine learning is an effective tool for building such models, but predictive accuracy often suffers when complex, nonlinear input features are not fully screened before modeling and when datasets are incomplete. This study collected hydrochemical test data from 65 groundwater samples taken in the Guide area of Qinghai Province between 2009 and 2016. To address missing data, we used the LRTC-TNN method to supplement the dataset. We then normalized the data and used Pearson correlation coefficients to analyze feature correlations and identify important features. On the processed dataset, we built XGBoost and LightGBM models and used 5-fold cross-validation with Bayesian optimization to select the optimal combination of model parameters. We compared the advantages and disadvantages of the two models and evaluated their accuracy, robustness, and generalization capability. The results indicate that the model performs best when 80% of the data is used for training. The LRTC-TNN method fills in missing data effectively, with an accuracy exceeding 95%. Across the training set, test set, and complete dataset, the XGBoost model consistently yielded strong predictive results, with an R² of 98.09%, an RMSE of 0.546, and an MAE of 0.396. Robustness analysis showed that the XGBoost model is the more robust of the two, while feature importance and sensitivity analysis revealed that the chloride ion is the key independent variable affecting reservoir temperature predictions.
Furthermore, generalization capability validation indicated that the model adapts well to different datasets and provides accurate predictions. In conclusion, the XGBoost model built on the supplemented dataset generalizes well in reservoir temperature prediction and offers a reliable way to determine underground reservoir temperatures accurately.
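
For readers unfamiliar with the three accuracy metrics reported above, the following minimal sketch shows how R², RMSE, and MAE are conventionally computed from observed and predicted values. It is an illustration under stated assumptions, not the study's own evaluation code: the function name `regression_metrics` and the sample temperatures are hypothetical, and R² is returned as a fraction, so a value of 0.9809 would correspond to the 98.09% quoted in the abstract.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (r2, rmse, mae) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n

    # Residual and total sums of squares
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")

    r2 = 1.0 - ss_res / ss_tot                 # coefficient of determination
    rmse = math.sqrt(ss_res / n)               # root mean squared error
    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    return r2, rmse, mae


# Hypothetical values for illustration only, not data from the study
observed = [60.0, 70.0, 80.0, 90.0]
predicted = [61.0, 69.0, 82.0, 90.0]
r2, rmse, mae = regression_metrics(observed, predicted)
```

RMSE and MAE carry the units of the predicted quantity, while R² is dimensionless. Comparing the three metrics on the training set and the test set is one common way to check for overfitting.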