Abstract

The total organic carbon (TOC) content is a critical parameter for estimating shale oil resources. However, common TOC prediction methods rely on empirical formulas, and their applicability varies widely from region to region. In this study, a novel data-driven extreme gradient boosting (XGBoost) model with Bayesian hyperparameter optimization was proposed to predict the TOC content from wireline log data. The lacustrine shale in the Damintun Sag, Bohai Bay Basin, China, was used as a case study. First, correlation analysis was used to examine the relationship between the well logs and the core-measured TOC data. Based on the degree of correlation, six logging curves reflecting TOC content were selected to construct the training dataset for machine learning. Then, the performance of the XGBoost model was tested using K-fold cross-validation, and the hyperparameters of the model were determined using a Bayesian optimization method to improve the search efficiency and reduce the uncertainty introduced by rule-of-thumb tuning. Error analysis showed that the coefficient of determination (R²) between the TOC content predicted by the XGBoost model and the core-measured TOC content reached 0.9135. The root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) were 0.63, 0.77, and 12.55%, respectively. In addition, five commonly used methods, namely the ΔlogR method, random forest, support vector machine, K-nearest neighbors, and multiple linear regression, were used to predict the TOC content for comparison; the results confirmed that the XGBoost model has higher prediction accuracy and better robustness. Finally, the proposed approach was applied to predict the TOC curves of 20 exploration wells in the Damintun Sag, yielding the first quantitative contour maps of the TOC content of this block. The results of this study facilitate the rapid detection of sweet spots in lacustrine shale oil.
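The four error measures named in the abstract can be computed from paired core-measured and predicted TOC values. The following is a minimal sketch using their standard definitions, not the authors' code; the function name `regression_metrics` and the sample values are hypothetical and are not data from the study.

```python
import math


def regression_metrics(measured, predicted):
    """Return R2, RMSE, MAE and MAPE (in percent) for paired TOC values.

    Assumes the measured values are non-zero (needed for MAPE) and not all
    identical (needed for R2).
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(measured)
    residuals = [m - p for m, p in zip(measured, predicted)]
    mean_measured = sum(measured) / n
    ss_res = sum(r * r for r in residuals)                    # residual sum of squares
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "mape": 100.0 * sum(abs(r / m) for r, m in zip(residuals, measured)) / n,
    }


# Hypothetical TOC values in wt%, for illustration only.
measured_toc = [2.0, 4.0, 6.0, 8.0]
predicted_toc = [2.5, 3.5, 6.5, 7.5]
metrics = regression_metrics(measured_toc, predicted_toc)
print(metrics)
```

RMSE, MAE, and MAPE are errors, so lower is better, while R² closer to 1 indicates a better fit.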

Highlights

  • Unconventional shale oil and gas reservoirs have profoundly revolutionized the energy industry in North America and China [1, 2]

  • The results showed that the XGBoost model has the best prediction performance, with an R² value of 0.9135, followed by the RMF model with an R² value of 0.8931 and the ΔlogR method with an R² value of 0.8345

  • We proposed a robust data-driven Bayesian optimization XGBoost model to predict the total organic carbon (TOC) content using wireline log data

Summary

Introduction

Unconventional shale oil and gas reservoirs have profoundly revolutionized the energy industry in North America and China [1, 2]. Because organic-rich lacustrine shale plays are usually very thin, discrete core-measured TOC data points alone are not a reasonable basis for evaluating the hydrocarbon generation potential. Methods that use well logs to predict the TOC content include statistical correlation, overlapping methods, multiple regression, and machine learning. Passey et al. proposed the ΔlogR method [16], which overlaps the porosity logs with the deep resistivity (RD) log and uses the non-source rock zone as the baseline to establish an empirical relationship between the TOC content and the well logs. Machine learning is data-driven and improves the accuracy and efficiency of TOC prediction compared with conventional methods. In this study, a workflow consisting of XGBoost machine learning based on Bayesian optimization is proposed for TOC prediction and applied to lacustrine shale oil in the Bohai Bay Basin. The prediction results are compared with those of the ΔlogR method and other typical machine learning methods to demonstrate the accuracy and reliability of the proposed method.
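As an illustration of the overlay idea, the sketch below implements the sonic-resistivity form of the ΔlogR method with the coefficients published by Passey et al. (1990). It is a minimal sketch, not the implementation used in this study: the function names and the sample log readings are hypothetical, the transit time is assumed to be in μs/ft, and the level of organic metamorphism (LOM), which the summary above does not mention, is a required maturity input of the published formula.

```python
import math


def delta_log_r(rt, dt, rt_baseline, dt_baseline):
    """Separation between the resistivity and sonic curves.

    rt, rt_baseline: deep resistivity and its baseline value in ohm-m.
    dt, dt_baseline: sonic transit time and its baseline value in us/ft.
    Baseline values are read in a non-source rock interval.
    """
    return math.log10(rt / rt_baseline) + 0.02 * (dt - dt_baseline)


def toc_from_delta_log_r(dlr, lom):
    """Convert the curve separation to TOC (wt%) for a given LOM."""
    return dlr * 10 ** (2.297 - 0.1688 * lom)


# Hypothetical readings, for illustration only; not data from the study.
dlr = delta_log_r(rt=100.0, dt=100.0, rt_baseline=10.0, dt_baseline=100.0)
toc = toc_from_delta_log_r(dlr, lom=10.0)
print(f"delta log R = {dlr:.3f}, TOC = {toc:.2f} wt%")
```

The baseline values and the LOM must be chosen per well or per region, which is one reason the applicability of such empirical formulas varies from basin to basin.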

Theory of Machine Learning
A Taylor expansion is performed on the objective function to obtain
Evaluation of the Results
Geology Settings and Data Analysis
Evaluation Method of Model Performance
Results and Discussion
Conclusions
