The total organic carbon (TOC) content is a critical parameter for estimating shale oil resources. However, common TOC prediction methods rely on empirical formulas, and their applicability varies widely from region to region. In this study, a novel data-driven extreme gradient boosting (XGBoost) model with Bayesian hyperparameter optimization is proposed to predict the TOC content from wireline log data. The lacustrine shale in the Damintun Sag, Bohai Bay Basin, China, was used as a case study. First, correlation analysis was used to quantify the relationship between the well logs and the core-measured TOC data. Based on the degree of correlation, six logging curves that reflect TOC content were selected to construct a training dataset for machine learning. Then, the performance of the XGBoost model was tested using K-fold cross-validation, and the hyperparameters of the model were determined using a Bayesian optimization method to improve the search efficiency and reduce the uncertainty introduced by rule-of-thumb tuning. Error analysis showed that the coefficient of determination (R²) between the TOC content predicted by the XGBoost model and the core-measured TOC content reached 0.9135. The root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) were 0.63, 0.77, and 12.55%, respectively. In addition, five commonly used methods, namely, the ΔlogR method, random forest, support vector machine, K-nearest neighbors, and multiple linear regression, were used to predict the TOC content for comparison, which confirmed that the XGBoost model has higher prediction accuracy and better robustness. Finally, the proposed approach was applied to predict the TOC curves of 20 exploration wells in the Damintun Sag, yielding the first quantitative contour maps of the TOC content of this block. The results of this study facilitate the rapid detection of sweet spots in lacustrine shale oil.
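The evaluation steps named above (K-fold cross-validation and the R², RMSE, MAE, and MAPE error metrics) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names `regression_metrics` and `k_fold_indices` and the sample values are hypothetical, the folds are contiguous and unshuffled, and the metrics assume that the measured values are non-zero and not all identical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, MAE, and MAPE (in percent) for paired values.

    Assumes y_true contains no zeros (needed for MAPE) and is not
    constant (needed for R^2).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "mape": 100.0 * sum(abs(r / t) for r, t in zip(residuals, y_true)) / n,
    }


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) index lists for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Hypothetical TOC values (wt%), for illustration only; not data from the study.
measured = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
metrics = regression_metrics(measured, predicted)
folds = list(k_fold_indices(5, 2))
```

In a full workflow, each `(train_idx, test_idx)` pair would select the rows used to fit and then score the model, and the per-fold metrics would be averaged to compare candidate hyperparameter settings.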