Super learner approach to predict total organic carbon using stacking machine learning models based on well logs

L. Goliatt, C.M. Saporetti, E. Pereira

doi:10.1016/j.fuel.2023.128682

Abstract

Determining the total organic carbon (TOC) content is essential for risk assessment in oil exploration: TOC is a parameter used to characterize hydrocarbon-generating rocks, and intervals rich in organic matter are a basic requirement for oil and gas accumulation. However, determining TOC can be costly, demanding destructive tests on source-rock samples, expensive laboratory equipment, and specialized personnel. In this context, computational methods are needed to bypass these problems, and machine learning models emerge as an option. Stacking is one approach to combining machine learning methods that improves performance and, consequently, prediction quality. This paper presents a super learner strategy, based on stacking approaches, as a surrogate model for TOC modeling. The super learner has three levels containing different types of learners (machine learning methods), with two stacking models forming the first two levels. The following machine learning models were used to build the super learner: K-Nearest Neighbors (KNN), Linear Regression (LR), Multi-layer Perceptron Neural Network (MLP), Random Forest (RF), Ridge Regression (RR), and Support Vector Regression (SVR). The proposed model was compared with standalone machine learning models and other canonical stacking models. The resulting super learner stacking model attained the best average performance for TOC modeling (R = 0.897, R² = 0.80, RMSE = 1.16, MAE = 0.93, and MAPE = 28.30%). The proposed approach yields an efficient, data-driven alternative model for TOC prediction, providing a reliable automated tool to assist oil and gas well management and decision-making.

Full Text