Abstract

Hydrocarbon production from shale formations has become an essential part of the global energy supply over the past decade. The life of a project in an unconventional play depends significantly on the prediction of Estimated Ultimate Recovery (EUR). However, conventional methods for predicting EUR become less accurate for shale formations, which significantly affects the economic returns of projects in unconventional plays. The objective of this article is to investigate the most important independent variables, including petrophysical and completion parameters, for estimating EUR with a machine learning algorithm. A novel machine learning model based on Random Forest Regression is introduced to predict EUR and to rank the importance of the independent variables.

In this article, production, petrophysics and engineering data with more than 25 variables from 4000 wells in the Eagle Ford are summarized for analysis. The data are collected from production monitoring, well logging, well testing, seismic interpretation and lab experiments. This paper has three major components. First, a multivariate linear regression model is created to predict the overall EUR. Second, a spatial autocorrelation analysis is carried out to identify whether spatial variables could affect the accuracy of the multivariate regression model. Third, Random Forest Regression models are trained to examine their reliability in predicting EUR with spatially autocorrelated data. The importance of key predictors is also identified. The final models are tuned with optimized hyperparameters. Throughout the article, the predictive capabilities of each Random Forest Regression model are discussed in detail to understand the physics behind unconventional hydrocarbon production mechanisms.

The results and workflow presented in this paper are insightful and novel.
First, we test the multivariate regression analysis with all the petrophysical and completion variables using the backward elimination method. This widely used model has the limitation of excluding spatial information. To identify the impact of spatial variables, we calculate Moran's Index and find that the data in this study are clustered, that is, spatially autocorrelated. The p-values for EUR, Oil EUR and Gas EUR are 0.000002, 0.000000 and 0.12, respectively, which all reject the null hypothesis that the data are randomly distributed. To include the spatial information in the prediction, we use an advanced machine learning technique, Random Forest, to predict EUR from a combination of petrophysical variables, completion variables and spatial information. The key variables for predicting EUR, Oil EUR and Gas EUR with Random Forest Regression are identified. However, the importance of the key variables differs between Oil EUR and Gas EUR. Therefore, we split the overall EUR Random Forest Regression model (57% explained) into two prediction models, one for Oil EUR and one for Gas EUR. The Gas EUR Random Forest Regression model performs better (76% explained) than the Oil EUR Random Forest Regression model (60% explained).

This study provides a deeper understanding of unconventional hydrocarbon production prediction from a big data perspective, and proposes a novel and reliable machine learning model to predict EUR and evaluate economic returns in the Eagle Ford. Compared with the traditional multivariate regression model, our Random Forest Regression models are more reliable. In addition, the Random Forest technique can rank the importance of the relevant independent variables, and this ranking can be used to guide and improve data collection and model training in further studies on this topic. The workflow presented in this article can also be used to train models on data from other unconventional resource plays.
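
As an illustration of the spatial autocorrelation step, the following is a minimal sketch of a global Moran's Index calculation with a permutation-based p-value. The abstract does not state which spatial weights or significance test were used, so the binary distance-band weights, the one-sided permutation test, the function names and the toy grid data are all assumptions made for this example; none of it is the authors' implementation or data.

```python
import math
import random


def morans_i(coords, values, threshold):
    """Global Moran's I with binary distance-band weights.

    Assumption: w_ij = 1 when 0 < distance(i, j) <= threshold, else 0.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    cross = 0.0   # sum of w_ij * dev_i * dev_j
    w_sum = 0.0   # sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= threshold:
                cross += dev[i] * dev[j]
                w_sum += 1.0

    denom = sum(d * d for d in dev)
    if w_sum == 0 or denom == 0:
        raise ValueError("no neighbours within threshold, or constant values")
    return (n / w_sum) * (cross / denom)


def permutation_p_value(coords, values, threshold, n_perm=999, seed=0):
    """One-sided pseudo p-value: share of random relabelings whose
    Moran's I is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = morans_i(coords, values, threshold)
    shuffled = list(values)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(coords, shuffled, threshold) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)


# Toy data (hypothetical): a 5 x 5 grid of "wells" whose value grows
# from west to east, i.e. a strongly clustered pattern.
grid_coords = [(x, y) for y in range(5) for x in range(5)]
grid_values = [float(x) for (x, y) in grid_coords]

if __name__ == "__main__":
    i_obs, p = permutation_p_value(grid_coords, grid_values, threshold=1.0)
    print(f"Moran's I = {i_obs:.2f}, pseudo p-value = {p:.3f}")
```

A value of Moran's I well above its expectation under randomness, together with a small p-value, indicates clustering; values near the expectation indicate no detectable spatial pattern. Real well data would more likely use inverse-distance or k-nearest-neighbour weights, and the choice of weights can change the result.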

