Abstract

In a novel application of data-driven analytics to the assessment of unconventional oil and gas resources, we present a Transfer Learning framework that combines unsupervised and supervised machine learning techniques and apply it to a relatively large dataset from one of the main unconventional plays in the United States. To improve the generalizability of our work, rather than building a single model with the highest predictive accuracy, we quantify the changes in predicted productivity across multiple sub-basin models, each trained on data from a different sub-basin of the same shale play. We present the main steps of developing a data-driven solution in the unconventional oil and gas domain and discuss the importance of incorporating domain knowledge from petroleum engineering and geology throughout the process. For an unsupervised clustering problem, we show the impact of choosing the clustering variables under the guidance of domain expertise. We investigate how model accuracy varies when the model is trained and tested on different parts of the unconventional play. The proposed Transfer Learning framework quantifies the change in predictive performance of a model for one part of the play as the amount of training data taken from another part is varied. We show how a careful selection of data types can balance three objectives: reducing overfitting of the model to the training data, improving model transferability across a very large shale play, and enhancing accuracy in a given part or sub-basin of the play. We show that defining the clusters by geologic variables, such as the average depth of the formation and the thermal maturity of the source rock, instead of by the geographical coordinates of the wells significantly increases the predictive performance of the models. In addition, we demonstrate the increase in model accuracy obtained by adding production data to the predictors.
We also discuss the limitations associated with accessing production data in a real-world Transfer Learning workflow. The variable importance ranking of monthly production rates further supports the significance of production data. We use statistical analysis of large samples of decline curves to find the optimum duration of production history for training. To address hyperparameter tuning for our machine learning algorithms, we identify the ranges of Random Forest hyperparameters that achieve the lowest test-to-train error ratio. Finally, we demonstrate a successful application of the proposed workflow to a real dataset from the Eagle Ford basin in South Texas, with a few examples in which guidance from domain experts improves Transfer Learning. Our findings show that, within a Transfer Learning framework, models trained on a large group of wells in a mature field can achieve better predictive performance in a greenfield with very few wells.
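
The transfer experiment described above can be illustrated with a minimal sketch. It is not the authors' implementation: the well data are synthetic, the two predictors (`depth`, `maturity`), the response function, the sample sizes, and the Random Forest settings are all assumptions chosen only for illustration, and scikit-learn is assumed to be available. The sketch trains a Random Forest on wells from a "mature" sub-basin, adds increasing numbers of wells from a "greenfield" sub-basin, and reports the error on held-out greenfield wells together with the test-to-train error ratio.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

def make_wells(n, depth_mean):
    """Synthetic wells (illustrative only): predictors are depth and maturity."""
    depth = rng.normal(depth_mean, 1.0, n)
    maturity = rng.normal(0.0, 1.0, n)
    X = np.column_stack([depth, maturity])
    # Hypothetical productivity response with noise; not taken from the paper.
    y = 0.5 * depth**2 + maturity + rng.normal(0.0, 0.3, n)
    return X, y

def rmse(y_true, y_pred):
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))

# Source (mature) and target (greenfield) sub-basins differ in typical depth.
X_src, y_src = make_wells(400, depth_mean=0.0)
X_tgt, y_tgt = make_wells(200, depth_mean=3.0)

# Half of the target wells may be added to training; the rest are held out.
X_pool, y_pool = X_tgt[:100], y_tgt[:100]
X_test, y_test = X_tgt[100:], y_tgt[100:]

results = []
for n_tgt in (0, 10, 40, 100):
    # Training set: all source wells plus the first n_tgt target wells.
    X_train = np.vstack([X_src, X_pool[:n_tgt]])
    y_train = np.concatenate([y_src, y_pool[:n_tgt]])

    model = RandomForestRegressor(
        n_estimators=200, min_samples_leaf=3, random_state=0
    )
    model.fit(X_train, y_train)

    train_err = rmse(y_train, model.predict(X_train))
    test_err = rmse(y_test, model.predict(X_test))
    results.append((n_tgt, train_err, test_err, test_err / train_err))

for n_tgt, train_err, test_err, ratio in results:
    print(f"target wells={n_tgt:3d}  train RMSE={train_err:.2f}  "
          f"test RMSE={test_err:.2f}  test/train={ratio:.2f}")
```

In this toy setting the held-out greenfield error falls as target wells are added, because a Random Forest cannot extrapolate beyond the depth range seen in the source sub-basin. Repeating the loop over hyperparameter values such as `min_samples_leaf` would give one way to look for settings with a low test-to-train error ratio.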
