Abstract

Recurrent neural networks (RNNs), which can capture the temporal nature of a signal, are becoming more common in machine learning applied to petroleum engineering, particularly drilling. This technology brings requirements and caveats related to the input data that play a significant role in the resulting models. This paper explores how data pre-processing and attribute selection techniques affect the performance of RNN models. Re-sampling and down-sampling methods are compared. Imputation strategies, a problem generally omitted in published research, are explored, and a method for selecting between last observation carried forward and linear interpolation is introduced and evaluated in terms of model accuracy. Case studies are performed on real-time drilling logs from the open Volve dataset published by Equinor. For a realistic evaluation, a semi-automated process is proposed for data preparation, model training, and evaluation. It employs a continuous learning approach to model updating, in which the training dataset is built up continuously while the well is being drilled. This allows for accurate benchmarking of data pre-processing methods. Also included is an updated version of a previously developed branched custom neural network architecture that combines recurrent elements with row-wise regression elements. Source code for the implementation is published on GitHub.

Highlights

  • Data preparation is often left as an afterthought when discussing machine learning (ML) research applied to drilling

  • If drilling torque suddenly rises and the train-test split happens to fall just before the data row containing the rise, linear interpolation would cause the imputed values before the split to rise because of an event in the future, making the model perform better than it realistically would in real-life operation

  • This methodology assumes that the machine learning prediction model will be trained while drilling, so the performance of such a model has to be evaluated with the continuously expanding training dataset in mind
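The leakage described in the second highlight can be illustrated with a small sketch. The torque values, the split position, and the function names `locf` and `linear_interpolate` are hypothetical and chosen only for illustration; they are not taken from the paper or its published source code. The sketch assumes evenly spaced rows, with `None` marking a gap.

```python
from typing import List, Optional

Series = List[Optional[float]]


def locf(values: Series) -> Series:
    """Last observation carried forward: fill each gap with the latest known value."""
    filled: Series = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def linear_interpolate(values: Series) -> Series:
    """Fill interior gaps on a straight line between the known neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical torque log: two missing rows, then a sudden rise in the last row.
torque: Series = [10.0, 10.0, None, None, 20.0]
split = 4  # rows 0..3 are training data; the rise in row 4 lies in the test data

train_locf = locf(torque)[:split]
train_linear = linear_interpolate(torque)[:split]

print(train_locf)    # imputed rows stay at the last observed value
print(train_linear)  # imputed rows already climb toward the future rise
```

With last observation carried forward, the training rows contain only information that was available at the time. With linear interpolation, the imputed training rows depend on the value after the split, which is the effect the highlight warns about.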


Summary

Background and state of art

Data preparation is often left as an afterthought when discussing machine learning (ML) research applied to drilling. A review of rate of penetration (ROP) prediction papers (Barbosa et al., 2019) acknowledges the issue of data gaps in drilling logs; quoting directly from that review: "In general, this problem was omitted." This is likely because researchers work on datasets that are already pre-processed, so they are not exposed to this common practical problem. To capture the temporal behavior of a given logged attribute, recurrent neural networks (RNN) are commonly used (Rumelhart et al., 1986). The basic architecture of such a network is shown in Fig. 1: cell A takes the current input, x_t, as well as the information from the recurrent connection, v_{t−1}, from the previous step; this may be the cell output h_{t−1}, but it can contain additional state information. This is especially important when a given model is applied in a continuous learning approach, where the training dataset continuously expands and correlation scores can change dynamically as drilling progresses.
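The recurrent connection described above can be sketched as a single-step update. This is a minimal illustration of a simple (Elman-style) cell in which the recurrent information v_{t−1} is just the previous output h_{t−1}; it is not the branched architecture from the paper. The names `rnn_step` and `run_sequence`, the tanh activation, and the weight values are assumptions made for this example.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def rnn_step(x_t: Vector, h_prev: Vector, w_x: Matrix, w_h: Matrix, b: Vector) -> Vector:
    """One cell update: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    h_t = []
    for j in range(len(b)):
        pre = b[j]
        pre += sum(w_x[j][i] * x_t[i] for i in range(len(x_t)))
        pre += sum(w_h[j][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(pre))
    return h_t


def run_sequence(xs: List[Vector], w_x: Matrix, w_h: Matrix, b: Vector) -> List[Vector]:
    """Apply the cell to each time step, passing the state forward."""
    h = [0.0] * len(b)  # initial state before the first observation
    states = []
    for x_t in xs:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states


# Hypothetical weights: one input attribute, two hidden units.
w_x = [[0.5], [-0.3]]
w_h = [[0.1, 0.0], [0.0, 0.2]]
b = [0.0, 0.0]

# A single pulse followed by zeros: later states are non-zero only
# because of the recurrent connection.
states = run_sequence([[1.0], [0.0], [0.0]], w_x, w_h, b)
for t, h in enumerate(states):
    print(t, h)
```

Feeding a pulse followed by zero inputs shows the role of the recurrent connection: the state at later steps is driven only by what the cell carried over from earlier steps.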

Motivation and contribution
Paper structure
Process framework
Results analysis
Data selection
Run gap statistics
Pre-processing
Applying ML model
Model evaluation and inspection
Hyperparameter tuning
Data pre-processing
Resampling importance and algorithms
Resampling quality evaluation
Resampling
Attribute selection and PCA configuration
On neural network architectures
Transformers
Conclusion
Future work
