Abstract

Drilling optimization is a complicated multi-objective process optimization problem. During drilling, drillers must continuously adjust the weight on bit (WOB) and rotary speed (RPM) in a timely manner, not only to maximize the rate of penetration (ROP) but also to prevent severe vibration and maintain downhole tool durability. In this study, a virtual drilling agent using a deep reinforcement learning (RL) model is developed and trained to make drilling decisions automatically, and it is shown to optimize drilling parameters effectively. A deep RL model using the deep deterministic policy gradient (DDPG) algorithm is developed to optimize the drilling process. In the RL model, the reward of the drilling decisions at each drilling step is a function of drilling ROP, downhole vibration tendency, bit dull state, and risk of tool failure. Separate modules that evaluate the reward contribution of each component are implemented and trained using field and laboratory data. The deep RL model is applied and tested comprehensively in different drilling environments, including hard and abrasive rock, embedded rock, and vibrational versus stable drilling. The hyper-parameters of the actor-critic neural network architecture in the RL model are carefully selected to improve model convergence. Results show that the deep RL model can effectively find the optimum drilling solutions in various drilling environments. In soft formations, the RL model applies the upper limits of WOB and RPM throughout the drilled depth to maximize ROP and reduce drilling time. In hard and abrasive formations, the RL model gradually changes RPM and WOB to prevent premature wear of the PDC cutters; the change of the drilling parameters is optimized based on rock abrasivity and target drilling depth. In unstable drilling environments, the RL model limits the ratio of WOB to RPM to avoid stick-slip vibration while simultaneously controlling WOB and RPM to maximize ROP and reach total depth (TD). In embedded formations, the RL model successfully finds the optimum solution by adjusting WOB/RPM to avoid stick-slip and overloading of the bit cutting structure. The learning process of the RL model shows that hyper-parameter selection plays a critical role in model convergence and accuracy: improperly selected hyper-parameters can lead to failed solution searches or sub-optimum solutions. Overall, the RL model is proven to effectively find optimum drilling solutions in various drilling environments and can be applied to both pre-well drilling planning and real-time drilling optimization. To the best of the authors' knowledge, this is the first attempt to develop a deep RL model for drilling optimization that implements a combination of ROP, vibration, bit dull state, and durability in the reward function. The proposed RL model can be extended to include more reward factors in the drilling optimization, such as whirl and high-frequency torsional oscillation (HFTO), stuck pipe, tool temperature, and so on.
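To make the reward design and the continuous-action setup concrete, the following is a minimal Python/PyTorch sketch of a composite per-step reward and a DDPG-style actor that maps a drilling state to continuous WOB/RPM set-points. It is an illustration only: the weights, state contents, layer sizes, and operating limits are hypothetical assumptions, whereas the paper trains separate data-driven modules for each reward component on field and laboratory data.

import torch
import torch.nn as nn

# Hypothetical reward weights (placeholders, not the authors' values).
W_ROP, W_VIB, W_DULL, W_FAIL = 1.0, 0.5, 0.3, 0.5

def drilling_reward(rop, vibration, bit_wear, failure_risk):
    # Reward rises with ROP and falls with vibration tendency, bit wear,
    # and tool-failure risk; all inputs assumed normalized to [0, 1].
    return W_ROP * rop - W_VIB * vibration - W_DULL * bit_wear - W_FAIL * failure_risk

class Actor(nn.Module):
    # DDPG-style deterministic policy: state -> continuous (WOB, RPM) set-points.
    def __init__(self, state_dim=6, wob_max=50.0, rpm_max=200.0):
        super().__init__()
        self.limits = torch.tensor([wob_max, rpm_max])  # e.g. klbf and rev/min
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 2), nn.Sigmoid(),  # normalized outputs in [0, 1]
        )

    def forward(self, state):
        return self.net(state) * self.limits  # rescale to physical units

actor = Actor()
state = torch.rand(1, 6)  # placeholder drilling-state vector
wob, rpm = actor(state)[0]
print(f"WOB set-point: {wob.item():.1f}, RPM set-point: {rpm.item():.1f}")

# A stable, moderate-ROP step can out-score a fast but vibration-prone one:
print(drilling_reward(rop=0.9, vibration=0.8, bit_wear=0.4, failure_risk=0.3))  # ~0.23
print(drilling_reward(rop=0.7, vibration=0.1, bit_wear=0.2, failure_risk=0.1))  # ~0.54

DDPG is a natural fit for this problem because WOB and RPM are continuous controls: the deterministic actor outputs set-points directly rather than selecting from a discretized action grid.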
