_This article, written by JPT Technology Editor Chris Carpenter, contains highlights of paper OTC 32978, “Development and Implementation of an AI-Based System To Automate Textual Classification on Daily Drilling Reports,” by Stephan Perrout, Aliel F. Riente, and Guilherme S.F. Vanni, SPE, Petrobras, et al. The paper has not been peer reviewed. Copyright 2023 Offshore Technology Conference. Reproduced by permission._

Structured daily drilling reports (DDRs) are a rich source of information that allows better planning, more-accurate risk analysis, and improved key performance indicators and contracts. However, such information is originally stored as unstructured free text, which makes efficient data mining difficult. With the advance of artificial intelligence (AI) technologies, particularly AI language models, applying such techniques to unstructured data has become critical to digital transformation. The complete paper presents an approach for automatic DDR classification that incorporates new AI techniques.

Introduction

This work addresses the complex task of automatically classifying DDRs according to a newly proposed ontology. The ontology follows a hierarchical model that classifies actions into three or four levels depending on the intervention, considering drilling, completion, and abandonment. Each event has an ontology built and reviewed by experts in oil and gas.

Classifying DDRs is a demanding task, and effectively exploiting AI-based models represents a promising solution. This work bridges this gap by proposing a classifier based on transformers along with recurrent neural networks (RNNs) to classify reported events described in unstructured text related to drilling, completion, and abandonment interventions. A large number of DDRs was used for training and validation of the proposed classifier, yielding promising results for key processes in the company.

Neural-Network Techniques

Bidirectional Long Short-Term Memory.
Early neural-network models are characterized by inputs of fixed length. This is a drawback when working with texts, however, because sentences vary in their number of words. To overcome this issue and to process data sequentially, RNNs were proposed. RNNs are characterized by a set of parameters inherited from the early models plus an internal memory (a hidden or internal state) responsible for storing the context of the sequence being processed.

Long short-term memory (LSTM) is a variation of the RNN proposed to mitigate two problems. First, information can be easily lost when processing very long sequences. Second, the gradient can become very small because of the large number of mathematical operations performed during processing, while training remains far from reaching its threshold. LSTM adds sets of parameters, called the input gate, forget gate, and output gate, that control information flow through the network. These additional parameters help to retain only what is important for the internal state of the network, in addition to controlling the output.

Bidirectional LSTM (BiLSTM) is a variant of LSTM that comprises two LSTMs. One processes text from left to right, and the other processes text from right to left. This feature allows “future” elements to be part of the model’s decision process for “past” elements. The final classification is based on the combination of the outputs of both LSTMs.
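The gate mechanism and the two-direction pass described above can be illustrated with a minimal sketch in plain Python. This is not the paper's implementation. It assumes toy three-dimensional token vectors in place of real word embeddings, random untrained weights, and concatenation of the two final hidden states as the "combination" step (the article does not say how the outputs are combined). All function and variable names are hypothetical, and the classification layer that would consume the encoding is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_size, hidden_size, rng):
    """Random (untrained) weights for the three gates and the candidate state."""
    def matrix():
        return [[rng.uniform(-0.5, 0.5) for _ in range(input_size + hidden_size)]
                for _ in range(hidden_size)]
    return {name: {"W": matrix(), "b": [0.0] * hidden_size}
            for name in ("input", "forget", "output", "candidate")}


def affine(layer, vec):
    """Compute W @ vec + b with plain lists."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(layer["W"], layer["b"])]


def lstm_step(params, x, h, c):
    """One time step: the gates decide what to write, keep, and expose."""
    z = x + h  # concatenate current input with previous hidden state
    i = [sigmoid(a) for a in affine(params["input"], z)]        # input gate
    f = [sigmoid(a) for a in affine(params["forget"], z)]       # forget gate
    o = [sigmoid(a) for a in affine(params["output"], z)]       # output gate
    g = [math.tanh(a) for a in affine(params["candidate"], z)]  # candidate state
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def run_lstm(params, sequence, hidden_size):
    """Process a sequence of any length and return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(params, x, h, c)
    return h


def bilstm_encode(fwd, bwd, sequence, hidden_size):
    """Run one LSTM left to right and another right to left, then concatenate.

    Concatenation is an assumed way of combining the two outputs.
    """
    h_fwd = run_lstm(fwd, sequence, hidden_size)
    h_bwd = run_lstm(bwd, list(reversed(sequence)), hidden_size)
    return h_fwd + h_bwd


INPUT, HIDDEN = 3, 4
rng = random.Random(0)
fwd = init_lstm(INPUT, HIDDEN, rng)
bwd = init_lstm(INPUT, HIDDEN, rng)

# Two toy "sentences" of different lengths (2 and 7 tokens).
short = [[rng.random() for _ in range(INPUT)] for _ in range(2)]
long_seq = [[rng.random() for _ in range(INPUT)] for _ in range(7)]

enc_short = bilstm_encode(fwd, bwd, short, HIDDEN)
enc_long = bilstm_encode(fwd, bwd, long_seq, HIDDEN)
print(len(enc_short), len(enc_long))
```

Both encodings have the same length (twice the hidden size) even though the inputs differ in length, which is the property that lets a fixed-size classifier sit on top of variable-length text.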