Natural Language Processing Increases Accuracy of Kick, Lost-Circulation Detection

Chris Carpenter

doi:10.2118/0723-0078-jpt

Abstract

_This article, written by JPT Technology Editor Chris Carpenter, contains highlights of paper SPE 206340, “Natural Language Processing Applied to Reduction of False and Missed Alarms in Kick and Lost-Circulation Detection,” by Michael Yi, Pradeepkumar Ashok, SPE, and Dawson Ramos, Intellicess, et al. The paper has not been peer reviewed._

Kick and lost-circulation events are major contributors to nonproductive time. Without good flow-in and flow-out sensors, pit-volume trends offer the best possibility for influx and loss detection. However, they produce errors because mud added to or removed from the pits externally is not monitored or sensed. In the complete paper, the authors introduce a method that uses a Bayesian network to combine trends detected in time-series data with events identified by natural language processing (NLP) of driller memos. This significantly improves the accuracy and robustness of kick and lost-circulation detection.

Methodology

The objective of this project was to develop a model that detects and quantifies kick and lost-circulation events using readily available real-time surface signals, supplemented by drilling memos. Fig. 1 shows the general work flow for creating a lost-circulation- and kick-detection model from drilling-memo comments and Bayesian networks.

Consumption of Drilling Memos. Real-time channels provide insight into potential kick and lost-circulation events. Drilling memos add details about events that could affect those channels. The drilling memos are consumed and the rig is given a direct line of communication with the kick and lost-circulation models. As a result, the algorithm can use both surface channels and textual information from the memos to identify such events. These drilling memos are also consumed in real time. The drilling-memo data consisted of two channels: Date/Time and Drilling Memo.
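This synopsis does not give the authors' network structure or probabilities. As a minimal sketch, assume a hypothetical network with two hidden causes of a pit-gain trend: a kick, or an external mud transfer into the pits. Add one observed flag from NLP of the memos that reports a transfer. The Python below performs exact inference by enumeration to show how such a memo can "explain away" a pit-gain trend. All node names and probabilities are invented for illustration.

```python
from itertools import product
from typing import Optional

# Hypothetical probabilities, chosen only to illustrate the mechanism.
P_KICK = 0.02       # prior P(kick) in one evaluation window
P_TRANSFER = 0.10   # prior P(external mud added to the pits)

# P(pit-gain trend detected | kick, transfer)
P_GAIN = {
    (False, False): 0.01,
    (False, True): 0.85,
    (True, False): 0.90,
    (True, True): 0.97,
}

# P(NLP flags a "mud transfer" memo | transfer)
P_MEMO = {True: 0.80, False: 0.02}


def _bernoulli(p_true: float, value: bool) -> float:
    """Probability that a binary node takes `value`, given P(True)."""
    return p_true if value else 1.0 - p_true


def posterior_kick(pit_gain: bool, memo_transfer: Optional[bool] = None) -> float:
    """P(kick | evidence) by exact enumeration over the hidden nodes.

    memo_transfer=None means no memo evidence is available for the window.
    """
    weights = {True: 0.0, False: 0.0}
    for kick, transfer in product((True, False), repeat=2):
        w = _bernoulli(P_KICK, kick) * _bernoulli(P_TRANSFER, transfer)
        w *= _bernoulli(P_GAIN[(kick, transfer)], pit_gain)
        if memo_transfer is not None:
            w *= _bernoulli(P_MEMO[transfer], memo_transfer)
        weights[kick] += w
    return weights[True] / (weights[True] + weights[False])


if __name__ == "__main__":
    print(f"pit gain only:            {posterior_kick(True):.3f}")
    print(f"pit gain + transfer memo: {posterior_kick(True, True):.3f}")
```

With these made-up numbers, a pit-gain trend alone gives a kick probability of about 0.16. Adding a memo that reports a mud transfer drops it to about 0.03. This is the kind of false-alarm reduction the paper targets.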
In total, 494,726 drilling memos from 271 wells were used. The data set first had to be labeled by memo type. Given the size and variety of the drilling memos, a semisupervised method was used to develop a classifier that categorizes them. First, the memos were preprocessed to improve the model’s ability to cluster similar memos. This involved the following changes:

- Conversion of similar words to assist with clustering
- Removal of special characters
- Removal of numbers
- Removal of punctuation
- Stemming (reducing inflected or derived words to their base form)
- Removal of stop words
- Removal of extra white space and tabs
- Lower-casing of all words

The data were then clustered using a K-means algorithm with 100 clusters. A large number of clusters was chosen because the drilling memos varied widely, and many clusters produce many distinct groups. Each cluster was then assigned to a memo type, and that type became the label for every drilling memo in the cluster. For this data set, the 100 clusters were grouped into 34 memo types. Drilling memos whose text indicated that they belonged to a particular memo type were processed separately through custom scripts.
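The preprocessing steps listed above can be sketched in Python as follows. This is an illustration built on assumptions, not the authors' code:

- The synonym map and stop-word list are hypothetical.
- `crude_stem` is a simple suffix stripper standing in for a real stemmer such as the Porter algorithm.
- The step order is also an assumption. Lower-casing runs first so that the later steps match case-insensitively, and stop words are removed before stemming so that the list can hold unstemmed words.

```python
import re

# Hypothetical synonym map for the "conversion of similar words" step;
# the paper's actual mapping is not published in this synopsis.
SYNONYMS = {
    "bbls": "bbl",
    "barrels": "bbl",
    "circ": "circulate",
    "circulating": "circulate",
    "xfer": "transfer",
}

# Small illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "at", "for", "from", "in", "is",
              "of", "on", "the", "to", "was", "with"}


def crude_stem(word: str) -> str:
    """Strip one common suffix; a stand-in for a real stemmer (e.g., Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            if suffix == "s" and word.endswith("ss"):
                break  # keep words such as "loss" intact
            return word[: -len(suffix)]
    return word


def preprocess_memo(memo: str) -> str:
    text = memo.lower()                                # lower-case all words
    text = re.sub(r"[^a-z\s]", " ", text)              # drop special chars, numbers, punctuation
    words = text.split()                               # also removes extra spaces and tabs
    words = [SYNONYMS.get(w, w) for w in words]        # convert similar words
    words = [w for w in words if w not in STOP_WORDS]  # remove stop words
    words = [crude_stem(w) for w in words]             # stem
    return " ".join(words)


if __name__ == "__main__":
    print(preprocess_memo("Pumped 20 BBLS to Pit #3."))  # -> pump bbl pit
```

Merging synonyms before stemming lets abbreviations common in rig memos collapse to one token. Stemming then merges inflected forms, so memos that differ only in wording land closer together in the clustering step.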

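A small pure-Python sketch of the clustering-and-labeling step follows. It works in four stages:

- Preprocessed memos are vectorized as word counts scaled to unit length.
- Lloyd's K-means runs with a deterministic farthest-point initialization, which stands in for whatever initialization the authors used.
- Cluster IDs map to memo types through a hypothetical analyst-supplied table.
- Regex rules override the cluster label, loosely mirroring the custom scripts.

The toy memos, `CLUSTER_TO_TYPE`, and `OVERRIDE_RULES` are invented. The paper used 100 clusters and 34 memo types, whereas this example uses two of each.

```python
import math
import re

Vector = list[float]

# Toy, already-preprocessed memos (hypothetical).
MEMOS = [
    "transfer mud pit",
    "transfer mud trip tank pit",
    "transfer pit",
    "flow check well static",
    "flow check static",
    "well static flow check",
]

# Hypothetical analyst mapping from cluster id to memo type.
CLUSTER_TO_TYPE = {0: "mud transfer", 1: "flow check"}

# Hypothetical "custom script" rules: (regex, memo type) pairs that
# override the cluster label when the memo text matches.
OVERRIDE_RULES = [(r"\btrip tank\b", "trip tank monitoring")]


def vectorize(docs: list[str]) -> list[Vector]:
    """Bag-of-words counts, scaled to unit length."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w in d.split():
            v[index[w]] += 1.0
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: list[Vector], k: int, iters: int = 20) -> list[int]:
    """Lloyd's algorithm with farthest-point initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(_dist2(v, c) for c in centroids)))
    labels: list[int] = []
    for _ in range(iters):
        # Assignment step: nearest centroid for every memo vector.
        labels = [min(range(k), key=lambda j: _dist2(v, centroids[j])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def label_memos(memos: list[str], labels: list[int],
                cluster_to_type: dict[int, str],
                override_rules: list[tuple[str, str]]) -> list[str]:
    """Cluster label -> memo type, unless a custom rule matches the text."""
    types = []
    for memo, lab in zip(memos, labels):
        memo_type = cluster_to_type[lab]
        for pattern, forced in override_rules:
            if re.search(pattern, memo):
                memo_type = forced
                break
        types.append(memo_type)
    return types


if __name__ == "__main__":
    labels = kmeans(vectorize(MEMOS), k=2)
    for memo, t in zip(MEMOS, label_memos(MEMOS, labels, CLUSTER_TO_TYPE, OVERRIDE_RULES)):
        print(f"{t:22s} <- {memo}")
```

On a data set of the size described, a library implementation would be the practical choice, for example scikit-learn's `KMeans` on `TfidfVectorizer` features. The pure-Python version only makes the mechanics visible.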
Full Text