Applications of Machine Learning Methods to Predict Hole Cleaning in Horizontal and Highly Deviated Wells
Summary: Machine learning (ML) has become a robust method for modeling field operations based on measurements. For example, wellbore cleanout is a critical operation that must be optimized to enhance the removal of solids and reduce the problems associated with poor hole cleaning. However, as wellbore geometry becomes more complicated, predicting the cleaning performance of fluids becomes more challenging, and optimization is often difficult as a result. Therefore, this research focuses on developing a data-driven model for predicting hole cleaning in deviated wells to optimize drilling performance. More than 500 flow loop measurements from eight studies are used to formulate a suitable ML model to forecast hole cleanout in directional wells. Measurements were obtained from hole-cleaning experiments that were conducted using different loop configurations. Experiments ranged in test-section length from 22 to 100 ft, in hole diameter from 4 to 8 in., and in pipe diameter from 2 to 4.5 in. The experiments provided the measured equilibrium bed height at a specific flow rate for various fluids, including water-based and synthetic-based fluids and fluids containing fibers. Several relevant test parameters, including fluid and cutting properties, well inclination, and drillstring rotation speed (drillpipe rev/min), were also considered in the analysis. The collected data have been analyzed using the Cross-Industry Standard Process for Data Mining. This paper is unique because it systematically evaluates various ML models for their ability to describe hole cleanout processes. Six different ML techniques (boosted decision tree (BDT), random forest (RF), linear regression, multivariate adaptive regression spline (MARS), neural networks, and support vector machine (SVM)) have been evaluated to select the most appropriate method for predicting bed thickness in a wellbore.
Also, we compared the predictions of the selected ML method with those of a mechanistic model for cases without drillstring rotation. Finally, using the ML model, a parametric study has been conducted to examine the impact of various parameters on the cleanout performance of selected fluids. The results show the relative influence of different variables on the prediction of cuttings bed. Accordingly, flow rate, drillpipe rev/min, and fluid behavior index have a strong impact on dimensionless bed thickness, while other parameters such as fluid consistency index, solids density and diameter, fiber concentration, and well inclination angle have a moderate effect. The BDT algorithm has provided the most accurate prediction with an R² of 92%, a root-mean-square error (RMSE) of 0.06, and a mean absolute error (MAE) of roughly 0.05. A comparison between a mechanistic model and the selected ML technique shows that the ML model provided better predictions.
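The summary above ranks models by R², RMSE, and MAE on dimensionless bed thickness. As a minimal sketch of how these three metrics are conventionally computed, the following Python function uses their standard definitions. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the paper's dataset or code.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, and MAE for paired measured/predicted values.

    Illustrative helper only; the name and interface are assumptions.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)        # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


if __name__ == "__main__":
    # Made-up dimensionless bed thickness values, for demonstration only.
    measured = [0.2, 0.4, 0.6, 0.8]
    predicted = [0.25, 0.35, 0.65, 0.75]
    print(regression_metrics(measured, predicted))
```

Because RMSE and MAE carry the units of the target, values such as 0.06 and 0.05 are only meaningful relative to the range of the dimensionless bed thickness.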
- Conference Article
- 10.2118/212912-ms
- Mar 14, 2023
Machine learning (ML) has become a robust method for modeling field operations based on measurements. For example, wellbore cleanout is a critical operation that needs to be optimized to enhance the removal of solids to reduce problems associated with poor hole cleaning. However, as wellbore geometry becomes more complicated, it gets more difficult to predict the cleaning performance of fluids. As a result, optimization is often challenging. Therefore, this study aims to develop a data-driven model for predicting hole cleaning in deviated wells to optimize drilling performance. More than 500 flow loop measurements from 8 studies are used to formulate a suitable ML model to forecast hole cleanout in directional wells. Measurements were obtained from hole-cleaning experiments that were conducted using different loop configurations. Test sections ranged in length from 22 to 100 feet, in hole diameter from 4 to 8 inches, and in pipe diameter from 2 to 4.5 inches. The experiments provided measured equilibrium bed height at a specific flow rate for various fluids, including water-based and oil-based fluids and fluids containing fibers. Several relevant test parameters, including fluid and cutting properties, well inclination, and drill string rotation speed, were also considered in the analysis. The collected data has been analyzed using the Cross-Industry Standard Process for Data Mining (CRISP-DM). Six different machine learning techniques (Random Forest, Linear Regression, Neural Networks, Multivariate Adaptive Regression Spline, Support Vector Machine, and Boosted Decision Tree) have been evaluated to select the most appropriate method for predicting bed thickness in a wellbore. Also, we compared the predictions of the selected ML method with those of a mechanistic model for cases without drill string rotation.
Finally, using the ML model, a parametric study has been conducted to investigate the impact of various parameters on the cleanout performance of selected fluids. Results show the relative influence of different variables on the prediction of cuttings bed. Accordingly, flow rate, drill string rotation, and fluid behavior index have a strong impact on dimensionless bed thickness, while other parameters such as fluid consistency index, solids density and diameter, fiber concentration, and well inclination angle have a moderate effect. The Boosted Decision Tree algorithm has provided the most accurate prediction with an R-square of approximately 90%, Root Mean Square Error (RMSE) of close to 0.07, and Mean Absolute Error (MAE) of roughly 0.05. A comparison between a mechanistic model and the selected ML technique shows that the ML model provided better predictions.
- Research Article
11
- 10.4018/ijssci.2021100102
- Oct 1, 2021
- International Journal of Software Science and Computational Intelligence
This review aims to systematically analyze ML models from four aspects: type of ML technique, estimation accuracy, model comparison, and estimation context. A systematic literature review of empirical studies was conducted on the ML models published in the last decades. Fifty-one primary studies relevant to the objective of this research were identified. Investigation of these studies showed that five ML techniques have been employed in brain tumor classification and prediction. Overall, the estimation accuracy of these ML models was found to be acceptable and to outperform that of non-ML models. ML models have therefore been shown to be useful in brain tumor classification and prediction. Among the ML models, the genetic algorithm achieved an accuracy of 100%. Nevertheless, the use of ML models in the industry is still restricted, so initiative and encouragement are needed to make ML models easier to use. Further work is required on these ML models to verify the accuracy and to consider performance metrics other than accuracy.
- Research Article
10
- 10.1016/j.atech.2024.100425
- Mar 1, 2024
- Smart Agricultural Technology
A Smart Irrigation System (SIS) is a complex concept used to control, monitor, and automate the irrigation of yields by integrating artificial intelligence techniques such as Machine Learning (ML) strategies. SIS has adopted various ML models. However, there has been no attempt to analyze the empirical evidence on ML models in a systematic way. Moreover, ML-based SIS often face problems and raise questions that must be resolved. This article presents a systematic literature review of ML-based SIS; an overview of the literature on ML is designed, providing a premier and unbiased survey of the existing empirical research. Fifty-five selected studies published from 2017 to 2023 and nine broadly used ML models were identified. Furthermore, four analysis aspects, namely type of ML technique, estimation accuracy, model comparison, and estimation context, have been outlined. The findings of this review demonstrate the performance capability of the ML techniques adopted within SIS. The ML techniques outperform other conventional approaches. However, the application of ML models in SIS is still limited, and more effort is needed to obtain well-formed and generalizable results. To this end, and based on the outcomes obtained in this work, future guidelines have been provided to help practitioners and researchers grasp the major contributions and challenges in the state-of-the-art research.
- Research Article
210
- 10.1016/j.ijmedinf.2021.104484
- May 8, 2021
- International journal of medical informatics
Comparison of machine learning and logistic regression models in predicting acute kidney injury: A systematic review and meta-analysis
- Research Article
- 10.2174/0123520965315046240802080024
- Aug 16, 2024
- Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering)
Background: The air quality of any area depends upon the various PMs (particulate matter) and hazardous gases present in the air. Low-cost PM sensors and gas sensors are placed at different target locations to monitor the air quality, read the environmental data, and transmit it to local servers through the IoT device. The low-cost sensor is not reliable due to its low sensing capacity; therefore, the read data is calibrated with the meteorological data provided by the nearby meteorological centre of that particular area. The calibrated data sent to the server can be analyzed through Machine Learning (ML) models. The ML models help to predict the risk of asthma in a particular area. The risk of asthma is directly related to the air quality of the surroundings. It is observed that the air quality of the industrial area is much worse than that of the non-industrial belt. Air quality monitoring of industrial areas is always a challenging task due to the non-uniform pollution in some segregated places around the industry, where pollutants are emitted mostly from chimneys. The air quality of any area depends upon the particulate matter (PM), i.e., PM2.5 and PM10.0, as well as gases such as NO2 (nitrogen dioxide), NH3 (ammonia), SO2 (sulfur dioxide), CO (carbon monoxide), O3 (ozone), and benzene. These are the most hazardous gases generally emitted by common heavy industries like iron and steel. In this article, the researchers considered the industrial belt of the Asansol-Durgapur region of West Bengal, India, and predicted the risk of asthma attacks for the test dataset. The experiment was carried out on 10 different supervised machine learning (SML) models as well as semi-supervised machine learning (SSML) models. The SML models have been further refined through hyper-parameter tuning, and better results have been obtained in the case of some ML models.
The result has been compared with the existing literature considering the same external environment from where the meteorological data was collected, and similar ML models have been used. The research outperformed the existing literature, which is depicted in the result and analysis section of the article. Methods: The study evaluated ML models, both supervised and semi-supervised, to assess pollution levels. Relevant features were selected while less relevant ones were discarded. Accuracy levels of different ML algorithms were compared in the results. The most effective model for an IoT system was chosen to maximize accuracy. In semi-supervised learning, feature selection followed supervised learning, but testing was akin to unsupervised learning. Results were compared with supervised learning data, enhancing reliability. Results: The results obtained with various classifiers were presented across tables after the independent parameter Ozone was removed. Following the output of several classifiers, the results were verified using the k-fold validation method, with k being set to 5 or 10, accordingly. Tables display the best outcome, which is indicated in bold characters. Method: In this research work, the researchers considered 9 different ML models and used them as supervised as well as semi-supervised models to determine the pollution level of a certain area. The researchers also selected the most relevant features and discarded the less relevant features. In the case of the SML algorithms, the accuracy level of the different ML algorithms has been determined and depicted in the result analysis section. The most effective ML model has been chosen for the proposed embedded system so that the highest accuracy could be achieved. In the case of the semi-supervised algorithm, the feature selection is done as per the supervised algorithm.
In this case, the training is done in the same way as for the SML algorithm, but the testing phase is done as in an unsupervised machine learning algorithm, where the decision parameter is predicted and ultimately matched with the previously obtained data of the SML algorithm. This approach is more reliable than a simple SML algorithm. Conclusion: This study focused on predicting asthma risk in the Asansol-Durgapur industrial belt, India, using low-cost PM and gas sensors. Data calibration with meteorological inputs enhanced accuracy. ML models predicted risk and were refined through hyper-parameter tuning. Comparative analysis showed superior performance, emphasizing the importance of precise air quality monitoring. While offering a robust framework for future research, the study’s limitation lies in its area-specific dataset.
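The abstract above reports that results were verified with k-fold validation using k = 5 or 10. As a minimal sketch of what such a split looks like, the following Python function partitions sample indices into k contiguous folds, each used once as the test set. The function name `k_fold_indices` is hypothetical, and the sketch assumes no shuffling or stratification, since the abstract does not state how the folds were built.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs over range(n_samples).

    Illustrative only: contiguous folds, no shuffling, no stratification.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")

    # Spread the remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, k)
    fold_sizes = [base + (1 if i < extra else 0) for i in range(k)]

    folds = []
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, test))
        start = stop
    return folds


if __name__ == "__main__":
    for train, test in k_fold_indices(10, 5):
        print("train:", train, "test:", test)
```

Each sample appears in exactly one test fold, so averaging a metric over the k folds uses every sample for evaluation once.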
- Research Article
- 10.69983/sujeiti/1113
- Jan 11, 2025
- Sohar University Journal of Engineering and Information Technology Innovations
The large impact of pollution on human, animal, and plant health, along with advanced computing technologies capable of managing big data, creates new opportunities for applying ML to improve air quality observation. Questions also continue to arise about how the performance of newer, hybrid ML models can be matched to a particular application to identify the most suitable ML model. This paper presents a systematic review of state-of-the-art studies that implement ML techniques in the context of PM2.5 concentration prediction, focusing on analyzing dataset size, hyperparameters, and preprocessing techniques to answer these questions. This review investigates some proposed ML techniques and models applied in Beijing by highlighting their main characteristics and relevant results. The review then points out that hybrid models are capable of uncovering the hidden features of data, which was not possible with single approaches on high-dimensional data. Another conclusion is that air pollution prediction models have to be compared under the same conditions with the same future characteristics.
- Research Article
74
- 10.1016/j.jappgeo.2022.104605
- Mar 13, 2022
- Journal of Applied Geophysics
Lithology prediction from well log data using machine learning techniques: A case study from Talcher coalfield, Eastern India
- Research Article
1
- 10.1002/jhm.13078
- Mar 13, 2023
- Journal of Hospital Medicine
Methodological progress note: Machine learning methods in healthcare research.
- Research Article
23
- 10.1016/j.ijmedinf.2022.104758
- Apr 2, 2022
- International journal of medical informatics
Machine learning models for diabetes management in acute care using electronic medical records: A systematic review
- Research Article
9
- 10.14778/3236187.3269462
- Jul 1, 2018
- Proceedings of the VLDB Endowment
Machine learning has become an essential toolkit for complex analytic processing. Data is typically stored in large data warehouses with multiple dimension hierarchies. Often, data used for building an ML model are aligned on OLAP hierarchies such as location or time. In this paper, we investigate the feasibility of efficiently constructing approximate ML models for new queries from previously constructed ML models by leveraging the concepts of model materialization and reuse. For example, is it possible to construct an approximate ML model for data from the year 2017 if one already has ML models for each of its quarters? We propose algorithms that can support a wide variety of ML models such as generalized linear models for classification along with K-Means and Gaussian Mixture models for clustering. We propose a cost-based optimization framework that identifies appropriate ML models to combine at query time and conduct extensive experiments on real-world and synthetic datasets. Our results indicate that our framework can support analytic queries on ML models, with superior performance, achieving dramatic speedups of several orders of magnitude on very large datasets.
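The idea of building a model for a year from models already materialized for its quarters can be illustrated with a much simpler case than the ones the abstract covers. The sketch below is not the paper's algorithm: it assumes one-feature ordinary least squares, where the stored sufficient statistics are additive and the merged model is exact rather than approximate. The class name `LinRegStats` and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinRegStats:
    """Additive sufficient statistics for fitting y = slope * x + intercept."""
    n: int
    sx: float
    sy: float
    sxx: float
    sxy: float

    @classmethod
    def from_data(cls, xs, ys):
        if len(xs) != len(ys):
            raise ValueError("xs and ys must have equal length")
        return cls(
            n=len(xs),
            sx=sum(xs),
            sy=sum(ys),
            sxx=sum(x * x for x in xs),
            sxy=sum(x * y for x, y in zip(xs, ys)),
        )

    def merge(self, other):
        # Statistics of the union of two disjoint partitions are plain sums.
        return LinRegStats(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )

    def fit(self):
        """Return (slope, intercept) from the closed-form OLS solution."""
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept


if __name__ == "__main__":
    # Two made-up "quarters" drawn from y = 2x + 1.
    q1 = LinRegStats.from_data([1, 2, 3], [3, 5, 7])
    q2 = LinRegStats.from_data([4, 5], [9, 11])
    print(q1.merge(q2).fit())  # model for both quarters, no raw data reread
```

For models without additive sufficient statistics, such as the generalized linear and clustering models named in the abstract, combining stored models is generally approximate, which is the harder setting the paper addresses.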
- Research Article
6
- 10.14778/3236187.3236199
- Jul 1, 2018
- Proceedings of the VLDB Endowment
Machine learning has become an essential toolkit for complex analytic processing. Data is typically stored in large data warehouses with multiple dimension hierarchies. Often, data used for building an ML model are aligned on OLAP hierarchies such as location or time. In this paper, we investigate the feasibility of efficiently constructing approximate ML models for new queries from previously constructed ML models by leveraging the concepts of model materialization and reuse. For example, is it possible to construct an approximate ML model for data from the year 2017 if one already has ML models for each of its quarters? We propose algorithms that can support a wide variety of ML models such as generalized linear models for classification along with K-Means and Gaussian Mixture models for clustering. We propose a cost-based optimization framework that identifies appropriate ML models to combine at query time and conduct extensive experiments on real-world and synthetic datasets. Our results indicate that our framework can support analytic queries on ML models, with superior performance, achieving dramatic speedups of several orders of magnitude on very large datasets.
- Conference Article
3
- 10.5555/3236187.3269462
- Jul 1, 2018
Machine learning has become an essential toolkit for complex analytic processing. Data is typically stored in large data warehouses with multiple dimension hierarchies. Often, data used for building an ML model are aligned on OLAP hierarchies such as location or time. In this paper, we investigate the feasibility of efficiently constructing approximate ML models for new queries from previously constructed ML models by leveraging the concepts of model materialization and reuse. For example, is it possible to construct an approximate ML model for data from the year 2017 if one already has ML models for each of its quarters? We propose algorithms that can support a wide variety of ML models such as generalized linear models for classification along with K-Means and Gaussian Mixture models for clustering. We propose a cost-based optimization framework that identifies appropriate ML models to combine at query time and conduct extensive experiments on real-world and synthetic datasets. Our results indicate that our framework can support analytic queries on ML models, with superior performance, achieving dramatic speedups of several orders of magnitude on very large datasets.
- Research Article
10
- 10.2196/33049
- Dec 8, 2021
- JMIR Medical Informatics
Background: Deep learning (DL)-based artificial intelligence may have different diagnostic characteristics than human experts in medical diagnosis. As a data-driven knowledge system, heterogeneous population incidence in the clinical world is considered to cause more bias to DL than clinicians. Conversely, by experiencing limited numbers of cases, human experts may exhibit large interindividual variability. Thus, understanding how the 2 groups classify given data differently is an essential step for the cooperative usage of DL in clinical application. Objective: This study aimed to evaluate and compare the differential effects of clinical experience in otoendoscopic image diagnosis in both computers and physicians exemplified by the class imbalance problem and guide clinicians when utilizing decision support systems. Methods: We used digital otoendoscopic images of patients who visited the outpatient clinic in the Department of Otorhinolaryngology at Severance Hospital, Seoul, South Korea, from January 2013 to June 2019, for a total of 22,707 otoendoscopic images. We excluded similar images, and 7500 otoendoscopic images were selected for labeling. We built a DL-based image classification model to classify the given image into 6 disease categories. Two test sets of 300 images were populated: balanced and imbalanced test sets. We included 14 clinicians (otolaryngologists and nonotolaryngology specialists including general practitioners) and 13 DL-based models. We used accuracy (overall and per-class) and kappa statistics to compare the results of individual physicians and the ML models. Results: Our ML models had consistently high accuracies (balanced test set: mean 77.14%, SD 1.83%; imbalanced test set: mean 82.03%, SD 3.06%), equivalent to those of otolaryngologists (balanced: mean 71.17%, SD 3.37%; imbalanced: mean 72.84%, SD 6.41%) and far better than those of nonotolaryngologists (balanced: mean 45.63%, SD 7.89%; imbalanced: mean 44.08%, SD 15.83%).
However, ML models suffered from class imbalance problems (balanced test set: mean 77.14%, SD 1.83%; imbalanced test set: mean 82.03%, SD 3.06%). This was mitigated by data augmentation, particularly for low incidence classes, but rare disease classes still had low per-class accuracies. Human physicians, despite being less affected by prevalence, showed high interphysician variability (ML models: kappa=0.83, SD 0.02; otolaryngologists: kappa=0.60, SD 0.07). Conclusions: Even though ML models deliver excellent performance in classifying ear disease, physicians and ML models have their own strengths. ML models have consistent and high accuracy while considering only the given image and show bias toward prevalence, whereas human physicians have varying performance but do not show bias toward prevalence and may also consider extra information that is not images. To deliver the best patient care in the shortage of otolaryngologists, our ML model can serve a cooperative role for clinicians with diverse expertise, as long as it is kept in mind that models consider only images and could be biased toward prevalent diseases even after data augmentation.
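The abstract above compares raters using accuracy and kappa statistics without saying which kappa variant was used. As a minimal sketch, assuming Cohen's kappa between two raters (an assumption, not something the abstract confirms), agreement beyond chance can be computed as follows. The function names and the example labels are hypothetical.

```python
from collections import Counter


def accuracy(truth, predicted):
    """Fraction of items where the predicted label equals the true label."""
    if not truth or len(truth) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance)."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("inputs must be non-empty and of equal length")

    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Made-up diagnoses from two raters on four images.
    rater_1 = ["A", "A", "B", "B"]
    rater_2 = ["A", "A", "B", "A"]
    print(accuracy(rater_1, rater_2), cohen_kappa(rater_1, rater_2))
```

Kappa corrects raw agreement for label frequencies, which is why two raters who both favor a prevalent class can have high agreement but a modest kappa.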
- Research Article
14
- 10.1016/j.ebiom.2024.105006
- Feb 19, 2024
- eBioMedicine
Decentralised, collaborative, and privacy-preserving machine learning for multi-hospital data
- Conference Article
1
- 10.4043/32527-ms
- Apr 24, 2023
Machine learning (ML) models offer intriguing alternatives for multiphase pipe flow simulations. Certain subsets of ML algorithms are computationally robust and may outperform physics-based models when applied within the training range. However, they tend to deteriorate on extrapolations, which are exceedingly common for multiphase flow applications at the industrial scale. "Hybrid" (a combination of ML and physics-based) models conceptually combine the strengths of the physics-based (extrapolability and interpretability) and ML models (adaptability and computational simplicity). In this paper, the author presents an accuracy comparison between a "pure" ML model, a hybrid model, and a high-definition or high-fidelity physics-based model (HD) in a multiphase flow application, which illustrates the benefits and drawbacks of each modeling option. The author implemented two data-driven models to predict the liquid holdup in gas-liquid stratified flow in pipes: a pure ML and a hybrid model. Their accuracies are benchmarked against an HD stratified flow model. The pure ML model uses a neural network (NN) to predict liquid holdup directly. The hybrid model involves 1D steady-state, fully developed, two-fluid conservation equations coupled with an NN to predict the interfacial friction. The HD model couples the aforementioned conservation equations with a preintegrated 2D velocity profile model, offering a physically self-consistent friction model for fluid-wall and fluid-fluid interfaces. The author collected more than 7,000 laboratory data points from various sources and split them into training, cross validation (CV), and testing sets in multiple ways. The splitting mechanism is a unique feature of this paper. The first split ensures the training and testing sets share similar characteristics while the others intentionally impose extrapolation between the two sets.
The hybrid model is shown to be more scalable than the pure ML model, albeit performing worse on training. It is also worth noting that the inclusion of physics may reduce the size of relevant training data. The use of dimensionless features improves the pure ML model's extrapolability, although the hybrid model remains superior. The HD model is more accurate and consistent across different data sets than the hybrid model, indicating that it is not always straightforward to reduce the physics to a minimum and task an ML model to compensate for the loss. Furthermore, the inclusion of physics seems to reduce model susceptibility to data noise. The author concludes that physics-based model development remains imperative for advancing the multiphase flow modeling state-of-the-art. In this paper, the author discusses the potentials and challenges for a possible hybrid modeling scheme, in which ML is used as a substitute for a key closure for the physics-based model. This paper can serve as a valuable case study in engineering applications where ML implementation best-practices or workflows are not established yet, such as in multiphase pipe flow or flow assurance.
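The abstract above contrasts a split in which training and testing sets share similar characteristics with splits that deliberately force extrapolation, but it does not describe how either was constructed. The sketch below shows one plausible way to build both kinds of split, assuming a random shuffle for the first and a hold-out of the largest values of one feature for the second. The function names, the `velocity` feature, and the data are hypothetical and are not the author's procedure.

```python
import random


def _test_size(n_samples, test_fraction):
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    n_test = max(1, round(n_samples * test_fraction))
    if n_test >= n_samples:
        raise ValueError("not enough samples to leave a training set")
    return n_test


def interpolation_split(samples, test_fraction, seed=0):
    """Random split: train and test cover roughly the same feature range."""
    n_test = _test_size(len(samples), test_fraction)
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def extrapolation_split(samples, key, test_fraction):
    """Range split: the test set holds the largest values of one feature,
    so a model must predict outside the range it was trained on."""
    n_test = _test_size(len(samples), test_fraction)
    ordered = sorted(samples, key=key)
    return ordered[:-n_test], ordered[-n_test:]


if __name__ == "__main__":
    data = [{"velocity": v} for v in [5.0, 1.0, 4.0, 2.0, 3.0]]
    train, test = extrapolation_split(data, lambda s: s["velocity"], 0.4)
    print("train:", train)
    print("test:", test)
```

Comparing a model's error on the two kinds of test set gives a rough indication of how much its accuracy depends on staying inside the training range.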