A Machine Learning Approach for Gas Kick Identification

  • Abstract
  • Similar Papers
Abstract

Warning signs of a possible kick during drilling operations can either be primary (flow rate increase and pit gain) or secondary (drilling break and pump pressure decrease). Drillers rely on pressure data at the surface to determine in-situ downhole conditions while drilling. The surface pressure reading is always available and accessible; however, its interpretation is often ambiguous. This study analyzes significant kick symptoms in the wellbore annulus under both static (shut-in) and dynamic (drilling/circulating) conditions. We used both supervised and unsupervised learning techniques for flow regime identification and kick prognosis: an artificial neural network (ANN), support vector machine (SVM), K-nearest neighbor (KNN), decision trees, K-means clustering, and agglomerative clustering. We trained these machine learning models to detect kick symptoms from the gas evolution data collected between the point of kick initiation and the wellhead. All the machine learning techniques used in this work made excellent predictions, with accuracy greater than or equal to 90%. For supervised learning, the decision tree gave the overall best results, with an accuracy of 96% for air influx cases and 98% for carbon dioxide influx cases in both static and dynamic scenarios. For unsupervised learning, K-means clustering was the best, with Silhouette scores ranging from about 0.4 to 0.8. The mass rate per hydraulic diameter and the mixture viscosity yielded the best clusters because they account for the fluid properties, flow rate, and flow geometry. Although computationally demanding, the machine learning models can use the surface/downhole pressure data to relay annular flow patterns while drilling. There have been several recent advances in drilling automation; however, these advances remain limited when it comes to gas kick identification and handling.
This work provides an alternative and easily accessible primary kick detection tool for drillers based on data at the surface. It also relates this surface data to certain annular flow regime patterns to better tell the downhole story while drilling.
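The decision-tree result above can be illustrated with a minimal sketch: a single-split "decision stump" (a depth-1 decision tree) fitted to a handful of synthetic surface readings. The feature names, values, and threshold below are hypothetical illustrations, not the study's data.

```python
# Minimal decision-stump sketch of kick vs. no-kick classification.
# Features (pit gain in bbl, return-flow increase in gpm) and all
# numbers are hypothetical, not from the paper's experiments.

def best_stump(samples, labels, feature_idx):
    """Find the threshold on one feature that maximizes training accuracy."""
    best_thr, best_acc = None, 0.0
    for thr in sorted(s[feature_idx] for s in samples):
        preds = [1 if s[feature_idx] >= thr else 0 for s in samples]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    return best_thr, best_acc

# Synthetic surface readings: (pit_gain_bbl, flow_increase_gpm)
samples = [(0.1, 2), (0.2, 5), (0.3, 4), (2.5, 40), (3.0, 55), (2.8, 48)]
labels  = [0, 0, 0, 1, 1, 1]          # 1 = kick, 0 = normal drilling

thr, acc = best_stump(samples, labels, feature_idx=0)
print(f"split: pit_gain >= {thr} bbl, training accuracy = {acc:.0%}")
```

A full decision tree simply repeats this exhaustive split search recursively on each branch; the 96-98% accuracies reported above come from trees trained on the experimental gas evolution data.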

Similar Papers
  • Conference Article
  • Cited by: 4
  • 10.4043/31901-ms
Flow Pattern, Pressure Gradient Relationship of Gas Kick Under Dynamic Conditions
  • Apr 25, 2022
  • Chinemerem Edmond Obi + 4 more

The warning signs of a possible kick during drilling operations can either be primary (flow rate increase and pit gain) or secondary (drilling break, pump pressure decrease, and stroke increase). Likewise, drillers rely on the pressure readings at the surface to gain insight into in-situ downhole conditions while drilling. The surface pressure reading is always available and accessible; however, its interpretation is often ambiguous. This study analyses significant kick symptoms in the wellbore annulus while drilling/circulating. We have tied several observed annular flow patterns to the measured pressure and flow data from the surface during water-air and water-carbon dioxide complex flow. This is based on experiments using a 140-ft-high tower lab with a hydraulic diameter of about 3 in. The experiments were carried out under dynamic conditions to simulate circulating drilling mud from the wellbore. We used both supervised and unsupervised learning techniques for flow regime identification and kick prognosis. These include an Artificial Neural Network (ANN), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Decision Trees, K-Means clustering, and Agglomerative Clustering. All the machine learning techniques used in this work made excellent predictions, with accuracy greater than or equal to 90%. For supervised learning, the decision tree gave the overall best results, with an accuracy of 96% for air-influx cases and 98% for carbon dioxide influx cases. For unsupervised learning, K-Means clustering was the best, with Silhouette scores ranging from about 0.7 to 0.8 for the rate data clusters and 0.4 to 0.5 for the pressure data clusters. The mass rate per hydraulic diameter and the mixture viscosity also produced the best clusters, because this approach accounts for the fluid properties, flow rate, and flow geometry.
The estimation of the influx size and type is highly dependent on the duration of the kick and the overbalance kick influx pressure. The quantity of the mass influx significantly controls the flow pattern, pressure losses, and pressure gradient as the kick migrates to the surface. The resulting turbulent flow after the initial kick (after Taylor bubble flow) varied with the duration of the kick, average liquid flow rate, influx type, and drilling scenario. Surface pressure readings can be tied to the flow regime to better visualize the well control approach while drilling. This work provides an alternative and easily accessible primary kick detection tool for drillers based on measured pressure responses at the surface. It also relates this pressure data to certain annular flow regime patterns to better tell the downhole story while drilling.

  • Conference Article
  • Cited by: 10
  • 10.2118/209333-ms
A Machine Learning Analysis to Relate Flow Pattern and Pressure Gradient During Gas Kicks Under Static Conditions
  • Apr 19, 2022
  • Chinemerem Edmond Obi + 4 more

Warning signs of a possible kick during drilling operations can either be primary (flow rate increase and pit gain) or secondary (drilling break and pump pressure decrease). Drillers rely on pressure data at the surface to determine in-situ downhole conditions while drilling. The surface pressure reading is always available and accessible; however, its interpretation is often ambiguous. This study analyses significant kick symptoms in the wellbore annulus under shut-in conditions. We have tied several observed annular flow patterns to the measured pressure gradient during water-air and water-carbon dioxide complex flow. This is based on experiments in a 140-ft-high flow loop with a hydraulic diameter of approximately 3 in. The experiments were carried out under static conditions to simulate kick occurrence when the drilling fluid is not flowing, typically when the wellbore is shut in. We used an Artificial Neural Network (ANN) and a K-Means clustering approach for kick prognosis. We trained these machine learning models to detect kick symptoms from pressure response and gas evolution data collected between the kick occurrence and the wellhead. The ANN approach was relatively fast, with a negligible difference in accuracy between air influx and carbon dioxide influx for kick prognosis. The ANN achieved accuracies of about 90% and 93% for air-based kick prognosis, and 92% and 94% for carbon dioxide-based influx. With K-Means clustering, the Silhouette scores were 0.5 and 0.6 for the air and carbon dioxide influx, respectively. The estimation of the influx size and type is strongly a function of the duration of the kick and the bottomhole underbalanced pressure. Based on visual analysis, pit gain, and pressure signals, the quantity of the mass influx significantly controls the flow pattern, pressure losses, and pressure gradient as the kick migrates to the surface.
The resulting turbulent flow after the initial kick (after Taylor bubble flow) varied with the duration of the kick, average liquid flow rate, influx type, and drilling scenario. We have tied the surface pressure readings to the flow regimes to better visualize the well control approach while drilling. This is based on relating the significant kick symptoms we observed to the pressure readings at multiple locations and times, and then training the deep learning models on these data. Although computationally demanding, the deep learning model can use the surface pressure data to relay annular flow patterns while drilling. This work provides an alternative and relatively accessible primary kick detection tool for drillers based on measured pressure responses at the surface.
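The Silhouette scores quoted in these abstracts can be made concrete with a small hand computation: for each point, s = (b - a) / max(a, b), where a is the mean distance to the point's own cluster and b is the mean distance to the nearest other cluster. Scores near 1 indicate well-separated clusters; near 0, overlapping ones. The toy 1-D values below are hypothetical, not experimental data.

```python
# Hand-computed Silhouette score for a toy 1-D clustering.

def silhouette(clusters):
    """clusters: list of lists of 1-D points; returns the mean silhouette."""
    dist = lambda x, y: abs(x - y)
    scores = []
    for ci, cluster in enumerate(clusters):
        for x in cluster:
            # a = mean distance within own cluster,
            # b = mean distance to the nearest other cluster
            own = [dist(x, y) for y in cluster if y is not x]
            a = sum(own) / len(own) if own else 0.0
            b = min(
                sum(dist(x, y) for y in other) / len(other)
                for cj, other in enumerate(clusters) if cj != ci
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two well-separated clusters of hypothetical pressure-gradient values
print(round(silhouette([[1.0, 1.2, 1.4], [5.0, 5.3, 5.6]]), 2))
```

Shrinking the gap between the two groups drives the score toward 0, which is why the papers treat higher Silhouette scores as evidence of cleaner flow-regime clusters.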

  • Research Article
  • Cited by: 11
  • 10.1007/s13239-024-00737-y
Review of Machine Learning Techniques in Soft Tissue Biomechanics and Biomaterials.
  • Jul 2, 2024
  • Cardiovascular engineering and technology
  • Samir Donmazov + 3 more

Advanced material models and material characterization of soft biological tissues play an essential role in pre-surgical planning for vascular surgeries and transcatheter interventions. Recent advances in heart valve engineering, medical device and patch design are built upon these models. Furthermore, understanding vascular growth and remodeling in native and tissue-engineered vascular biomaterials, as well as designing and testing drugs on soft tissue, are crucial aspects of predictive regenerative medicine. Traditional nonlinear optimization methods and finite element (FE) simulations have served as biomaterial characterization tools combined with soft tissue mechanics and tensile testing for decades. However, results obtained through nonlinear optimization methods are reliable only to a certain extent due to mathematical limitations, and FE simulations may require substantial computing time and resources, which might not be justified for patient-specific simulations. Machine learning (ML) techniques have gained increasing prominence in the field of soft tissue mechanics in recent years, offering notable advantages over conventional methods. This review article presents an in-depth examination of emerging ML algorithms utilized for estimating the mechanical characteristics of soft biological tissues and biomaterials. These algorithms are employed to analyze crucial properties such as stress-strain curves and pressure-volume loops. The focus of the review is on applications in cardiovascular engineering, and the fundamental mathematical basis of each approach is also discussed. The review effort employed two strategies. First, the recent studies of major research groups actively engaged in cardiovascular soft tissue mechanics were compiled, and research papers utilizing ML and deep learning (DL) techniques were included in our review. The second strategy involved a standard keyword search across major databases.
This approach provided 11 relevant ML articles, meticulously selected from reputable sources including ScienceDirect, Springer, PubMed, and Google Scholar. The selection process involved using specific keywords such as "machine learning" or "deep learning" in conjunction with "soft biological tissues", "cardiovascular", "patient-specific," "strain energy", "vascular" or "biomaterials". Initially, a total of 25 articles were selected. However, 14 of these articles were excluded as they did not align with the criteria of focusing on biomaterials specifically employed for soft tissue repair and regeneration. As a result, the remaining 11 articles were categorized based on the ML techniques employed and the training data utilized. ML techniques utilized for assessing the mechanical characteristics of soft biological tissues and biomaterials are broadly classified into two categories: standard ML algorithms and physics-informed ML algorithms. The standard ML models are then organized based on their tasks, being grouped into Regression and Classification subcategories. Within these categories, studies employ various supervised learning models, including support vector machines (SVMs), bagged decision trees (BDTs), artificial neural networks (ANNs) or deep neural networks (DNNs), and convolutional neural networks (CNNs). Additionally, the utilization of unsupervised learning approaches, such as autoencoders incorporating principal component analysis (PCA) and/or low-rank approximation (LRA), is based on the specific characteristics of the training data. The training data predominantly consists of three types: experimental mechanical data, including uniaxial or biaxial stress-strain data; synthetic mechanical data generated through non-linear fitting and/or FE simulations; and image data such as 3D second harmonic generation (SHG) images or computed tomography (CT) images. 
The evaluation of performance for physics-informed ML models primarily relies on the coefficient of determination (R²). In contrast, various metrics and error measures are utilized to assess the performance of standard ML models. Furthermore, our review includes an extensive examination of prevalent biomaterial models that can serve as physical laws for physics-informed ML models. ML models offer an accurate, fast, and reliable approach for evaluating the mechanical characteristics of diseased soft tissue segments and selecting optimal biomaterials for time-critical soft tissue surgeries. Among the various ML models examined in this review, physics-informed neural network models exhibit the capability to forecast the mechanical response of soft biological tissues accurately, even with limited training samples. These models achieve high R² values ranging from 0.90 to 1.00. This is particularly significant considering the challenges associated with obtaining a large number of living tissue samples for experimental purposes, which can be time-consuming and impractical. Additionally, the review not only discusses the advantages identified in the current literature but also sheds light on the limitations and offers insights into future perspectives.

  • Research Article
  • 10.2337/db22-1132-p
1132-P: Classification and Prediction of Diabetes Using Electronic Health Records and Wearable Devices for Clinical Decision Support
  • Jun 1, 2022
  • Diabetes
  • Andrew Shahidehpour + 4 more

Introduction: Diagnosis of T2DM necessitates clinical tests that are time-consuming and expensive. Machine learning (ML) techniques can accelerate the diagnosis and classification of T2DM and allow clinicians to personalize treatments based on blood glucose concentrations (BGC), physical fitness (PF), and diabetes distress patterns observed in daily life. Analyzing electronic health records (EHR), physiological variables collected with wearable devices, and patient-reported outcomes (PROs) using ML techniques can lead to the development of clinical decision support tools that provide a comprehensive picture of an individual's diabetes management needs. Methods: Clinical experimental data (n=85, F:40/M:45; HbA1c: 7.83±2.16; age: 57±7.72; BMI: 33±6.72 kg/m²; means and SDs) were used to identify clusters of subjects based on medical tests, and ML models were developed using readily measured data to classify new subjects into the identified clusters. Latent variable methods and k-means clustering were used to identify clusters based on HbA1c, physical performance tests, and PROs. ML models, including logistic regression (LR) and support vector machines (SVM), were developed to assign new subjects to the identified clusters using the readily measured input variables from wearable devices, EHR, and PROs. Results: Three distinct subject clusters were identified within the study cohort based on the descriptive variables. New subjects were assigned to the identified clusters with 87% and 91% accuracy for LR and SVM, respectively. Conclusion: EHRs, wearable device data, and PROs can be used to accurately and conveniently identify a person's overall BGC, PF, and diabetes distress to aid in clinical decision-making. In the future, clinical decision support tools can be developed for personalized treatment suggestions based on cluster membership. Disclosure A.Shahidehpour: None. C.Fritschi: None. M.Rashid: None. A.Cinar: None. L.T.Quinn: n/a.
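The cluster-then-classify workflow described above can be sketched in a few lines: a tiny 1-D K-means pass forms the subject clusters, and a new subject is then assigned to the nearest centroid (a simple stand-in for the study's LR/SVM classification step). The HbA1c values below are hypothetical, not the cohort's data.

```python
# Two-stage sketch: cluster the cohort, then assign a new subject.

def kmeans_1d(points, k, iters=20):
    """Plain 1-D k-means; returns the final centroids."""
    centroids = sorted(points)[:: max(1, len(points) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            groups[nearest].append(p)
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids

# Stage 1: cluster the cohort on hypothetical HbA1c values
hba1c = [5.6, 5.9, 6.1, 7.8, 8.1, 8.4, 10.2, 10.8]
centroids = kmeans_1d(hba1c, k=3)

# Stage 2: assign a new subject to the nearest cluster
new_subject = 7.5
cluster = min(range(3), key=lambda i: abs(new_subject - centroids[i]))
print(f"centroids={[round(c, 2) for c in centroids]}, assigned={cluster}")
```

In the study itself, stage 2 is handled by supervised models (LR and SVM, at 87% and 91% accuracy) trained on readily measured wearable, EHR, and PRO variables rather than the clustering variables themselves.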

  • Research Article
  • Cited by: 14
  • 10.1038/s41598-024-70983-6
Software defined networking based network traffic classification using machine learning techniques
  • Aug 29, 2024
  • Scientific Reports
  • Ayodeji Olalekan Salau + 1 more

The classification of network traffic has become increasingly crucial due to the rapid growth in the number of internet users. Conventional approaches, such as identifying traffic based on port numbers and payload inspection, are becoming ineffective due to the dynamic and encrypted nature of modern network traffic. A number of researchers have implemented Software Defined Networking (SDN) based traffic classification using Machine Learning (ML) and Deep Learning (DL) models. However, these studies had various limitations, such as encrypted traffic detection, payload inspection, poor detection accuracy, and challenges with testing models in both offline and real-time traffic modes. ML models together with SDN are adopted nowadays to enhance classification performance. In this paper, both supervised (Logistic Regression, Decision Tree, Random Forest, AdaBoost, and Support Vector Machine) and unsupervised (K-means clustering) ML models were used to classify Domain Name System (DNS), Telnet, Ping, and Voice traffic flows simulated using the Distributed Internet Traffic Generator (D-ITG) tool. The use of this tool effectively manages and classifies traffic types based on their application. The study discusses the dataset used, model selection, model implementation, and implementation techniques (such as pre-processing, feature extraction, the ML algorithm, and model evaluation metrics). The proposed model in SDN was implemented in Mininet for designing the network architecture and generating network traffic, and an Anaconda Python environment was utilized for traffic classification using various ML techniques. Among the models tested, the Decision Tree supervised learning model achieved the highest accuracy of 99.81%, outperforming the other supervised and unsupervised learning algorithms.
These results indicate that the integration of ML with SDN provides an efficient method for identifying and accurately classifying both offline and real-time network traffic, along with enhanced quality of service (QoS), encrypted packet detection, and deep packet inspection and management.

  • Research Article
  • Cited by: 12
  • 10.3390/agronomy14123001
Machine Learning and Deep Learning for Crop Disease Diagnosis: Performance Analysis and Review
  • Dec 17, 2024
  • Agronomy
  • Habiba Njeri Ngugi + 2 more

Crop diseases pose a significant threat to global food security, with both economic and environmental consequences. Early and accurate detection is essential for timely intervention and sustainable farming. This paper presents a review of machine learning (ML) and deep learning (DL) techniques for crop disease diagnosis, focusing on Support Vector Machines (SVMs), Random Forest (RF), k-Nearest Neighbors (KNNs), and deep models like VGG16, ResNet50, and DenseNet121. The review method includes an in-depth analysis of algorithm performance using key metrics such as accuracy, precision, recall, and F1 score across various datasets. We also highlight the data imbalances in commonly used datasets, particularly PlantVillage, and discuss the challenges posed by these imbalances. The research highlights critical insights regarding ML and DL models in crop disease detection. A primary challenge identified is the imbalance in the PlantVillage dataset, with a high number of healthy images and a strong bias toward certain disease categories like fungi, leaving other categories like mites and molds underrepresented. This imbalance complicates model generalization, indicating a need for preprocessing steps to enhance performance. This study also shows that combining Vision Transformers (ViTs) with Green Chromatic Coordinates and hybridizing these with SVM achieves high classification accuracy, emphasizing the value of advanced feature extraction techniques in improving model efficacy. In terms of comparative performance, DL architectures like ResNet50, VGG16, and convolutional neural networks (CNNs) demonstrated robust accuracy (95-99%) across diverse datasets, underscoring their effectiveness in managing complex image data. Additionally, traditional ML models exhibited varied strengths; for instance, SVM performed better on balanced datasets, while RF excelled with imbalanced data.
Preprocessing methods like K-means clustering, Fuzzy C-Means, and PCA, along with ensemble approaches, further improved model accuracy. Lastly, the study underscores that high-quality, well-labeled datasets, stakeholder involvement, and comprehensive evaluation metrics such as F1 score and precision are crucial for optimizing ML and DL models, making them more effective for real-world applications in sustainable agriculture.
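The evaluation metrics the review leans on (precision, recall, F1 score) follow directly from their definitions; a quick worked example for one imbalanced disease class, with hypothetical confusion counts:

```python
# Precision/recall/F1 from hypothetical confusion counts for a rare
# class (e.g. an underrepresented "mold" category).

tp, fp, fn = 90, 10, 30   # true positives, false positives, false negatives

precision = tp / (tp + fp)            # 90 / 100
recall    = tp / (tp + fn)            # 90 / 120
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.3f}")
```

On imbalanced datasets like PlantVillage, accuracy alone can look high while recall on rare classes collapses, which is why the review emphasizes F1 and precision alongside accuracy.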

  • Research Article
  • Cited by: 138
  • 10.1021/acs.jcim.1c01031
Benchmarking Machine Learning Models for Polymer Informatics: An Example of Glass Transition Temperature.
  • Oct 18, 2021
  • Journal of Chemical Information and Modeling
  • Lei Tao + 2 more

In the field of polymer informatics, utilizing machine learning (ML) techniques to evaluate the glass transition temperature Tg and other properties of polymers has attracted extensive attention. This data-centric approach is much more efficient and practical than laborious experimental measurements when encountering a daunting number of polymer structures. Various ML models have been demonstrated to perform well for Tg prediction. Nevertheless, they are trained on different data sets, using different structure representations, and based on different feature engineering methods. Thus, the critical question arises of selecting a proper ML model to better handle Tg prediction with generalization ability. To provide a fair comparison of different ML techniques and examine the key factors that affect model performance, we carry out a systematic benchmark study by compiling 79 different ML models and training them on a large and diverse data set. The three major components in setting up an ML model are structure representations, feature representations, and ML algorithms. In terms of polymer structure representation, we consider the polymer monomer, repeat unit, and oligomer with longer chain structure. Based on that, feature representations are calculated, including Morgan fingerprints with or without substructure frequency, RDKit descriptors, molecular embeddings, molecular graphs, etc. Afterward, the obtained feature input is trained using different ML algorithms, such as deep neural networks, convolutional neural networks, random forest, support vector machine, LASSO regression, and Gaussian process regression. We evaluate the performance of these ML models using a holdout test set and an extra unlabeled data set from high-throughput molecular dynamics simulation. We especially focus on the ML models' generalization ability on the unlabeled data set, and the models' sensitivity to the topology and molecular weight of polymers is also taken into consideration.
This benchmark study provides not only a guideline for the Tg prediction task but also a useful reference for other polymer informatics tasks.

  • Research Article
  • Cited by: 34
  • 10.1007/s00477-021-01982-6
Comprehensive evaluation of machine learning models for suspended sediment load inflow prediction in a reservoir
  • Feb 13, 2021
  • Stochastic Environmental Research and Risk Assessment
  • Muhammad Bilal Idrees + 3 more

Suspended sediment load (SSL) flowing into a reservoir contributes to the overall safety of the dam. Owing to the complexity and stochastic nature of sedimentation, accurate prediction of reservoir SSL inflow is still challenging. Moreover, research and application of machine learning (ML) techniques for reservoir sedimentation are still deficient. A comprehensive evaluation of six ML models for reservoir SSL inflow prediction was performed in this study. ML techniques including artificial neural network (ANN), adaptive neuro-fuzzy inference system (ANFIS), radial basis function neural network (RBFNN), support vector machine (SVM), genetic programming (GP), and deep learning (DL) were applied to develop predictive models of daily SSL inflow at Sangju Weir, South Korea. Significant input vectors for each model were selected from streamflow, water temperature, water stage, and reservoir outflow at different time lags. Model performances were evaluated using various statistical indices including the coefficient of determination (R2), mean absolute error (MAE), percentage of bias (PBIAS), Willmott index (WI), Nash-Sutcliffe efficiency (NSE), root mean square error (RMSE), and Pearson correlation coefficient (PCC). The best input combinations were found to be unique for each ML model, but all six models performed reasonably well for SSL inflow predictions. The ANN model outperformed the other models with R2 = 0.821, MAE = 4.244 tons/day, PBIAS = 0.055, WI = 0.891, NSE = 0.991, RMSE = 11.692 tons/day, and PCC = 0.826. The models were ranked based on their SSL prediction capabilities as ANN > ANFIS > DL > RBFNN > SVM > GP from best to worst. The findings are expected to be useful for future dam safety and risk assessment, and for achieving sustainability of reservoir operation through comprehensive sediment management.
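Several of the statistical indices used to rank the models above (MAE, RMSE, PBIAS, NSE) can be computed directly from their definitions; the observed/predicted sediment loads below are a tiny hypothetical series, not the Sangju Weir data:

```python
# Hydrologic goodness-of-fit indices on hypothetical observed vs.
# predicted suspended sediment loads (tons/day).
import math

obs  = [10.0, 40.0, 25.0, 60.0, 15.0]
pred = [12.0, 38.0, 27.0, 55.0, 19.0]
n = len(obs)
mean_obs = sum(obs) / n

mae   = sum(abs(o - p) for o, p in zip(obs, pred)) / n
rmse  = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / n)
# PBIAS: positive = underprediction, negative = overprediction
pbias = 100 * sum(o - p for o, p in zip(obs, pred)) / sum(obs)
# NSE: 1 is a perfect fit; 0 means no better than the observed mean
nse   = 1 - (sum((o - p) ** 2 for o, p in zip(obs, pred))
             / sum((o - mean_obs) ** 2 for o in obs))

print(f"MAE={mae:.2f} RMSE={rmse:.2f} PBIAS={pbias:.1f}% NSE={nse:.3f}")
```

Because NSE normalizes the squared error by the variance of the observations, it rewards models that track the dynamics rather than just the mean, which is why it is reported alongside RMSE and MAE.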

  • Research Article
  • Cited by: 2
  • 10.62713/aic.3485
Data-driven Machine Learning Models for Risk Stratification and Prediction of Emergence Delirium in Pediatric Patients Underwent Tonsillectomy/Adenotonsillectomy.
  • Oct 20, 2024
  • Annali italiani di chirurgia
  • Alessandro Simonini + 7 more

In the pediatric surgical population, Emergence Delirium (ED) poses a significant challenge. This study aims to develop and validate machine learning (ML) models to identify key features associated with ED and predict its occurrence in children undergoing tonsillectomy or adenotonsillectomy. The analysis involved data cleaning, exploratory data analysis (EDA), supervised predictive modeling, and unsupervised learning on a medical dataset (n = 423). After preliminary data cleaning, EDA encompassed plotting histograms, boxplots, pairplots, and correlation heatmaps to understand variable distributions and relationships. Four predictive models were trained, including logistic regression (LR), random forest (RF), Support Vector Machine (SVM), and gradient boosting (XGBoost). The models were evaluated and compared using Receiver Operating Characteristic (ROC) Area Under the Curve (AUC), precision, recall, and feature importance. The RF model showed better performance and was used for the test (AUC-ROC 0.96, precision 1.00, and recall 0.92 on the validation set). K-means clustering was applied to find groups within the data. The elbow method and silhouette scores were used to determine the optimal number of clusters. The formed clusters were analyzed by aggregating features to understand the characteristics of each cluster. EDA revealed significant positive correlations between age, weight, American Society of Anesthesiologists (ASA) health score, and surgery duration with the risk of developing ED. Among the ML models, RF achieved the highest performance. Key predictive variables, based on the model's feature importance, included delirium screening scales, extubation time, and time to regain consciousness.
Unsupervised K-means clustering identified 2-3 optimal clusters, which represented distinct patient subgroups: younger, healthier, low-risk individuals (cluster 0), and older patients with increasing chronic disease burden, higher delirium screening scores, and consequently higher post-operative delirium risk (clusters 1 and 2). ML techniques are valuable tools for extracting insights and making accurate predictions from healthcare data. High-performing algorithm-based models can be implemented for clinical decision support systems, facilitating early identification and intervention for ED in pediatric patients. By investigating various variables, it is possible to assess risk and implement preventive measures effectively. Furthermore, unsupervised clustering reveals distinct patient subgroups, enabling personalized perioperative management strategies and enhancing overall patient care.
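The elbow method mentioned above can be sketched by running K-means for increasing k and watching the within-cluster sum of squares (inertia) drop sharply and then flatten past the "elbow". The 1-D risk scores and the simple quantile-spread initialization below are hypothetical:

```python
# Elbow-method sketch: inertia vs. number of K-means clusters
# on hypothetical 1-D risk scores.

def kmeans_inertia(points, k, iters=25):
    pts = sorted(points)
    # spread the initial centroids across the data range
    cents = [pts[round(i * (len(pts) - 1) / max(k - 1, 1))] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in cents]
        for p in pts:
            groups[min(range(k), key=lambda i: (p - cents[i]) ** 2)].append(p)
        cents = [sum(g) / len(g) if g else c for g, c in zip(groups, cents)]
    # within-cluster sum of squared distances to the nearest centroid
    return sum(min((p - c) ** 2 for c in cents) for p in pts)

scores = [1, 2, 2, 3, 10, 11, 12, 20, 21, 22]
for k in (1, 2, 3, 4):
    print(k, round(kmeans_inertia(scores, k), 1))
```

For these data the inertia collapses up to k = 3 and barely improves at k = 4, so the elbow picks three clusters, matching the kind of 2-3 cluster choice described above.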

  • Research Article
  • Cited by: 17
  • 10.3390/geosciences12120429
Machine Learning Techniques for Gully Erosion Susceptibility Mapping: A Review
  • Nov 22, 2022
  • Geosciences
  • Hamid Mohebzadeh + 3 more

Gully erosion susceptibility mapping (GESM) through predicting the spatial distribution of areas prone to gully erosion is required to plan gully erosion control strategies relevant to soil conservation. Recently, machine learning (ML) models have received increasing attention for GESM due to their vast capabilities. In this context, this paper sought to review the modeling procedure of GESM using ML models, including the required datasets and model development and validation. The results showed that elevation, slope, plan curvature, rainfall, and land use/cover were the most important factors for GESM. It is also concluded that although ML models predict the locations of zones prone to gullying reasonably well, performance ranking of such methods is difficult because they yield different results based on the quality of the training dataset, the structure of the models, and the performance indicators. Among the ML techniques, random forest (RF) and support vector machine (SVM) are the most widely used models for GESM, and they show promising results. Overall, to improve the prediction performance of ML models, the use of data-mining techniques to improve the quality of the dataset and of an ensemble estimation approach is recommended. Furthermore, evaluation of ML models for the prediction of other types of gully erosion, such as rill-interrill and ephemeral gullies, should be the subject of more studies in the future. The employment of a combination of topographic indices and ML models is recommended for the accurate extraction of gully trajectories, which are the main input of some process-based models.

  • Research Article
  • 10.1051/matecconf/202440002011
Generating Spatial Distribution and Forecasting the Rainfall by Suitable ML Models-A Case Study of Aiyar River Basin, Tiruchirappalli District
  • Jan 1, 2024
  • MATEC Web of Conferences
  • Surendar Natarajan + 1 more

Rainfall plays a prominent role in the management of water resources. Accurate prediction of rainfall is one of the greatest challenges in the field of hydrologic studies. Rainfall prediction is necessary to mitigate natural disasters like floods and droughts, and inaccurate prediction leads to either shortfall or overflow in water storage structures. In this study, different types of Machine Learning (ML) and deep learning techniques are adopted to predict the rainfall pattern of the Aiyar river basin in Tiruchirappalli district. A comparative study of these ML models is done to identify the best model for the study area, with comparisons made for different scenarios and time intervals. Rainfall data from 1987 to 2023 are used to predict daily rainfall in the basin: data from 1987 to 2007 are used for testing, and the remaining years' data are used for training. The Thiessen polygon method is used to compute the area-weighted average rainfall over the basin. The ML and deep learning techniques used in this study are a linear model, Support Vector Machine (SVM), and Long Short-Term Memory (LSTM) models, and rainfall was predicted for different time scenarios using methods such as autocorrelation. The accuracy of the predictions was tested with RMSE, MASE, and R-square values. The results show coefficients between 0.5 and 0.9 for the daily rainfall values. From the overall model comparison, it is observed that the SVM model's accuracy is high compared to the other models in this study. It is concluded that, of the two approaches applied to the same data, the SVM ML technique gives better results in this study area. In future work, the predicted rainfall data of this study can be used for accurate flood forecasting and modelling of the Aiyar basin.

  • Research Article
  • Cited by 27
  • 10.1016/j.eswa.2023.120649
Comparative study on the performance of different machine learning techniques to predict the shear strength of RC deep beams: Model selection and industry implications
  • Jun 3, 2023
  • Expert Systems with Applications
  • Khuong Le Nguyen + 3 more


  • Abstract
  • 10.1136/lupus-2023-kcr.123
LSO-081 Genomic prediction model using machine learning techniques that can distinguish autoimmune diseases (RA or SLE) from healthy controls
  • Jul 1, 2023
  • Lupus Science & Medicine
  • Young Bin Joo + 4 more

Background: Rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE) are the prototypes of autoimmune diseases, for which many genetic loci have been identified using genome-wide association studies (GWAS) in recent decades....

  • Research Article
  • Cited by 17
  • 10.1186/s12984-023-01140-9
The use of machine learning and deep learning techniques to assess proprioceptive impairments of the upper limb after stroke
  • Jan 27, 2023
  • Journal of NeuroEngineering and Rehabilitation
  • Delowar Hossain + 3 more

Background: Robots can generate rich kinematic datasets that have the potential to provide far more insight into impairments than standard clinical ordinal scales. Determining how to define the presence or absence of impairment in individuals using kinematic data, however, can be challenging. Machine learning techniques offer a potential solution to this problem. In the present manuscript we examine proprioception in stroke survivors using a robotic arm-position-matching task. Proprioception is impaired in 50–60% of stroke survivors and has been associated with poorer motor recovery and longer lengths of hospital stay. We present a simple cut-off-score technique for individual kinematic parameters and an overall task score to determine impairment, then compare the ability of different machine learning (ML) techniques and this task score to correctly classify individuals with or without stroke based on kinematic data.

Methods: Participants performed an Arm Position Matching (APM) task in an exoskeleton robot. The task produced 12 kinematic parameters that quantify multiple attributes of position sense. We first quantified impairment in individual parameters and in an overall task score by determining whether participants with stroke fell outside the 95% cut-off score of control (normative) values. We then applied five machine learning algorithms (Logistic Regression, Decision Tree, Random Forest, Random Forest with hyperparameter tuning, and Support Vector Machine) and a deep learning algorithm (Deep Neural Network) to classify individual participants as to whether or not they had a stroke, based only on kinematic parameters, using a tenfold cross-validation approach.

Results: We recruited 429 participants with neuroimaging-confirmed stroke (< 35 days post-stroke) and 465 healthy controls. Depending on the APM parameter, 10.9–48.4% of stroke participants were impaired, while 44% were impaired based on their overall task score. The mean performance metrics of the machine learning and deep learning models were: accuracy 82.4%, precision 85.6%, recall 76.5%, and F1 score 80.6%. All models displayed similar classification accuracy; however, the Random Forest model had the highest numerical accuracy (83%). Our models showed higher sensitivity and specificity (AUC = 0.89) in classifying individual participants than the overall task score (AUC = 0.85) based on their performance in the APM task. We also found that variability was the most important feature in classifying performance in the APM task.

Conclusion: Our ML models displayed similar classification performance. The ML models were able to integrate more kinematic information and relationships between variables into decision making, and displayed better classification performance than the overall task score. ML may help to provide insight into individual kinematic features that have previously been overlooked with respect to clinical importance.
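The model-comparison protocol this abstract describes (several classifiers scored by tenfold cross-validation on a fixed set of kinematic parameters) can be sketched as follows. The data here is synthetic (`make_classification` standing in for the 12 APM parameters and the 894 participants), and the model settings are illustrative assumptions, not the authors' configurations.

```python
# Sketch: comparing classifiers by tenfold cross-validation, as in the
# abstract's methods. Features and labels are synthetic stand-ins for the
# 12 APM kinematic parameters and stroke/control labels.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=894, n_features=12, n_informative=6,
                           random_state=0)

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
    "forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm": SVC(),
}

# Mean accuracy over 10 folds for each model
scores = {name: cross_val_score(m, X, y, cv=10).mean()
          for name, m in models.items()}
```

Cross-validated means of this kind are what allow the "all models displayed similar classification accuracy" comparison; reporting per-fold spread alongside the mean would make the similarity claim stronger.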

  • Research Article
  • Cited by 12
  • 10.2196/46854
Prediction of Medical Disputes Between Health Care Workers and Patients in Terms of Hospital Legal Construction Using Machine Learning Techniques: Externally Validated Cross-Sectional Study
  • Aug 17, 2023
  • Journal of Medical Internet Research
  • Min Yi + 10 more

Background: Medical disputes are a global public health issue that is receiving increasing attention. However, studies investigating the relationship between hospital legal construction and medical disputes are scarce. The development of a multicenter model incorporating machine learning (ML) techniques for the individualized prediction of medical disputes would be beneficial for medical workers.

Objective: This study aimed to identify predictors related to medical disputes from the perspective of hospital legal construction and to use ML techniques to build models for predicting the risk of medical disputes.

Methods: This study enrolled 38,053 medical workers from 130 tertiary hospitals in Hunan province, China. The participants were randomly divided into a training cohort (34,286/38,053, 90.1%) and an internal validation cohort (3767/38,053, 9.9%). Medical workers from 87 tertiary hospitals in Beijing were included in an external validation cohort (26,285/26,285, 100%). This study used logistic regression and 5 ML techniques: decision tree, random forest, support vector machine, gradient boosting decision tree (GBDT), and deep neural network. In total, 12 metrics, including discrimination and calibration, were used for performance evaluation, and a scoring system was developed to select the optimal model. Shapley additive explanations was used to generate the importance coefficients for characteristics. To promote the clinical use of the proposed optimal model, reclassification of patients was performed, and a web-based app for medical dispute prediction was created, which can be easily accessed by the public.

Results: Medical disputes occurred among 46.06% (17,527/38,053) of the medical workers in Hunan province, China. Among the 26 clinical characteristics, multivariate analysis demonstrated that 18 characteristics were significantly associated with medical disputes, and these characteristics were used for ML model development. Among the ML techniques, GBDT was identified as the optimal model, demonstrating the lowest Brier score (0.205), the highest area under the receiver operating characteristic curve (0.738, 95% CI 0.722-0.754), and the largest discrimination slope (0.172) and Youden index (1.355). In addition, it achieved the highest metrics score (63 points), followed by deep neural network (46 points) and random forest (45 points), in the internal validation set. In the external validation set, GBDT still performed comparably, achieving the second-highest metrics score (52 points). The high-risk group had more than twice the odds of experiencing medical disputes compared with the low-risk group.

Conclusions: We established a prediction model to stratify medical workers into different risk groups for encountering medical disputes. Among the 5 ML models, GBDT demonstrated the optimal comprehensive performance and was used to construct the web-based app. Our proposed model can serve as a useful tool for identifying medical workers at high risk of medical disputes. We believe that preventive strategies should be implemented for the high-risk group.
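The metrics this abstract uses to rank models (Brier score for calibration, ROC AUC for discrimination, Youden index) can be computed as in the sketch below. The GBDT here is scikit-learn's `GradientBoostingClassifier` on synthetic data; the feature count, sample sizes, and hyperparameters are assumptions for illustration, not the study's, and the Youden index is taken in its usual J = sensitivity + specificity − 1 form.

```python
# Sketch: scoring a gradient-boosting classifier with calibration and
# discrimination metrics like those in the abstract. Data is synthetic;
# 18 features mirror the study's 18 selected characteristics.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import brier_score_loss, roc_auc_score, roc_curve

X, y = make_classification(n_samples=4000, n_features=18, n_informative=8,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=1)

gbdt = GradientBoostingClassifier(random_state=1).fit(X_tr, y_tr)
prob = gbdt.predict_proba(X_te)[:, 1]

brier = brier_score_loss(y_te, prob)   # calibration: lower is better
auc = roc_auc_score(y_te, prob)        # discrimination: higher is better
fpr, tpr, _ = roc_curve(y_te, prob)
youden = (tpr - fpr).max()             # J = sensitivity + specificity - 1
```

Combining a calibration metric with a discrimination metric, as the study's 12-metric scoring system does, guards against picking a model that ranks cases well but reports miscalibrated probabilities.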
