Machine learning-based predictive maintenance system for urban heating networks for real-time failure detection and analysis
This study presents a comprehensive framework for predictive maintenance of urban heat supply networks utilizing advanced machine learning algorithms. The primary objective is to enable early detection of potential system failures, thereby improving operational reliability and minimizing unplanned downtime. A synthetically generated dataset of 10,000 records was employed, simulating real-world operational parameters such as temperature, pressure, flow rate, and vibration, sampled at 5-minute intervals to replicate actual monitoring conditions. Data preprocessing involved outlier removal using the interquartile range (IQR) method, normalization through Min-Max scaling, and imputation of missing values, ensuring data quality and consistency. Feature importance was further analyzed using SHAP values to enhance interpretability and identify critical predictors influencing system behavior. Five machine learning models, namely Logistic Regression, Support Vector Machine (SVM), Random Forest, Artificial Neural Networks (ANN), and Gradient Boosting (LightGBM), were implemented and evaluated using 10-fold cross-validation. The Gradient Boosting model demonstrated superior performance, achieving an accuracy of 99.9%, an F1-score of 0.999, a ROC-AUC of 1.0, and a LogLoss of 0.004. Logistic Regression and Random Forest also performed well (AUC = 1.0, F1 = 0.999), whereas SVM and ANN exhibited limited predictive capability (AUC ≈ 0.50; F1 = 0.038 and 0.632, respectively). These results underscore the robustness of Gradient Boosting in modeling complex nonlinear relationships and its applicability to real-time anomaly detection in heating systems. The proposed framework holds significant practical potential for integration into existing monitoring infrastructures, facilitating proactive maintenance planning, optimizing resource allocation, and reducing operational costs.
Future research will focus on validating the approach with real-world datasets and exploring hybrid machine learning architectures to enhance model generalizability and resilience.
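The preprocessing pipeline described above (IQR-based outlier removal followed by Min-Max scaling) can be sketched in a few lines of NumPy; the sensor readings below are invented for illustration, not drawn from the study's dataset:

```python
import numpy as np

def iqr_filter(x, k=1.5):
    """Keep values within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return x[(x >= q1 - k * iqr) & (x <= q3 + k * iqr)]

def min_max_scale(x):
    """Rescale values to the [0, 1] range."""
    return (x - x.min()) / (x.max() - x.min())

# Hypothetical temperature readings with one extreme outlier.
temps = np.array([70.1, 71.5, 69.8, 70.9, 71.2, 150.0])
clean = iqr_filter(temps)      # the 150.0 reading falls outside the fences
scaled = min_max_scale(clean)  # remaining values now lie in [0, 1]
```

Filtering before scaling matters here: a single outlier would otherwise compress all normal readings into a narrow sliver of the [0, 1] range.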
- Research Article
- 10.1038/s41598-025-09423-y
- Jul 8, 2025
- Scientific Reports
The precise diagnosis of heart disease represents a significant obstacle within the medical field, demanding the implementation of advanced diagnostic instruments and methodologies. This article conducts an extensive examination of the efficacy of different machine learning (ML) and deep learning (DL) models in forecasting heart disease from a tabular dataset, with a particular focus on a binary classification task. An extensive array of preprocessing techniques is thoroughly examined in order to optimize the predictive models' quality and performance. Our study employs a wide range of ML algorithms, such as Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), K-Nearest Neighbors (KNN), AdaBoost (AB), Gradient Boosting Machine (GBM), Light Gradient Boosting Machine (LGBM), CatBoost (CB), Linear Discriminant Analysis (LDA), and Artificial Neural Network (ANN), to assess the predictive performance of these algorithms in the context of heart disease detection. By subjecting the ML models to exhaustive experimentation, this study evaluates the effects of different feature scaling techniques, namely standardization, min-max scaling, and normalization, on their performance. The assessment takes into account various parameters, including accuracy (Acc), precision (Pre), recall (Rec), F1 score (F1), Area Under Curve (AUC), Cohen's Kappa (CK), and LogLoss. The results of this research not only illuminate the optimal scaling methods and ML models for forecasting heart disease, but also offer valuable perspectives on the pragmatic ramifications of implementing these models within a healthcare environment. The research endeavors to make a scholarly contribution to the field of cardiology by utilizing predictive analytics to pave the way for improved early detection and diagnosis of heart disease. This is critical information for coordinating treatment and ensuring opportune intervention.
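The three scaling techniques compared in this study reduce to simple column- and row-wise transforms; a minimal sketch with made-up feature values (e.g. a blood-pressure-like and a lab-value-like column, purely illustrative):

```python
import numpy as np

def standardize(X):
    """Zero mean, unit variance per column (z-score)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def min_max(X):
    """Rescale each column to [0, 1]."""
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def l2_normalize(X):
    """Scale each row (sample) to unit Euclidean norm."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

# Two hypothetical clinical features on very different scales.
X = np.array([[120.0, 4.2],
              [140.0, 6.1],
              [160.0, 5.0]])
```

Note the asymmetry: standardization and min-max scaling operate per feature (column), while normalization as implemented in common libraries operates per sample (row), which is why the three can affect distance-based models such as KNN and SVM so differently.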
- Research Article
- 10.1097/txd.0000000000001212
- Sep 27, 2021
- Transplantation Direct
Several machine learning classifiers were trained to predict transplantation of a liver graft. We utilized 127 variables available in the DMG dataset. We included data from potential deceased organ donors between April 2012 and January 2019. The outcome was defined as liver recovery for transplantation in the operating room. The prediction was made based on data available 12-18 h after the time of authorization for transplantation. The data were randomly separated into training (60%), validation (20%), and test (20%) sets. We compared the performance of our models to the Liver Discard Risk Index. Of 13,629 donors in the dataset, 9255 (68%) livers were recovered and transplanted, 1519 were recovered but used for research or discarded, and 2855 were not recovered. The optimized gradient boosting machine classifier achieved an area under the receiver operating characteristic curve of 0.84 on the test set, outperforming all other classifiers. This model predicts successful liver recovery for transplantation in the operating room, using data available early during donor management. It performs favorably when compared to existing models. It may provide real-time decision support during organ donor management and transplant logistics.
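The 60/20/20 random partition described above is straightforward to reproduce; a sketch using the paper's donor count, assuming (as an implementation detail not stated in the abstract) that fractional subset sizes are simply truncated:

```python
import numpy as np

def train_val_test_split(n, seed=0, frac=(0.6, 0.2, 0.2)):
    """Randomly partition n sample indices into train/validation/test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)                 # shuffle indices once
    n_train = int(frac[0] * n)
    n_val = int(frac[1] * n)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])           # remainder goes to the test set

train, val, test = train_val_test_split(13629)
```

Giving the remainder to the test set guarantees the three subsets are disjoint and together cover every donor exactly once.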
- Research Article
- 10.1007/s11657-020-00802-8
- Oct 23, 2020
- Archives of Osteoporosis
Osteoporosis is a silent disease until it results in fragility fractures. However, early diagnosis of osteoporosis provides an opportunity to detect and prevent fractures. We aimed to develop machine learning approaches to achieve high predictive ability for osteoporosis risk that could help primary care providers identify which women are at increased risk of osteoporosis and should therefore undergo further testing with bone densitometry. We included all postmenopausal Korean women from the Korea National Health and Nutrition Examination Surveys (KNHANES V-1, V-2) conducted in 2010 and 2011. Machine learning models using methods such as the k-nearest neighbors (KNN), decision tree (DT), random forest (RF), gradient boosting machine (GBM), support vector machine (SVM), artificial neural networks (ANN), and logistic regression (LR) were developed to predict osteoporosis risk. We analyzed the effect of applying the machine learning algorithms to the raw data and to feature-selected data in which only the statistically significant variables were included as model inputs. The accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC) were used to evaluate performance among the seven models. A total of 1792 patients were included in this study, of which 613 had osteoporosis. The raw data consisted of 19 variables and achieved AUROCs of 0.712, 0.684, 0.727, 0.652, 0.724, 0.741, and 0.726 for KNN, DT, RF, GBM, SVM, ANN, and LR with fivefold cross-validation, respectively. The feature-selected data consisted of nine variables and achieved AUROCs of 0.713, 0.685, 0.734, 0.728, 0.728, 0.743, and 0.727 for KNN, DT, RF, GBM, SVM, ANN, and LR with fivefold cross-validation, respectively. In this study, we developed and compared seven machine learning models to accurately predict osteoporosis risk.
The ANN model performed best when compared to the other models, having the highest AUROC value. Applying the ANN model in the clinical environment could help primary care providers stratify osteoporosis patients and improve the prevention, detection, and early treatment of osteoporosis.
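A fivefold cross-validated AUROC of the kind reported above can be obtained with scikit-learn's `cross_val_score`; the data below are synthetic stand-ins, not the KNHANES variables, and logistic regression stands in for the full model lineup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical data: 1792 subjects, 9 predictors, binary osteoporosis label
# driven by the first two columns plus noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(1792, 9))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=1792) > 0).astype(int)

model = LogisticRegression(max_iter=1000)
# One AUROC per fold; the mean is the figure typically reported.
aucs = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(aucs.mean())
```

Reporting the fold mean (and ideally the fold spread) rather than a single split score is what makes the AUROC comparisons between the seven models meaningful.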
- Research Article
- 10.1186/s12877-022-03502-9
- Oct 13, 2022
- BMC Geriatrics
Background: With rapid economic development, the world's average life expectancy is increasing, leading to the increasing prevalence of osteoporosis worldwide. However, due to the complexity and high cost of dual-energy x-ray absorptiometry (DXA) examination, DXA has not been widely used to diagnose osteoporosis. In addition, studies have shown that the psoas index measured at the third lumbar vertebra (L3) level is closely related to bone mineral density (BMD) and has an excellent predictive effect on osteoporosis. Therefore, this study developed a variety of machine learning (ML) models based on psoas muscle tissue at the L3 level of unenhanced abdominal computed tomography (CT) to predict osteoporosis. Methods: Medical professionals collected the CT images and the clinical characteristics data of patients over 40 years old who underwent DXA and abdominal CT examination in the Second Affiliated Hospital of Wenzhou Medical University database from January 2017 to January 2021. Using 3D Slicer software on axial CT images of L3, the specialist delineated three layers of the region of interest (ROI) along the bilateral psoas muscle edges. The PyRadiomics package in Python was used to extract features from the ROI. The Mann–Whitney U test and the least absolute shrinkage and selection operator (LASSO) algorithm were then used to reduce the dimensionality of the extracted features. Finally, six machine learning models, Gaussian naïve Bayes (GNB), random forest (RF), logistic regression (LR), support vector machine (SVM), gradient boosting machine (GBM), and extreme gradient boosting (XGBoost), were applied to train and validate these features to predict osteoporosis. Results: A total of 172 participants met the inclusion and exclusion criteria for the study. 82 participants were enrolled in the osteoporosis group, and 90 were in the non-osteoporosis group. The two groups had no significant differences in age, BMI, sex, smoking, drinking, hypertension, or diabetes.
In addition, 826 radiomic features were obtained from unenhanced abdominal CT images of osteoporotic and non-osteoporotic patients. Five hundred fifty of these radiomic features were screened out of the 826 by the Mann–Whitney U test, and 16 significant radiomic features were finally obtained by the LASSO algorithm. These 16 radiomic features were incorporated into the six machine learning models (GBM, GNB, LR, RF, SVM, and XGBoost). All six models predicted osteoporosis well in the validation set, with area under the receiver operating characteristic curve (AUROC) values greater than or equal to 0.8. GBM was the most effective at predicting osteoporosis, with an AUROC of 0.86, sensitivity of 0.70, specificity of 0.92, and accuracy of 0.81 in the validation set. Conclusion: We developed six machine learning models to predict osteoporosis based on psoas muscle images from abdominal CT, and the GBM model had the best predictive performance. The GBM model can help clinicians diagnose osteoporosis and provide timely anti-osteoporosis treatment for patients. In the future, the research team will strive to include participants from multiple institutions to conduct external validation of the ML models of this study.
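The two-stage dimensionality reduction described here (a univariate Mann–Whitney screen followed by LASSO shrinkage) can be sketched as follows; the matrix sizes, the 0.05 significance threshold, and the LASSO alpha are illustrative assumptions, not the study's settings:

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 172, 30                        # hypothetical: 172 patients, 30 candidate features
X = rng.normal(size=(n, p))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Stage 1: univariate screen - keep features whose distributions differ
# between the two groups (two-sided p < 0.05).
keep = [j for j in range(p)
        if mannwhitneyu(X[y == 1, j], X[y == 0, j]).pvalue < 0.05]

# Stage 2: LASSO shrinks most of the remaining coefficients to exactly zero.
lasso = Lasso(alpha=0.05).fit(X[:, keep], y)
selected = [keep[j] for j, c in enumerate(lasso.coef_) if c != 0]
```

The screen cheaply removes clearly uninformative features; LASSO then handles the redundancy among the survivors, which a univariate test cannot see.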
- Preprint Article
- 10.5194/egusphere-egu25-10556
- Mar 18, 2025
Taiwan, situated at the junction of the Ryukyu Arc and the Philippine Arc, is prone to frequent seismic activity due to its position at the boundary of tectonic plates. Earthquake-induced landslides are therefore one of the most common geological hazards. For disaster mitigation, it is crucial to accurately predict the spatial distribution of such landslides after an earthquake occurs. This study assesses the landslide risks triggered by the April 3rd, 2024, Hualien earthquake, which caused tremendous damage and claimed 18 lives, using multiple machine learning models, including Random Forest (RF), Support Vector Machines (SVM), Gradient Boosting Machine (GBM), and K-Nearest Neighbors (KNN). Logistic Regression (LR) was excluded from this study because of its limitations for disaster prediction. While LR is advantageous when handling small datasets with limited independent variables, it faces significant drawbacks in high-dimensional, multi-variable scenarios. Moreover, the simplistic structure of LR tends to result in underfitting, causing inferior predictive performance, and when dealing with large-scale data the process becomes computationally intensive. In contrast, machine learning models like RF, SVM, and GBM, along with ensemble techniques, are better suited to the complexity of earthquake-induced landslide prediction. The models were trained using a dataset comprising 3191 data points, including various topographic, geological, and seismic variables such as slope-related factors, curvature, elevation, aspect, lithology, peak ground acceleration (PGA), peak ground velocity (PGV), and distances to nearby faults and rivers. The dataset was labeled into two categories: coseismic landslide (CL) data labeled as 1 and non-coseismic landslide (NCL) data labeled as 0.
To train and evaluate the models, the dataset was divided into two subsets: 70% was used as the training set to build and fine-tune the models, while the remainder served as the test set to assess predictive performance. The confusion matrices of the four models were the basis for comparing their performance. All models' accuracy exceeded 0.95. Among them, the SVM model reached the highest at 0.9822, followed by GBM (0.9702), RF (0.9697), and KNN (0.9530). The stronger performance of SVM can be attributed to its ability to handle high-dimensional and nonlinear data more effectively, using kernel functions to transform the feature space and maximize the margin between classes, enhancing its classification precision and generalization capability. To further enhance prediction reliability, an ensemble model was developed by integrating the RF, SVM, and GBM models, while the KNN model, showing the lowest accuracy, was excluded to keep the number of models odd. The final prediction of the ensemble model was determined by a majority vote over the three models' outputs, substantially reducing prediction errors. Compared to logistic regression models, the ensemble approach is more dependable. While logistic regression struggles with high-dimensional, nonlinear, and strongly correlated geophysical variables, the ensemble formed by the three machine learning models (RF, SVM, and GBM) combines their strengths to tackle these challenges. By leveraging the models' diversity, the ensemble reduces overfitting and enhances the robustness of predictions, highlighting its capability in addressing the complexities of coseismic landslide prediction.
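The odd-numbered hard-voting scheme described above reduces to a per-sample majority count; the binary model outputs below are hypothetical:

```python
import numpy as np

def majority_vote(*predictions):
    """Hard-voting ensemble: each model casts one vote per sample.
    With an odd number of voters and binary labels, ties cannot occur."""
    stacked = np.vstack(predictions)   # shape (n_models, n_samples)
    return (stacked.sum(axis=0) > stacked.shape[0] / 2).astype(int)

# Hypothetical landslide / no-landslide predictions from RF, SVM, and GBM
# on five test cells.
rf  = np.array([1, 0, 1, 1, 0])
svm = np.array([1, 0, 0, 1, 0])
gbm = np.array([0, 0, 1, 1, 1])
print(majority_vote(rf, svm, gbm))  # -> [1 0 1 1 0]
```

Excluding the weakest model to keep the voter count odd, as the study does, is exactly what makes this tie-free rule applicable.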
- Research Article
- 10.12989/gae.2021.25.1.001
- Jan 1, 2021
- Geomechanics and Engineering
Machine learning models have been widely used for landslide susceptibility assessment (LSA) in recent years. The large number of inputs or conditioning factors for these models, however, can reduce computational efficiency and increase the difficulty of collecting data. Feature selection is a good tool to address this problem by selecting the most important features among all factors to reduce the size of the input variables. However, two important questions need to be solved: (1) how do feature selection methods affect the performance of machine learning models? and (2) which feature selection method is the most suitable for a given machine learning model? This paper aims to address these two questions by comparing the predictive performance of 13 feature selection-based machine learning (FS-ML) models and 5 ordinary machine learning models on LSA. First, five commonly used machine learning models (i.e., logistic regression, support vector machine, artificial neural network, Gaussian process and random forest) and six typical feature selection methods in the literature are adopted to constitute the proposed models. Then, fifteen conditioning factors are chosen as input variables and 1,017 landslides are used as recorded data. Next, feature selection methods are used to obtain the importance of the conditioning factors and create feature subsets, based on which 13 FS-ML models are constructed. For each machine learning model, the best-optimized FS-ML model is selected according to the area under the curve (AUC) value. Finally, five optimal FS-ML models are obtained and applied to the LSA of the studied area. The predictive abilities of the FS-ML models on LSA are verified and compared through the receiver operating characteristic curve and statistical indicators such as sensitivity, specificity and accuracy. The results showed that different feature selection methods have different effects on the performance of LSA machine learning models.
FS-ML models generally outperform the ordinary machine learning models. The best FS-ML model is the recursive feature elimination (RFE) optimized RF, and RFE is an optimal method for feature selection.
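Recursive feature elimination wrapped around a random forest, the best-performing combination reported here, is available directly in scikit-learn; the synthetic data and the choice of five retained features below are illustrative, not the study's configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 15))            # hypothetical: 15 conditioning factors
y = (X[:, 0] + X[:, 3] > 0).astype(int)   # only factors 0 and 3 are informative

# RFE repeatedly refits the estimator and drops the least important
# feature until only n_features_to_select remain.
selector = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
               n_features_to_select=5).fit(X, y)
chosen = np.flatnonzero(selector.support_)
```

Because the forest is refit after every elimination, RFE can discard a factor that only looked important in the presence of another, which is why it often beats one-shot importance rankings.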
- Preprint Article
- 10.5194/ems2025-562
- Jul 16, 2025
Machine learning (ML) and deep learning (DL) models can play an important role in modelling complicated processes, a capability necessary for hydrological and climate-related applications. Generally, ML models utilize precipitation and temperature time series of a basin as input to develop a lumped rainfall-runoff model that simulates streamflow at the basin outlet. However, when the basin is divided into several sub-basins, Graph Neural Networks (GNN) can treat each sub-basin as a node and link them together using a connectivity matrix to account for spatial variations of hydroclimatic variables. In this study, GNN and various ML models with different architectures, ranging from neural networks to tree-based and gradient-boosting structures, were exploited for daily streamflow simulation over different case studies. For each case study, the basin was divided into a few sub-basins for which daily precipitation and temperature data were aggregated and used as input. For training the GNN, the connectivity matrix of sub-basins was also used as input. 75% of the historical records were utilized to train the GNN and the different ML models, e.g., artificial neural networks, support vector machine, decision tree, random forest, eXtreme Gradient Boosting (XGBoost), Light Gradient-Boosting Machine (LightGBM), and Category Boosting (CatBoost), while the rest were used for testing. Streamflow simulation was conducted with and without considering seasonality and lag times. The obtained results clearly demonstrate that considering seasonality and time lags can enhance the accuracy of streamflow predictions based on the Kling–Gupta efficiency (KGE). Furthermore, the GNN with seasonality and time lags achieved promising results across different case studies, with KGE > 0.85 for training and KGE > 0.59 for testing data. Among the ML models, boosting models, e.g., LightGBM and XGBoost, performed slightly better than the others.
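The Kling–Gupta efficiency used to score the simulations combines three components in one number; a minimal implementation of the standard 2009 formulation, with invented daily flows for illustration:

```python
import numpy as np

def kling_gupta_efficiency(sim, obs):
    """KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2):
    r is the linear correlation, alpha the ratio of standard deviations,
    and beta the ratio of means between simulated and observed flows."""
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Invented observed and simulated daily flows; a perfect simulation scores 1.
obs = np.array([10.0, 12.0, 9.0, 14.0, 11.0])
sim = np.array([10.5, 11.0, 9.5, 13.0, 11.5])
kge = kling_gupta_efficiency(sim, obs)
```

Unlike a pure correlation score, KGE also penalizes bias (beta) and variability mismatch (alpha), which is why it is favored for streamflow evaluation.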
Finally, this comparative analysis provides valuable insights for ML/DL applications in climate change impact assessments. Acknowledgements: This research work was carried out as part of the TRANSCEND project with funding received from the European Union Horizon Europe Research and Innovation Programme under Grant Agreement No. 10108411.
- Research Article
- 10.28945/4897
- Jan 1, 2022
- Interdisciplinary Journal of Information, Knowledge, and Management
Aim/Purpose: This paper aims to analyze the availability and pricing of perishable farm produce before and during the lockdown restrictions imposed due to Covid-19. This paper also proposes machine learning and deep learning models to help the farmers decide on an appropriate market to sell their farm produce and get a fair price for their product. Background: Developing countries like India have regulated agricultural markets governed by country-specific protective laws like the Essential Commodities Act and the Agricultural Produce Market Committee (APMC) Act. These regulations restrict the sale of agricultural produce to a predefined set of local markets. The Covid-19 pandemic led to a lockdown during the first half of 2020, which resulted in supply disruption and demand-supply mismatch of agricultural commodities at these local markets. These demand-supply dynamics led to disruptions in the pricing of farm produce, leading to a lower price realization for farmers. Hence, it is essential to analyze the impact of this disruption on the pricing of farm produce at a granular level. Moreover, the farmers need a tool that guides them to the most suitable market/city/town to sell their farm produce to get a fair price. Methodology: One hundred and fifty thousand samples from the agricultural dataset, released by the Government of India, were used to perform statistical analysis and identify the supply disruptions as well as price disruptions of perishable agricultural produce. In addition, more than seventeen thousand samples were used to implement and train machine learning and deep learning models that can predict and guide the farmers about the appropriate market to sell their farm produce. In essence, the paper uses descriptive analytics to analyze the impact of Covid-19 on agricultural produce pricing. The paper explores the usage of prescriptive analytics to recommend an appropriate market to sell agricultural produce.
Contribution: Five machine learning models based on Logistic Regression, K-Nearest Neighbors, Support Vector Machine, Random Forest, and Gradient Boosting, and three deep learning models based on Artificial Neural Networks were implemented. The performance of these models was compared using metrics like Precision, Recall, Accuracy, and F1-Score. Findings: Among the five classification models, the Gradient Boosting classifier was the optimal classifier, achieving precision, recall, accuracy, and F1 score of 99%. Out of the three deep learning models, the Adam optimizer-based deep neural network achieved precision, recall, accuracy, and F1 score of 99%. Recommendations for Practitioners: The Gradient Boosting technique and the Adam-based deep learning model should be the preferred choice for analyzing agricultural pricing-related problems. Recommendation for Researchers: Ensemble learning techniques like Random Forest and Gradient Boosting perform better than non-ensemble classification techniques. Hyperparameter tuning is an essential step in developing these models and improves their performance. Impact on Society: Statistical analysis of the data revealed the true nature of demand, supply, and price disruption. This analysis helps to assess the revenue impact borne by the farmers due to Covid-19. The machine learning and deep learning models help the farmers to get a better price for their crops. Though the dataset used in this paper is related to India, the outcome of this research work applies to many developing countries that have similar regulated markets. Hence, farmers from developing countries across the world can benefit from the outcome of this research work. Future Research: The machine learning and deep learning models were implemented and tested for markets in and around Bangalore. The model can be expanded to cover other markets within India.
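The four metrics used to compare these models all derive from the binary confusion matrix; a small sketch with hypothetical counts chosen to mimic a near-99% classifier:

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy, and F1 from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1

# Hypothetical counts: 990 true positives, 10 false positives,
# 10 false negatives, 990 true negatives.
p, r, a, f1 = classification_metrics(tp=990, fp=10, fn=10, tn=990)
```

F1 is the harmonic mean of precision and recall, so it only reaches 99% when both components do, which is why reporting all four together is more informative than accuracy alone.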
- Conference Article
- 10.2118/218857-ms
- Apr 9, 2024
The design, operation, and optimization of Sucker Rod Pumping (SRP) systems necessitate the utilization of production data. However, forecasting fluid flow rates at the surface of SRP artificially lifted wells usually poses a challenge, especially where traditional separators and multiphase flowmeters are not universally available. Consequently, this study introduces nine machine learning (ML) models employing real data sourced from 598 wells with a production history exceeding three years. The dataset, comprising 8,372 data points, undergoes a random split allocating around 80% of the data (6,697 data points) for training, while around 20% (1,675 data points) are used for testing. The ML models encompass Gradient Boosting (GB), Adaptive Boosting (AdaBoost), Random Forest (RF), Support Vector Machines (SVMs), Decision Tree (DT), K-Nearest Neighbor (KNN), Linear Regression (LR), Artificial Neural Network (ANN), and Stochastic Gradient Descent (SGD). The chosen input features are readily accessible during any SRP well-lifting process and include wellhead flowing pressure, casing pressure, predicted bottom hole fluid production rate, predicted bottom hole oil production rate, net liquid head above the pump, pump size, pump clearance, stroke length, pump speed, pump setting depth, temperature at the pump depth, oil gravity, and water viscosity. Evaluation of the different ML models' performance is carried out by two methodologies: K-fold cross-validation and repeated random sampling. The findings reveal that the top-performing models are GB, AdaBoost, RF, LR, and SGD, exhibiting mean absolute percentage errors of 3.6%, 3.4%, 3.4%, 4.0%, and 4.4%, respectively, and determination coefficients (R²) of 0.937, 0.934, 0.935, 0.921, and 0.915, respectively.
In practical field application, these models are deployed on a well within Egypt's Western Desert fields, demonstrating excellent agreement between actual fluid rates and model predictions. In conclusion, across diverse pumping scenarios and completion configurations, the ML models could effectively forecast production rates for different SRP wells. This capability facilitates continuous monitoring, optimization, and performance analysis of SRP wells, enabling swift responses to operational challenges since the proposed ML models offer an accessible, rapid, and cost-effective alternative to conventional separators and multiphase flowmeters.
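The two error measures reported above (mean absolute percentage error and R²) can be computed directly; the actual/predicted rate values below are made up for illustration:

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical fluid rates (e.g. bbl/day) for four test wells.
actual = np.array([100.0, 200.0, 300.0, 400.0])
predicted = np.array([110.0, 190.0, 315.0, 380.0])
```

MAPE expresses error relative to each well's own rate, which suits production data spanning orders of magnitude, while R² measures how much of the rate variance the model explains; the two can disagree, which is why both are reported.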
- Research Article
- 10.1200/jco.2025.43.5_suppl.647
- Feb 10, 2025
- Journal of Clinical Oncology
647 Background: The primary treatment of most mNSTC is chemotherapy followed by surgery if the residual disease (RD) is >1 cm. However, conventional imaging lacks the specificity to characterize the tissue, often leading to overtreatment. This study hypothesizes that integrating CT-driven radiomics features with plasma miR371 and miR375 will enhance the predictive accuracy of Machine Learning (ML) models to predict teratoma, viable germ cell tumor (vGCT), and fibrosis/necrosis (F/N) in mNSTC patients with RD. Methods: 111 lesions from 52 patients, including residual teratoma (n=57), F/N (n=33), vGCT (n=10), and additional seminoma (n=11) for training purposes, were included, split into training (N=78) and test (N=33) cohorts. Lesions were lymph nodes (n=87), lung (n=21), and brain (n=3), with a median size of 1.6 cm (Q1-Q3 interval=1.2-2.73 cm). 3D Slicer version 5.6.1 was used to segment the RD >1 cm (short axis) and extract radiomics features. Plasma miRNA levels before resection were measured by RT-PCR. Random Forest (RF), Support Vector Machine (SVM), Gradient Boosting (GB), and CatBoost (CB) ML models were evaluated to define the operating characteristics of radiomics alone (R-only) and in combination with miR371 (371) and/or miR375 (375) levels in predicting teratoma, vGCT, and F/N. Results: For predicting teratoma, the best models were RF (R+375 and R+371+375), CB (R+371+375), and GB (R+371 and R+371+375). While adding miR371 or miR375 to R-only slightly improved AUC across models, the best results were achieved with the R+375+371 dataset. CB achieved AUCs ranging from 0.94 to 0.97 in training and 0.81 to 0.93 in test sets, with its highest AUC of 0.93 (95% CI: 0.78-0.97) on the R+375+371 dataset to differentiate all three classes. Similarly, GB demonstrated strong performance, achieving its highest AUC of 0.93 (95% CI: 0.79-0.96) on the R+375+371 dataset (Table).
Conclusions: Integration of plasma miR371, miR375, and radiomics improved the accuracy of predicting histologies across all ML models. These methods could be used to characterize the histology of RD in mNSTC patients to better inform treatment decisions. Further refinement, including incorporation of histological findings of the primary tumor, will be reported.

AUC values of different ML algorithms on training and test sets.

Training set (AUC ± SD):

| Model | R | R+375 | R+371 | R+375+371 |
| --- | --- | --- | --- | --- |
| RF | 0.93±0.05 | 0.95±0.04 | 0.95±0.03 | 0.96±0.04 |
| SVM | 0.84±0.06 | 0.84±0.09 | 0.89±0.11 | 0.89±0.09 |
| GB | 0.94±0.04 | 0.91±0.08 | 0.95±0.05 | 0.97±0.03 |
| CB | 0.95±0.03 | 0.94±0.03 | 0.94±0.04 | 0.97±0.03 |

Test set (AUC, 95% CI):

| Model | R | R+375 | R+371 | R+375+371 |
| --- | --- | --- | --- | --- |
| RF | 0.80 (0.59-0.89) | 0.85 (0.72-0.93) | 0.87 (0.76-0.95) | 0.91 (0.78-0.95) |
| SVM | 0.72 (0.54-0.80) | 0.74 (0.56-0.82) | 0.83 (0.69-0.92) | 0.84 (0.76-0.94) |
| GB | 0.84 (0.61-0.96) | 0.89 (0.77-0.97) | 0.89 (0.79-0.96) | 0.93 (0.79-0.96) |
| CB | 0.81 (0.60-0.93) | 0.86 (0.73-0.94) | 0.89 (0.78-0.97) | 0.93 (0.78-0.97) |
- Research Article
- 10.3389/fpubh.2024.1347219
- Apr 25, 2024
- Frontiers in Public Health
Osteoporosis is becoming more common worldwide, imposing a substantial burden on individuals and society. The onset of osteoporosis is subtle, early detection is challenging, and population-wide screening is infeasible. Thus, there is a need to develop a method to identify those at high risk for osteoporosis. This study aimed to develop a machine learning algorithm to effectively identify people with low bone density, using readily available demographic and blood biochemical data. Using NHANES 2017-2020 data, participants over 50 years old with complete femoral neck BMD data were selected. This cohort was randomly divided into training (70%) and test (30%) sets. Lasso regression selected variables for inclusion in six machine learning models built on the training data: logistic regression (LR), support vector machine (SVM), gradient boosting machine (GBM), naive Bayes (NB), artificial neural network (ANN), and random forest (RF). NHANES data from the 2013-2014 cycle were used as an external validation set to verify the models' generalizability. Model discrimination was assessed via AUC, accuracy, sensitivity, specificity, precision, and F1 score. Calibration curves evaluated goodness-of-fit. Decision curves determined clinical utility. The SHAP framework analyzed variable importance. A total of 3,545 participants were included in the internal validation set of this study, of whom 1,870 had normal bone density and 1,675 had low bone density. Lasso regression selected 19 variables. In the test set, AUC was 0.785 (LR), 0.780 (SVM), 0.775 (GBM), 0.729 (NB), 0.771 (ANN), and 0.768 (RF). The LR model had the best discrimination, a well-fitting calibration curve, and the best net clinical benefit on the decision curve, and it also showed good predictive power on the external validation dataset. The top variables in the LR model were age, BMI, gender, creatine phosphokinase, total cholesterol, and alkaline phosphatase.
The machine learning model demonstrated effective classification of low BMD using blood biomarkers. This could aid clinical decision making for osteoporosis prevention and management.
- Research Article
- 10.3390/app13106138
- May 17, 2023
- Applied Sciences
A mortality prediction model can be a great tool to assist physicians in decision making in the intensive care unit (ICU) in order to ensure optimal allocation of ICU resources according to the patient's health condition. The entire world witnessed a severe ICU patient capacity crisis a few years ago during the COVID-19 pandemic. Various widely utilized machine learning (ML) models in this research field can perform poorly due to a lack of proper feature selection. Although nature-based algorithms perform well for feature selection in other sectors, no comparative study of their feature selection performance has been conducted in the ICU mortality prediction field. Therefore, in this research, a comparison of the performance of ML models with and without feature selection was performed. In addition, explainable artificial intelligence (AI) was used to examine the contribution of features to the decision-making process. Explainable AI focuses on establishing transparency and traceability for statistical black-box machine learning techniques, and it is essential in the medical industry to foster public confidence and trust in machine learning model predictions. Three nature-based algorithms, namely the flower pollination algorithm (FPA), particle swarm optimization (PSO), and the genetic algorithm (GA), were used in this study. For the classification job, the most widely used and diversified classifiers from the literature were used, including the logistic regression (LR), decision tree (DT), gradient boosting (GB), and random forest (RF) algorithms. The Medical Information Mart for Intensive Care III (MIMIC-III) dataset was used to collect data on heart failure patients. On the MIMIC-III dataset, it was discovered that feature selection significantly improved the performance of the described ML models.
Without applying any feature selection process on the MIMIC-III heart failure patient dataset, the accuracy of the four mentioned ML models, namely LR, DT, RF, and GB, was 69.9%, 82.5%, 90.6%, and 91.0%, respectively, whereas with feature selection in combination with the FPA, the accuracy increased to 71.6%, 84.8%, 92.8%, and 91.1%, respectively, for the same dataset. The FPA also showed the highest area under the receiver operating characteristic curve (AUROC) value of 83.0% with the RF algorithm among all algorithms utilized in this study. Thus, it can be concluded that the use of feature selection with the FPA has a profound impact on the outcome of ML models. Shapley additive explanations (SHAP) were used in this study to interpret the ML models, because SHAP offers mathematical assurances for the precision and consistency of explanations. It is trustworthy and suitable for both local and global explanations. It was found that the features selected by SHAP as most important largely coincided with the features selected by the FPA. Therefore, we hope that this study will help physicians to predict ICU mortality for heart failure patients with a limited number of features and with high accuracy.
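The core idea of the abstract above, that selecting a small subset of informative features can improve a classifier, can be illustrated with a minimal sketch. This is not the paper's FPA or SHAP pipeline; it uses random forest impurity-based importances on synthetic data as a simple stand-in for the feature-selection step, and all dataset parameters here are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a clinical dataset: 20 features,
# of which only 5 are informative (illustrative numbers).
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, n_redundant=2,
                           random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
base_acc = cross_val_score(rf, X, y, cv=5, scoring="accuracy").mean()

# Rank features by impurity-based importance and keep the top 8,
# a crude proxy for the FPA/SHAP-guided selection in the paper.
rf.fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:8]
sel_acc = cross_val_score(rf, X[:, top], y, cv=5, scoring="accuracy").mean()

print(f"all 20 features: {base_acc:.3f}")
print(f"top 8 features:  {sel_acc:.3f}")
```

In practice the paper's point is that the gain from selection depends heavily on the search strategy (FPA vs. PSO vs. GA), which this simple ranking does not capture.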
- Research Article
- 10.4108/ew.7114
- Jul 21, 2025
- EAI Endorsed Transactions on Energy Web
The prediction of wind energy generation is important for enhancing the performance and dependability of renewable energy systems, given the rising demand for wind-generated electricity and the growing competitiveness of wind energy technology. This study leverages advanced machine learning (ML) models, together with statistical and deep learning-based time-series forecasting models, to enhance the accuracy of wind energy predictions. The comprehensive analysis includes nine ML models (Linear Regression, Random Forests (RF), Gradient Boosting Machines (GBM), Support Vector Machines (SVM), K-Nearest Neighbors (KNN), AdaBoost, XGBoost, Support Vector Regression (SVR), and Neural Networks) as well as four time-series forecasting models (ARIMA, Temporal Convolutional Networks (TCNs), Long Short-Term Memory (LSTM) networks, and Gated Recurrent Units (GRU)). Each ML model underwent rigorous cross-validation to ensure optimal performance. The assessment criteria comprised the Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the R² Score. Among the nine ML models, Random Forests, GBM, and KNN consistently provided superior accuracy and robustness, making them the top choices for wind energy prediction, whereas Linear Regression, SVM, and SVR performed very poorly on the considered dataset. In the experiments, Random Forest, GBM, and KNN showed the best performance, with low MSE values of 0.77, 1.95, and 1.51, respectively, while the other models had MSEs above 7.5, with AdaBoost reaching 30. Their RMSEs (0.88, 1.40, 1.23) and MAEs (0.093, 0.73, 0.10) also indicate strong predictive accuracy compared to the rest. Among the time-series forecasting models, TCNs, LSTM, and GRU networks showed strong capabilities in capturing temporal dependencies and trends within the wind energy data. Visualization techniques were employed to compare model performances comprehensively, providing clear insights into their predictive power.
Therefore, this present study offers a robust framework for researchers and practitioners aiming to leverage machine learning and time series forecasting in the realm of renewable energy prediction.
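The model-comparison workflow described above (cross-validating several regressors and scoring them with MSE, RMSE, MAE, and R²) can be sketched as follows. The dataset here is synthetic and the feature count, model hyperparameters, and fold count are illustrative assumptions, not the study's actual setup.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for a wind-power dataset (features might be
# wind speed, direction, air density, etc.).
X, y = make_regression(n_samples=600, n_features=4, noise=5.0, random_state=1)

models = {
    "RF": RandomForestRegressor(n_estimators=100, random_state=1),
    "GBM": GradientBoostingRegressor(random_state=1),
    "KNN": KNeighborsRegressor(n_neighbors=5),
}

results = {}
for name, model in models.items():
    cv = cross_validate(model, X, y, cv=5,
                        scoring=("neg_mean_squared_error",
                                 "neg_mean_absolute_error", "r2"))
    results[name] = {
        "MSE": -cv["test_neg_mean_squared_error"].mean(),
        "RMSE": np.sqrt(-cv["test_neg_mean_squared_error"]).mean(),
        "MAE": -cv["test_neg_mean_absolute_error"].mean(),
        "R2": cv["test_r2"].mean(),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Note that sklearn exposes error metrics as negated scores (so that higher is always better), which is why the signs are flipped when reporting MSE and MAE.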
- Abstract
- 10.1136/annrheumdis-2024-eular.1072
- Jun 1, 2024
- Annals of the Rheumatic Diseases
Background: Systemic lupus erythematosus is a severe autoimmune disease with strong individual heterogeneity, making it particularly important to study the risk factors that affect its prognosis in order to determine patient...
- Research Article
- 10.47164/ijngc.v14i1.1031
- Feb 15, 2023
- International Journal of Next-Generation Computing
Gender recognition using voice is of enormous prominence for near-future technology, as its uses could range from smart assistant robots to the customer service sector and many more. Machine learning (ML) models play a vital role in achieving this task. Using the acoustic properties of voice, different ML models classify the gender as male or female. In this research we have used the ML models Random Forest, Decision Tree, Logistic Regression, Support Vector Machine (SVM), Gradient Boosting, K-Nearest Neighbor (KNN), and an ensemble method (KNN, logistic regression, SVM). To determine which algorithm is best for recognizing gender, we have evaluated the models based on accuracy, recall, F1-score, and precision.
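The ensemble mentioned above (KNN, logistic regression, and SVM combined) is typically built as a voting classifier. The sketch below shows one plausible construction on synthetic data; the features, soft-voting choice, and hyperparameters are assumptions for illustration, not the paper's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for acoustic voice features (e.g. mean
# frequency, spectral entropy) with a binary male/female label.
X, y = make_classification(n_samples=800, n_features=10,
                           n_informative=6, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=42)

# Soft-voting ensemble of KNN, logistic regression, and SVM;
# each base model is scaled, since KNN and SVM are scale-sensitive.
ensemble = VotingClassifier(
    estimators=[
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
        ("lr", make_pipeline(StandardScaler(),
                             LogisticRegression(max_iter=1000))),
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True))),
    ],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)
pred = ensemble.predict(X_te)
acc = accuracy_score(y_te, pred)
f1 = f1_score(y_te, pred)
print(f"accuracy={acc:.3f}  F1={f1:.3f}")
```

Soft voting averages predicted class probabilities, which is why `SVC` needs `probability=True`; hard voting on class labels would be the alternative.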