A Comparative Performance Analysis of Hybrid and Classical Machine Learning Method in Predicting Diabetes

Abstract

Diabetes mellitus, characterized by high blood glucose levels, is one of medical science's most important research topics because of the disease's severe consequences. Machine learning techniques, with their intelligent capabilities, make early detection of diabetes possible by predicting the disease accurately and helping to prevent its complications. This study therefore aims to find a machine learning approach that can predict diabetes more accurately. It compares the performance of various classical machine learning models with hybrid machine learning approaches. The hybrid approaches include homogeneous models, comprising Random Forest, AdaBoost, XGBoost, Extra Trees, and Gradient Boosting, and a heterogeneous model that uses the stacking ensemble method. The stacking ensemble, or stacked generalization, approach is a meta-classifier in which multiple learners collaborate on the prediction. The homogeneous hybrid models and stacked generalization are compared with classical machine learning methods such as Naive Bayes, Multilayer Perceptron, k-Nearest Neighbour, and Support Vector Machine. Experimental analysis using the Pima Indians and early-stage diabetes datasets demonstrates that the hybrid models achieve higher accuracy in diagnosing diabetes than the classical models. Among all the hybrid models, the heterogeneous model using the stacked generalization approach outperformed the others, achieving 83.9% and 98.5% accuracy on the two datasets, respectively. Doi: 10.28991/ESJ-2023-07-01-08
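The stacked generalization approach described above can be sketched with scikit-learn. The base learners, meta-classifier, and synthetic data below are illustrative stand-ins, not the paper's exact configuration or the Pima Indians dataset.

```python
# Minimal sketch of stacked generalization: base learners' out-of-fold
# predictions become the inputs of a meta-classifier.
# All model choices and the synthetic data are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
]
# The final_estimator is the meta-classifier trained on the base outputs.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression())
stack.fit(X_tr, y_tr)
print(round(stack.score(X_te, y_te), 3))
```

Any scikit-learn classifier can serve as a base learner or meta-classifier; logistic regression is a common default for the meta level because it combines base predictions without overfitting them.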

Similar Papers
  • Research Article
  • Cited by 190
  • 10.1103/physrevlett.126.190505
Information-Theoretic Bounds on Quantum Advantage in Machine Learning.
  • May 14, 2021
  • Physical Review Letters
  • Hsin-Yuan Huang + 2 more

We study the performance of classical and quantum machine learning (ML) models in predicting outcomes of physical experiments. The experiments depend on an input parameter x and involve execution of a (possibly unknown) quantum process E. Our figure of merit is the number of runs of E required to achieve a desired prediction performance. We consider classical ML models that perform a measurement and record the classical outcome after each run of E, and quantum ML models that can access E coherently to acquire quantum data; the classical or quantum data are then used to predict the outcomes of future experiments. We prove that for any input distribution D(x), a classical ML model can provide accurate predictions on average by accessing E a number of times comparable to the optimal quantum ML model. In contrast, for achieving accurate prediction on all inputs, we prove that an exponential quantum advantage is possible. For example, to predict the expectations of all Pauli observables in an n-qubit system ρ, classical ML models require 2^{Ω(n)} copies of ρ, but we present a quantum ML model using only O(n) copies. Our results clarify where quantum advantage is possible and highlight the potential for classical ML models to address challenging quantum problems in physics and chemistry.

  • Research Article
  • Cited by 1
  • 10.1097/md.0000000000038709
Understanding and predicting pregnancy termination in Bangladesh: A comprehensive analysis using a hybrid machine learning approach.
  • Jun 28, 2024
  • Medicine
  • Riaz Rahman + 4 more

Reproductive health issues, including unsafe pregnancy termination, remain a significant concern for women in developing nations. This study focused on investigating and predicting pregnancy termination in Bangladesh by employing a hybrid machine learning approach. The analysis used data from the Bangladesh Demographic and Health Surveys conducted in 2011, 2014, and 2017 to 2018. Ten independent variables, encompassing factors such as age, residence, division, wealth index, working status, BMI, total number of children ever born, recent births, and number of living children, were examined for their potential associations with pregnancy termination. The dataset underwent preprocessing to address missing values and balance class distributions. To predict pregnancy termination, 8 classical machine learning models and hybrid models were used in this study. The models' performance was evaluated based on the area under the curve, precision, recall, and F1 score. The results highlighted the effectiveness of the hybrid models, particularly the Voting hybrid model (area under the curve: 91.97; precision: 84.14; recall: 83.87; F1 score: 83.84), in accurately predicting pregnancy termination. Notable predictors include age, division, and wealth index. These findings hold significance for policy interventions aiming to reduce pregnancy termination rates, emphasizing the necessity for tailored approaches that consider regional disparities and socioeconomic factors. Overall, the study demonstrates the efficacy of hybrid machine learning models in comprehending and forecasting pregnancy termination, offering valuable insights for reproductive health initiatives in Bangladesh and similar contexts.
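The voting hybrid named above can be sketched as follows. The base models and synthetic data are illustrative stand-ins, not the study's configuration or the BDHS survey data.

```python
# Sketch of a "voting hybrid": several base classifiers vote on each case.
# With voting="soft" the predicted probabilities are averaged instead of
# taking a majority of hard labels. Models and data are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=10, random_state=1)
voter = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(n_estimators=50, random_state=1)),
                ("nb", GaussianNB())],
    voting="soft",
)
# AUC (the study's headline metric) estimated by 5-fold cross-validation.
scores = cross_val_score(voter, X, y, cv=5, scoring="roc_auc")
print(round(scores.mean(), 3))
```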

  • Research Article
  • Cited by 18
  • 10.1002/cpe.7190
Hyper‐parametric improved machine learning models for solar radiation forecasting
  • Jul 26, 2022
  • Concurrency and Computation: Practice and Experience
  • Mantosh Kumar + 2 more

Summary: Spatiotemporal solar radiation forecasting is extremely challenging due to its dependence on meteorological and environmental factors. Chaotic time-varying behaviour and non-linearity make the forecasting model more complex. To address this crucial issue, the paper provides a comprehensive investigation of the deep learning framework for predicting the two components of solar irradiation, that is, Diffuse Horizontal Irradiance (DHI) and Direct Normal Irradiance (DNI). Through exploratory data analysis, three of the most prominent recent deep learning (DL) architectures have been developed and compared with classical machine learning (ML) models in terms of statistical performance accuracy. In our study, the DL architectures include the convolutional neural network (CNN) and the recurrent neural network (RNN), whereas the classical ML models include Random Forest (RF), Support Vector Regression (SVR), Multilayer Perceptron (MLP), Extreme Gradient Boosting (XGB), and K-Nearest Neighbor (KNN). Additionally, three optimization techniques, Grid Search (GS), Random Search (RS), and Bayesian Optimization (BO), have been incorporated to tune the hyperparameters of the classical ML models and obtain the best results. Based on the rigorous comparative analysis, the CNN model outperformed all classical ML and DL models, with the lowest mean squared error, the highest R-squared value, and the least computational time.
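Two of the three tuning techniques named above, Grid Search and Random Search, can be sketched directly with scikit-learn (Bayesian Optimization needs an extra library such as scikit-optimize). The estimator, parameter grid, and synthetic regression data are illustrative assumptions, not the paper's setup.

```python
# Sketch of hyperparameter tuning for a classical ML regressor.
# GridSearchCV tries every combination; RandomizedSearchCV samples a few.
# Synthetic data stands in for the solar irradiance series.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_regression(n_samples=300, n_features=6, noise=0.1, random_state=2)

param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

grid = GridSearchCV(RandomForestRegressor(random_state=2),
                    param_grid, cv=3, scoring="r2")
grid.fit(X, y)  # exhaustive search over the 4 combinations

rand = RandomizedSearchCV(RandomForestRegressor(random_state=2),
                          param_grid, n_iter=3, cv=3, scoring="r2",
                          random_state=2)
rand.fit(X, y)  # samples 3 of the 4 combinations

print(grid.best_params_, round(grid.best_score_, 3))
```

Random search scales much better than grid search as the number of hyperparameters grows, which is why both are commonly compared in studies like this one.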

  • Research Article
  • Cited by 44
  • 10.1007/s11269-020-02756-5
Short to Long-Term Forecasting of River Flows by Heuristic Optimization Algorithms Hybridized with ANFIS
  • Feb 11, 2021
  • Water Resources Management
  • Hossien Riahi-Madvar + 3 more

Accurate short-term to long-term streamflow forecasting is of great importance for water resources management. However, with the advent of novel hybrid machine learning methods, it remains unclear whether these hybrid models can outperform traditional streamflow forecast models. Therefore, in this study, we trained and tested several evolutionary algorithms, including the Firefly Algorithm (FFA), Genetic Algorithm (GA), Grey Wolf Optimization (GWO), Particle Swarm Optimization (PSO), and Differential Evolution (DE), hybridized with ANFIS. Three forecast horizons, short-term (daily), mid-term (weekly and monthly), and long-term (annual), with fifteen input-output combinations, a total of 90 models, were developed and tested. A Monte Carlo Simulation (MCS) framework was used for uncertainty analysis. Daily inflow to the Karun III dam, located in the southeast of Iran, for the period of June 2005 to December 2016 was used. Results indicated that: 1) all developed hybrid algorithms significantly outperformed the traditional ANFIS model for all prediction horizons; the best hybrid models were ANFIS-GWO1, ANFIS-GWO7, and ANFIS-GWO11, such that the values of R2, RMSE, NSE, and RAE were improved by 12%, 10%, 18.5%, and 14.3% for the short-term forecasts, 15%, 13%, 20%, and 21.1% for the mid-term forecasts, and 10.3%, 7.5%, 10.5%, and 14% for the long-term forecasts; 2) uncertainty analysis indicated that nearly all hybrid models significantly reduced uncertainty levels compared to the traditional ANFIS model; and 3) a simple explicit equation based on the hybrid ANFIS results was provided for streamflow forecasting, which is a major advantage compared to classical black-box machine learning models.
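The metaheuristic hybridization above can be illustrated with a bare-bones Particle Swarm Optimization loop. A two-parameter toy decay model stands in for ANFIS here, so this shows only the tuning mechanism, not the paper's fuzzy inference formulation.

```python
# Minimal PSO sketch: a swarm of candidate parameter vectors moves toward
# its personal bests and the global best to minimize forecast RMSE.
# The "model" a*exp(-b*t) and the synthetic flow series are assumptions.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 5, 50)
flow = 3.0 * np.exp(-0.7 * t) + rng.normal(0, 0.05, t.size)  # "observed" series

def rmse(params):
    a, b = params
    return np.sqrt(np.mean((flow - a * np.exp(-b * t)) ** 2))

n_particles, n_iters = 20, 60
pos = rng.uniform(0, 5, (n_particles, 2))       # candidate (a, b) pairs
vel = np.zeros_like(pos)
pbest = pos.copy()                               # per-particle best positions
pbest_val = np.array([rmse(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()         # swarm-wide best

for _ in range(n_iters):
    r1, r2 = rng.random((2, n_particles, 1))
    # inertia + cognitive pull (pbest) + social pull (gbest)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([rmse(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()

print(gbest.round(2), round(rmse(gbest), 3))
```

In the paper's setting, the position vector would hold ANFIS membership-function parameters rather than (a, b), and the same update rule would tune them against streamflow error.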

  • Research Article
  • Cited by 1
  • 10.29244/ijsa.v5i2p284-303
Forecasting Currency in East Java: Classical Time Series vs. Machine Learning
  • Jun 30, 2021
  • Indonesian Journal of Statistics and Its Applications
  • J A Putri + 5 more

Most research on inflow and outflow currency in Indonesia has shown that these data contain both linear and nonlinear patterns with a calendar variation effect. The goal of this research is to propose a hybrid model combining ARIMAX and a Deep Neural Network (DNN), known as hybrid ARIMAX-DNN, to improve forecast accuracy for currency prediction in East Java, Indonesia. ARIMAX is a class of classical time series models that can accurately handle linear patterns and the calendar variation effect, whereas DNN is a machine learning method that is powerful for tackling nonlinear patterns. Data on 32 denominations of inflow and outflow currency in East Java are used as case studies. The best model was selected based on the smallest RMSE and sMAPE values on the testing dataset. The results showed that the hybrid ARIMAX-DNN model improved forecast accuracy and outperformed the individual ARIMAX and DNN models for 26 denominations of inflow and outflow currency. Hence, it can be concluded that hybrid classical time series and machine learning methods tend to yield more accurate forecasts than the individual methods alone.
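The hybrid idea above, a linear model for the linear pattern plus a neural network for what the linear model misses, can be sketched in simplified form. LinearRegression stands in for ARIMAX and a small MLPRegressor for the DNN; the synthetic linear-plus-sinusoid series is an assumption, not the East Java currency data.

```python
# Simplified residual-hybrid sketch: fit a linear model, fit a network to
# its residuals, and sum the two forecasts. In-sample RMSE is compared.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t = np.arange(300, dtype=float)
series = 0.5 * t + 5.0 * np.sin(t / 30) + rng.normal(0, 1, t.size)
X = (t / 300).reshape(-1, 1)  # scaled time index as the sole feature

linear = LinearRegression().fit(X, series)            # linear component
residuals = series - linear.predict(X)
dnn = MLPRegressor(hidden_layer_sizes=(32,), solver="lbfgs",
                   max_iter=2000, random_state=0).fit(X, residuals)
hybrid_pred = linear.predict(X) + dnn.predict(X)      # combined forecast

rmse_linear = float(np.sqrt(np.mean((series - linear.predict(X)) ** 2)))
rmse_hybrid = float(np.sqrt(np.mean((series - hybrid_pred) ** 2)))
print(round(rmse_linear, 2), round(rmse_hybrid, 2))
```

A real ARIMAX-DNN would use lagged values and calendar dummies as inputs and evaluate on a held-out test window, but the decomposition-then-sum mechanism is the same.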

  • Conference Article
  • 10.1145/3573942.3574105
Full-Rotation Quantum Convolutional Neural Network for Abnormal Intrusion Detection System
  • Sep 23, 2022
  • Suya Chao + 4 more

An intrusion detection system (IDS) is a significant mechanism for improving network security. As a promising technique, machine learning (ML) methods have been applied in IDS to obtain high classification accuracy. However, classical ML-based IDS methods hit a bottleneck in computing performance in the case of huge network traffic and complex high-dimensional data. Owing to its parallelism, superposition, and entanglement, quantum computing provides a new way to speed up classical ML algorithms. This paper proposes a novel IDS scheme based on a full-rotation quantum convolutional neural network (FR-QCNN). The key component of the FR-QCNN is the quantum convolution filter, which is composed of a coding layer, a variational layer, and a measurement layer. Different from the traditional quantum convolutional neural network, a full-rotation quantum circuit is used in the variational layer of the FR-QCNN, realizing a complete parameter update during model training. Experiments on the KDD Cup dataset show that the IDS classification accuracy of the FR-QCNN is higher than that of classical ML models such as the convolutional neural network (CNN), decision tree (DT), and support vector machine (SVM), as well as higher than the traditional quantum convolutional neural network (QCNN). Meanwhile, the FR-QCNN and QCNN have lower space and time complexity than classical ML methods.

  • Research Article
  • Cited by 21
  • 10.3390/en14237970
Hybrid Machine Learning for Solar Radiation Prediction in Reduced Feature Spaces
  • Nov 29, 2021
  • Energies
  • Abdel-Rahman Hedar + 3 more

Solar radiation prediction is an important process in ensuring optimal exploitation of solar energy power. Numerous models have been applied to this problem, such as numerical weather prediction models and artificial intelligence models. However, well-designed hybridization approaches that combine numerical models with artificial intelligence models to yield a more powerful model can provide a significant improvement in prediction accuracy. In this paper, novel hybrid machine learning approaches that exploit auxiliary numerical data are proposed. The proposed hybrid methods invoke different machine learning paradigms, including feature selection, classification, and regression. Additionally, numerical weather prediction (NWP) models are used in the proposed hybrid models. Feature selection is used for feature space dimension reduction to reduce the large number of recorded parameters that affect estimation and prediction processes. Rough set theory is applied for attribute reduction, with the dependency degree used as a fitness function. The effect of the attribute reduction process is investigated using thirty different classification and prediction models in addition to the proposed hybrid model. Then, different machine learning models are constructed based on classification and regression techniques to predict solar radiation. Moreover, other hybrid prediction models are formulated to use the output of the Weather Research and Forecasting (WRF) numerical model as learning elements in order to improve prediction accuracy. The proposed methodologies are evaluated using a data set collected from different regions in Saudi Arabia. The feature reduction achieved higher classification rates, up to 8.5% for the best classifiers and up to 15% for the other classifiers, across the different data collection regions. Additionally, in the regression, it achieved improvements in average root mean square error of up to 5.6% and in mean absolute error of up to 8.3%. The hybrid models reduced the root mean square errors by 70.2% and 4.3% relative to the numerical and machine learning models, respectively, when applied to some of the datasets. For some reduced-feature data, the hybrid models reduced the root mean square errors by 47.3% and 14.4% relative to the numerical and machine learning models, respectively.

  • Research Article
  • Cited by 6
  • 10.1016/j.rineng.2024.102320
Forecasting particle Froude number in non-deposition scenarios within sewer pipes through hybrid machine learning approaches
  • May 25, 2024
  • Results in Engineering
  • Sanjit Kumar + 3 more


  • Research Article
  • Cited by 8
  • 10.3390/rs15215165
Transformer in UAV Image-Based Weed Mapping
  • Oct 29, 2023
  • Remote Sensing
  • Jiangsan Zhao + 2 more

Weeds affect crop yield and quality due to competition for resources. To reduce the risk of yield losses due to weeds, herbicides or non-chemical measures are applied. Weeds, especially creeping perennial species, are generally distributed in patches within arable fields. Hence, instead of applying control measures uniformly, precision weeding or site-specific weed management (SSWM) is highly recommended. Unmanned aerial vehicle (UAV) imaging is known for wide area coverage and flexible operation frequency, making it a potential solution for generating weed maps at a reasonable cost. Efficient weed mapping algorithms need to be developed together with UAV imagery to facilitate SSWM. Different machine learning (ML) approaches have been developed for image-based weed mapping, either classical ML models or the more up-to-date deep learning (DL) models that take full advantage of parallel computation on a GPU (graphics processing unit). Attention-based transformer DL models, which have seen a recent boom, are expected to overtake classical convolutional neural network (CNN) DL models. This inspired us to develop a transformer DL model for segmenting weeds, cereal crops, and 'other' in low-resolution RGB UAV imagery (about 33 mm ground sampling distance, g.s.d.) captured after the cereal crop had turned yellow. Images were acquired over three years in 15 fields with three cereal species (Triticum aestivum, Hordeum vulgare, and Avena sativa) and various weed flora dominated by creeping perennials (mainly Cirsium arvense and Elymus repens). The performance of our transformer model, 1Dtransformer, was evaluated through comparison with a classical DL model, 1DCNN, and two classical ML methods, i.e., random forest (RF) and k-nearest neighbor (KNN). The transformer model showed the best performance, with an overall accuracy of 98.694% on pixels set aside for validation. It also agreed best, and relatively well, with ground reference data on total weed coverage (R2 = 0.598). In this study, we showed for the first time the outstanding performance and robustness of a 1Dtransformer model for weed mapping based on UAV imagery. The model can be used to obtain weed maps in cereal fields known to be infested by perennial weeds. These maps can serve as a basis for generating prescription maps for SSWM, either pre-harvest, post-harvest, or in the next crop, by applying herbicides or non-chemical measures.

  • Research Article
  • Cited by 8
  • 10.1016/j.apgeochem.2023.105731
A chemistry-informed hybrid machine learning approach to predict metal adsorption onto mineral surfaces
  • Jun 27, 2023
  • Applied Geochemistry
  • Elliot Chang + 3 more


  • Supplementary Content
  • Cited by 23
  • 10.2196/35293
Comparison of Severity of Illness Scores and Artificial Intelligence Models That Are Predictive of Intensive Care Unit Mortality: Meta-analysis and Review of the Literature
  • May 31, 2022
  • JMIR Medical Informatics
  • Cristina Barboi + 2 more

Background: Severity of illness scores (Acute Physiology and Chronic Health Evaluation, Simplified Acute Physiology Score, and Sequential Organ Failure Assessment) are current risk stratification and mortality prediction tools used in intensive care units (ICUs) worldwide. Developers of artificial intelligence or machine learning (ML) models predictive of ICU mortality use the severity of illness scores as a reference point when reporting the performance of these computational constructs. Objective: This study aimed to perform a literature review and meta-analysis of articles that compared binary classification ML models with the severity of illness scores that predict ICU mortality and to determine which models have superior performance. This review intends to provide actionable guidance to clinicians on the performance and validity of ML models in supporting clinical decision-making compared with the severity of illness score models. Methods: Between December 15 and 18, 2020, we conducted a systematic search of the PubMed, Scopus, Embase, and IEEE databases and reviewed studies published between 2000 and 2020 that compared the performance of binary ML models predictive of ICU mortality with the performance of severity of illness score models on the same data sets. We assessed the studies' characteristics, synthesized the results, meta-analyzed the discriminative performance of the ML and severity of illness score models, and performed tests of heterogeneity within and among studies. Results: We screened 461 abstracts, of which we assessed the full text of 66 (14.3%) articles. We included in the review 20 (4.3%) studies that developed 47 ML models based on 7 types of algorithms and compared them with 3 types of severity of illness score models. Of the 20 studies, 4 (20%) were found to have a low risk of bias and applicability in model development, 7 (35%) performed external validation, 9 (45%) reported on calibration, 12 (60%) reported on classification measures, and 4 (20%) addressed explainability. The discriminative performance of the ML-based models, reported as AUROC, ranged between 0.728 and 0.99, versus between 0.58 and 0.86 for the severity of illness score-based models. We noted substantial heterogeneity among the reported models and considerable variation among the AUROC estimates for both ML and severity of illness score model types. Conclusions: ML-based models can accurately predict ICU mortality as an alternative to traditional scoring models. Although the range of performance of the ML models is superior to that of the severity of illness score models, the results cannot be generalized due to the high degree of heterogeneity. When presented with the option of choosing between severity of illness score or ML models for decision support, clinicians should select models that have been externally validated, tested in the practice environment, and updated to the patient population and practice environment. Trial Registration: PROSPERO CRD42021203871; https://tinyurl.com/28v2nch8
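AUROC, the metric this review uses to compare models, can be computed in one call with scikit-learn. The labels and predicted probabilities below are toy values, not data from the reviewed studies.

```python
# AUROC measures how well predicted probabilities rank positives above
# negatives: 0.5 is chance, 1.0 is perfect ranking.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0]          # toy mortality outcomes
y_prob = [0.1, 0.4, 0.8, 0.45, 0.9, 0.3, 0.6, 0.5]  # toy model probabilities
print(round(roc_auc_score(y_true, y_prob), 3))  # → 0.938
```

Equivalently, AUROC is the fraction of positive/negative pairs in which the positive case receives the higher score, which is why it is insensitive to any particular decision threshold.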

  • Research Article
  • Cited by 18
  • 10.1016/j.retram.2021.103319
Clinical prognosis evaluation of COVID-19 patients: An interpretable hybrid machine learning approach
  • Oct 30, 2021
  • Current Research in Translational Medicine
  • Ozan Kocadagli + 4 more


  • Research Article
  • Cited by 56
  • 10.1016/j.chemosphere.2019.125450
Flocculation-dewatering prediction of fine mineral tailings using a hybrid machine learning approach.
  • Nov 25, 2019
  • Chemosphere
  • Chongchong Qi + 5 more


  • Research Article
  • Cited by 4
  • 10.1186/s12905-025-03669-4
Development and evaluation of a machine learning model for osteoporosis risk prediction in Korean women
  • Mar 28, 2025
  • BMC Women's Health
  • Minkyung Je + 3 more

Background: The aim of this study was to develop a machine learning (ML) model for classifying osteoporosis in Korean women based on a large-scale population cohort study. This study also aimed to assess ML model performance compared with traditional osteoporosis screening tools. Furthermore, this study aimed to examine the factors influencing the risk of osteoporosis through variable importance. Methods: Data were collected from 4199 women aged 40-69 years in the baseline survey of the Ansan and Ansung cohort of the Korean Genome and Epidemiology Study. Osteoporosis was set as the dependent variable to develop ML classification models. Independent variables included 122 factors related to osteoporosis risk, such as socio-demographic characteristics, anthropometric parameters, lifestyle factors, reproductive factors, nutrient intakes, diet quality indices, medical history, medication history, family history, biochemical parameters, and genetic factors. Six classification models were developed using ML techniques: decision tree, random forest, multilayer perceptron, support vector machine, light gradient boosting machine, and extreme gradient boosting (XGBoost). The six ML classification models were compared with two traditional osteoporosis screening tools, the osteoporosis risk assessment instrument (ORAI) and the osteoporosis self-assessment tool (OST). Model performance was evaluated and compared using the confusion matrix and area under the curve (AUC) metrics. Variable importance was assessed using the XGBoost technique to investigate osteoporosis risk factors. Results: The XGBoost model showed the highest performance of the six ML classification models, with an accuracy of 0.705, precision of 0.664, recall of 0.830, and F1 score of 0.738. Moreover, the XGBoost model showed higher AUC performance than ORAI and OST. Variable importance scores were identified for 69 of the 122 variables associated with osteoporosis risk factors. Age at menopause ranked first in variable importance. Arthritis, physical activity, hypertension, education level, income level, alcohol intake, potassium intake, homeostatic model assessment for insulin resistance, energy intake, vitamin C intake, gout, and dietary inflammatory index ranked in the top 20 of the 69 variables. Conclusions: This study found that an XGBoost model can be utilized to classify osteoporosis in Korean women. Age at menopause is a significant factor in osteoporosis risk, followed by arthritis, physical activity, hypertension, and education level.
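Ranking risk factors by gradient-boosting variable importance, as the study did with XGBoost, can be sketched as follows. scikit-learn's GradientBoostingClassifier is used here as a stand-in for XGBoost, and the feature names are hypothetical labels, not the cohort's actual 122 variables.

```python
# Sketch of tree-ensemble variable importance: impurity-based importances
# are normalized to sum to 1 and then sorted. Feature names and data are
# illustrative assumptions only.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=400, n_features=5, n_informative=3,
                           random_state=3)
feature_names = ["age_at_menopause", "arthritis", "physical_activity",
                 "hypertension", "education_level"]  # hypothetical names

model = GradientBoostingClassifier(random_state=3).fit(X, y)
ranking = sorted(zip(feature_names, model.feature_importances_),
                 key=lambda kv: kv[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With the real xgboost package the call pattern is similar, and importance can also be measured by gain or coverage rather than split-based impurity reduction.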

  • Research Article
  • 10.18799/24131830/2025/6/5061
Development of a decision support system for assessing the technical condition of power transformers
  • Jun 30, 2025
  • Bulletin of the Tomsk Polytechnic University Geo Assets Engineering
  • Vladislav A Shelomentsev + 5 more

Relevance. Reliable and environmentally safe operation of power transformers is an essential requirement for the functioning of modern power systems. Transformer oil degradation and abnormal operating conditions of electrical equipment are key factors leading to emergency situations. The composition of transformer oil serves as an indicator of the technical condition of a transformer and enables assessment of the lifespan of its insulating materials and internal components. Timely replacement of oil contributes to extending the operational lifetime of power transformers, reducing the risk of sudden failures, and enhancing the overall reliability of the power system. Forecasting the technical condition of a power transformer by integrating various parameters, such as dissolved gas concentrations and the electrical characteristics of the oil, is crucial for identifying early signs of wear and potential malfunctions, and allows prediction of the transformer's operational lifespan. One approach to determining the technical condition of power transformers involves the application of artificial intelligence methods. In this context, the development of model-based decision-making systems that integrate predictions from classical machine learning algorithms with models generated using automated machine learning techniques is highly relevant. Such systems combine the advantages of expert-driven algorithm selection with the capabilities of automated searches for optimal model architectures and hyperparameters. This hybrid approach enhances the accuracy of assessing a power transformer's technical condition and, consequently, improves the determination of its expected service life. Aim. To improve the reliability of power transformers while minimizing maintenance costs through the application of artificial intelligence methods. Methods.
Statistical analysis of chromatographic data of transformer oil; data preprocessing (elimination of anomalous and duplicate records, z‑transformation); classical machine learning methods (linear regression, Random Forest, Extra Trees, Hist Gradient Boosting), model validation using an 8:2 data split; development of a model structure based on AutoML with the specialized FEDOT software platform; calculation and analysis of model performance metrics (R², MAE, MSE, RMSE); ensemble methods Averaging, Weighted Averaging, Stacking, Blending and XGBoost. Results. An ensemble model was developed for the comprehensive assessment of the technical condition of power transformers based on transformer oil chromatography analysis and operational data, using machine learning methods. This approach eliminates labor-intensive calculations of the effect of individual parameters and reduces human factor impact during expert evaluations. Implementation of the proposed model allows objective estimation of the remaining lifespan of power transformers and justifies the transition to risk-oriented maintenance, thereby reducing operational costs and minimizing the risk of electrical equipment failure.
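The Averaging and Weighted Averaging ensembles named in the Methods can be sketched for regression as follows. The regressors, synthetic data, and the validation-R² weighting rule are illustrative assumptions, not necessarily what the article used.

```python
# Sketch of averaging ensembles: individual regressors' predictions are
# combined either with equal weights or with weights proportional to each
# model's validation R^2 (an assumed weighting rule).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=8, noise=5.0, random_state=4)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=4)

models = [LinearRegression(),
          RandomForestRegressor(n_estimators=50, random_state=4),
          ExtraTreesRegressor(n_estimators=50, random_state=4)]
for m in models:
    m.fit(X_tr, y_tr)

preds = np.array([m.predict(X_val) for m in models])
avg_pred = preds.mean(axis=0)                        # plain averaging

weights = np.array([m.score(X_val, y_val) for m in models])
weights = np.clip(weights, 0, None)                  # drop negative R^2
weights /= weights.sum()                             # normalize to sum to 1
weighted_pred = np.average(preds, axis=0, weights=weights)
print(weights.round(3))
```

Stacking and blending, also listed in the Methods, replace the fixed weighting rule with a second-level model trained on the base predictions.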
