Abstract

Diabetes is a chronic disease that affects millions of people worldwide, and accurate, timely diagnosis is crucial for effective treatment and management. Machine learning has shown promise in predicting the disease, but missing data, outliers, class imbalance, and classifier limitations can reduce accuracy. To address these challenges, we propose a novel machine learning approach that combines adaptive iterative imputation (AII) for missing values, a dynamic ensemble isolation forest (DE-IF) for outlier detection and removal, Iterated KMeans SMOTEENN (IKMSENN) for class imbalance, and an adaptive extra tree classifier (AETC) for classification. We evaluate the approach on the Pima Indian Diabetes Dataset (PIDD), a widely used benchmark for diabetes prediction. Experimental results show that it outperforms several state-of-the-art machine learning models in terms of accuracy, precision, recall, F-measure, and area under the receiver operating characteristic curve (AUC-ROC). On PIDD, the approach achieved an accuracy of 98.58%, a precision of 0.986, a recall of 0.987, an F-measure of 0.985, and an AUC-ROC of 0.965. By addressing missing data, outliers, class imbalance, and classifier limitations together, the approach has the potential to improve the accuracy and effectiveness of diabetes prediction, with important implications for the diagnosis and management of the disease.
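As an aid to reading the reported scores, the sketch below shows how accuracy, precision, recall, and the F-measure are conventionally derived from binary confusion-matrix counts. It assumes the F-measure is the harmonic mean of precision and recall (the F1 score), which the abstract does not state explicitly. The function name `classification_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion counts.

    tp, fp, fn, tn: true positives, false positives,
    false negatives and true negatives.
    """
    total = tp + fp + fn + tn
    # Accuracy: share of all predictions that are correct.
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were detected.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure (assumed here to be F1): harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Illustrative counts only (hypothetical, not taken from the paper).
metrics = classification_metrics(tp=45, fp=5, fn=15, tn=35)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC-ROC is not covered here because it depends on the ranking of predicted scores rather than on a single confusion matrix.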
