Abstract

The bulk of medical databases contain coverage gaps due in large part to the expensive expense of some tests or human error in documenting these tests. Due to the absence of values for some features, the performance of the machine learning models is significantly impacted. Consequently, a specific category of techniques is necessary for the aim of imputing missing data. In this study, the Grey Wolf Algorithm (GWA) is used to generate and impute the missing values in the Pima Indian Diabetes Disease (PIDD) dataset. The proposed method is known as the Pima Indian Diabetes Disease (PIDD) Algorithm (IGW). The obtained results demonstrated that the classification performance of three distinct classifiers, namely the Support Vector Machine (SVM), the K-Nearest Neighbor (KNN), and the Naive Bayesian Classifier (NBC), was enhanced in comparison to the dataset prior to the application of the proposed method. In addition, the results indicated that IGW performed better than statistical imputation procedures such as removing samples with missing values, replacing them with zeros, mean, or random values.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call