Diabetes mellitus prediction: An efficient pipeline of data imputation and oversampling

Neha Rajawat, Rajesh Kumar, Soniya Lalwani, Bharat Singh Hada

doi:10.1142/s1793962323500101

Abstract

Diabetes is a chronic disease characterized by elevated blood glucose levels. According to the World Health Organization (WHO), 422 million people had diabetes as of 2014. This paper develops an accurate machine learning classification model and an efficient data preprocessing pipeline to improve overall accuracy. Six algorithms are used for classification and their accuracies are compared: Support Vector Machine with linear kernel (Linear-SVM), Support Vector Machine with RBF kernel (RBF-SVM), K-Nearest Neighbor (KNN), Artificial Neural Network (ANN), Decision Tree and Random Forest. The preprocessing pipeline consists of data imputation, oversampling and feature scaling. Experiments are performed on the well-known PIMA diabetes dataset from the National Institute of Diabetes and Digestive and Kidney Diseases. Data imputation and the Synthetic Minority Over-sampling Technique (SMOTE) improved classification accuracy from 77% on raw data to 88.12% (with the Random Forest classifier) and 91% (with the ANN classifier), respectively. Furthermore, a new feature generation approach is applied and its performance is analyzed using the SVM model: the original attributes BMI and Insulin are replaced with the new features BMI_NORMAL and INSULIN_NORMAL, respectively. The significance of the improvement achieved by the proposed technique is confirmed by statistical testing followed by post-hoc analysis.
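
The three preprocessing stages named in the abstract (imputation, oversampling, feature scaling) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which imputation statistic or scaler was used, so median imputation and min-max scaling are assumptions here, as is treating a value of 0 as a missing-value marker. The toy two-column data (glucose, BMI), the function names and the parameter `k` are hypothetical, and the `smote` function is a simplified version of the SMOTE interpolation idea written with the standard library only.

```python
import math
import random
import statistics


def impute_median(rows, missing=0.0):
    """Replace `missing` markers in each column with the median of the
    observed values in that column (assumed imputation strategy)."""
    medians = []
    for col in zip(*rows):
        observed = [v for v in col if v != missing]
        medians.append(statistics.median(observed) if observed else missing)
    return [[m if v == missing else v for v, m in zip(row, medians)]
            for row in rows]


def smote(minority, n_new, k=2, seed=0):
    """Simplified SMOTE: each synthetic point lies on the line segment
    between a random minority sample and one of its k nearest minority
    neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def min_max_scale(rows):
    """Rescale every column to the range [0, 1] (assumed scaler)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]


# Hypothetical toy data: [glucose, BMI]; 0.0 marks a missing measurement.
X = [[148.0, 33.6], [85.0, 26.6], [183.0, 0.0], [89.0, 28.1],
     [0.0, 43.1], [116.0, 25.6], [78.0, 31.0], [115.0, 35.3]]
y = [1, 0, 1, 0, 1, 0, 0, 0]  # 1 = diabetic (minority class here)

# Stage 1: imputation.
X_imp = impute_median(X)

# Stage 2: oversample the minority class until both classes are equal in size.
minority = [row for row, label in zip(X_imp, y) if label == 1]
n_new = y.count(0) - y.count(1)
X_bal = X_imp + smote(minority, n_new)
y_bal = y + [1] * n_new

# Stage 3: feature scaling.
X_scaled = min_max_scale(X_bal)

print(len(X_scaled), "samples;", y_bal.count(1), "positive,",
      y_bal.count(0), "negative")
```

In a real experiment the oversampling step would normally be fitted on the training split only, so that synthetic samples do not leak into the evaluation data; the resulting `X_scaled` and `y_bal` would then be passed to any of the six classifiers listed above.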

Journal: International Journal of Modeling, Simulation, and Scientific Computing
Publication Date: Jun 10, 2022
Citations: 1
