Missing data imputation with fuzzy feature selection for diabetes dataset

Mohamad Faiz Dzulkalnine, Roselina Sallehuddin

doi:10.1007/s42452-019-0383-x

Open Access

Abstract

Missing data remain a difficulty for data analysis in many research fields, especially in medicine, where they affect the diagnosis and treatment that a patient receives. In this research, fuzzy c-means (FCM) is used to impute the missing data. However, like most data imputation methods, FCM does not consider the presence of irrelevant features. Irrelevant features can increase the computational time of the imputation process and decrease the accuracy of the prediction. Feature selection techniques can alleviate this problem by selecting the most relevant features and reducing the dataset size. Fuzzy principal component analysis (FPCA) is used as the feature selection method in this study because, unlike classical PCA, it accounts for outliers, which are the main reason some features are rendered irrelevant. Therefore, an improved hybrid imputation model, FPCA–support vector machines–FCM (FPCA–SVM–FCM), is proposed and employed in this study. The efficiency of the proposed model is investigated on one dataset, the Pima Indians Diabetes dataset. Experimental results showed that the proposed hybrid imputation model outperforms the existing methods, producing more accurate estimates in terms of accuracy, RMSE and MAE. The proposed method was also validated using the Wilcoxon rank sum test and Theil’s U test, and obtained good results compared to SVM–FCM. Therefore, it can be used as an alternative tool for handling missing data in order to obtain a better quality dataset.
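
To make the FCM imputation step and the RMSE/MAE evaluation more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a partial-distance variant of fuzzy c-means, in which distances and centroids are computed over observed entries only and each missing entry is filled with a membership-weighted average of the cluster centroids. The FPCA feature selection and SVM stages of the proposed model are not reproduced here, and the names `fcm_impute`, `rmse`, `mae`, and all parameter defaults are hypothetical.

```python
import numpy as np

def fcm_impute(X, n_clusters=2, m=2.0, n_iter=100, seed=0):
    """Fill NaN entries of X using fuzzy c-means (assumed partial-distance variant)."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)              # True where a value is known
    X0 = np.where(observed, X, 0.0)      # placeholder zeros, always masked out below
    n, d = X.shape
    rng = np.random.default_rng(seed)
    U = rng.random((n, n_clusters))      # random initial memberships
    U /= U.sum(axis=1, keepdims=True)    # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        # Centroids: fuzzy-weighted mean over observed entries only.
        C = (W.T @ X0) / np.maximum(W.T @ observed.astype(float), 1e-12)
        # Partial squared distances, rescaled by d / (number of observed features).
        diff = (X0[:, None, :] - C[None, :, :]) * observed[:, None, :]
        scale = d / np.maximum(observed.sum(axis=1), 1)
        D = np.maximum((diff ** 2).sum(axis=2) * scale[:, None], 1e-12)
        # Standard FCM membership update expressed with squared distances.
        inv = D ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    # Estimate each missing entry as a membership-weighted average of centroids.
    W = U ** m
    estimate = (W @ C) / W.sum(axis=1, keepdims=True)
    return np.where(observed, X, estimate)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))
```

In an evaluation of this kind, values would be removed artificially from a complete dataset, imputed, and then compared against the held-out true values with `rmse` and `mae`; lower values indicate more accurate estimates.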

Journal: SN Applied Sciences
Publication Date: Mar 26, 2019
Citations: 37
