An Approach for Solving Missing Values in Data Set Using Clustering-Curve Fitting Technique

Kadhim Aljanabi, Nawras Riyadh Neamah, Mansoor Habeebi

doi:10.31642/jokmc/2018/0202012

Open Access

https://doi.org/10.31642/jokmc/2018/0202012

Abstract

Missing values are one of the greatest challenges in analyzing a data set to extract knowledge from it. This paper presents a new approach to the missing-values problem that merges two techniques: clustering (K-means and Expectation Maximization) and curve fitting. More than twenty thousand records of real health data collected from different Iraqi hospitals were used to build and test the proposed approach, which gave better results than the most popular techniques for estimating missing values, such as the most common value, overall average, class average, and class most common value. Several software tools were used in this work, including WEKA (Waikato Environment for Knowledge Analysis), Matlab, Excel, and C++.
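The abstract does not say how clustering and curve fitting are combined. One plausible reading, sketched here purely as an illustration, is to group records by an observed attribute and then fit a curve within each cluster to predict the missing attribute. Everything in this sketch is an assumption rather than the authors' method: the function names (`kmeans_1d`, `fit_line`, `impute`), the use of a single observed attribute `x`, a straight line as the simplest "curve", and plain K-means only (Expectation Maximization is not shown).

```python
def kmeans_1d(xs, k, iters=50):
    """Plain 1-D k-means; returns one cluster label per value in xs."""
    s = sorted(xs)
    # Deterministic start: spread the initial centroids over the sorted values.
    centroids = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assign each value to its nearest centroid, then recompute centroids.
        labels = [min(range(k), key=lambda c: abs(x - centroids[c])) for x in xs]
        for c in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def fit_line(points):
    """Least-squares line y = a*x + b through a non-empty list of (x, y)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        return 0.0, my  # degenerate case: fall back to the mean of y
    a = sum((x - mx) * (y - my) for x, y in points) / sxx
    return a, my - a * mx


def impute(records, k=2):
    """Fill y=None in (x, y) records using a per-cluster fitted line.

    Assumes at least one record has a known y.
    """
    labels = kmeans_1d([x for x, _ in records], k)
    all_known = [(x, y) for x, y in records if y is not None]
    models = {}
    for c in range(k):
        known = [(x, y) for (x, y), lab in zip(records, labels)
                 if lab == c and y is not None]
        # A cluster with no known y values borrows the global fit.
        models[c] = fit_line(known if known else all_known)
    filled = []
    for (x, y), lab in zip(records, labels):
        if y is None:
            a, b = models[lab]
            y = a * x + b
        filled.append((x, y))
    return filled
```

With two well-separated groups, for example `[(1, 2), (2, 4), (3, 6), (4, None), (20, 80), (21, 79), (22, 78), (23, None)]`, calling `impute(records, k=2)` fills each gap from the trend of its own cluster, whereas an overall average would use a single value for both. This is the intuition behind comparing such an approach against the averaging baselines the abstract lists.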
