Abstract

In this paper, we propose a new technique that significantly reduces the computational time of the genetic algorithm for imputing missing values. Data often contain noise and missing values, which make them unreliable for scientific purposes, so they must be preprocessed before use. Researchers either discard or impute missing data. Choosing an appropriate imputation method is necessary, and the choice depends on several factors, such as the data types and the number of missing values. At higher missing-value rates, missing value imputation (MVI) can be a suitable way of handling missing data in an incomplete dataset. One MVI method is the genetic algorithm; although it may produce good results, its computational time is very high. The proposed algorithm combines the genetic algorithm with the Asexual Reproduction Optimization (ARO) algorithm. We present an experimental evaluation on the Pima and mammographic mass datasets collected from the UCI repository. When the percentage of missing values is small, the affected instances can be imputed by the ARO algorithm, but when it is large, our approach yields much better results. The proposed technique works even better as the rate of missing values increases. The accuracy and computational time of our proposed algorithm are compared with those of other techniques such as Mean, K-Nearest Neighbor, and SVM. On average, our approach improves accuracy by 8% and ROC by 4%, and it requires less computational time than a basic genetic algorithm.
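For orientation, the Mean baseline named among the comparison techniques can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used in the paper: the function name `impute_mean`, the use of `None` as the missing-value marker, and the toy data are all hypothetical.

```python
from statistics import mean

def impute_mean(rows, missing=None):
    """Replace each missing entry with the mean of the observed values in its column.

    Assumption for this sketch: missing entries are marked with `None`.
    """
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not missing]
        # Arbitrary fallback for this sketch: use 0.0 if a column has no observed values.
        col_means.append(mean(observed) if observed else 0.0)
    # Build a new table, leaving the input rows unchanged.
    return [
        [col_means[j] if r[j] is missing else r[j] for j in range(n_cols)]
        for r in rows
    ]

# Hypothetical toy data: two numeric attributes, two missing entries.
toy = [
    [1.0, 10.0],
    [None, 20.0],
    [3.0, None],
]
filled = impute_mean(toy)
```

Here each missing cell is filled independently from its own column, which is what makes this baseline cheap but insensitive to relationships between attributes.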
