A pre-processing method to deal with missing values by integrating clustering and regression techniques

Shin-Mu Tseng, Kuo-Ho Wang, Chien-I Lee

doi:10.1080/713827170

Abstract

Data pre-processing is a critical task in the knowledge discovery process because it ensures the quality of the data to be analyzed. One widely studied problem in data pre-processing is the handling of missing values, with the aim of recovering their original values. Numerous studies on missing values have shown that different methods are needed for different types of missing data. In this work, we propose a new method to deal with missing values in data sets where cluster properties exist among the data records. By integrating clustering and regression techniques, the proposed method can predict the missing values with higher accuracy. To the best of our knowledge, this is the first work combining regression and clustering analysis to deal with the missing-values problem. Through empirical evaluation, the proposed method was shown to perform better than other methods on different types of data sets.
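The general idea of combining clustering with regression for imputation can be illustrated with a minimal sketch. The abstract does not specify the algorithms used, so everything below is an assumption made for illustration: k-means on a single observed attribute stands in for the clustering step, a per-cluster least-squares line stands in for the regression step, and the names `kmeans_1d`, `nearest`, `fit_line` and `impute` are hypothetical. It should not be read as the authors' procedure.

```python
from statistics import mean


def nearest(centroids, x):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))


def kmeans_1d(xs, k, iters=50):
    """Plain Lloyd's k-means on scalar values (assumed clustering step)."""
    xs_sorted = sorted(xs)
    # Deterministic start: pick initial centroids spread across the sorted values.
    centroids = [xs_sorted[(2 * i + 1) * len(xs) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[nearest(centroids, x)].append(x)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def fit_line(points):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        return my, 0.0  # no spread in x: fall back to the cluster mean of y
    b = sum((x - mx) * (y - my) for x, y in points) / sxx
    return my - b * mx, b


def impute(records, k=2):
    """Fill missing y values (None) in a list of (x, y) records.

    Assumed pipeline: cluster the complete records on x, fit one
    regression line per cluster, then predict each missing y with the
    line of the cluster nearest to that record's x.
    """
    complete = [(x, y) for x, y in records if y is not None]
    centroids = kmeans_1d([x for x, _ in complete], k)
    models = []
    for j in range(k):
        members = [(x, y) for x, y in complete if nearest(centroids, x) == j]
        # An empty cluster falls back to a single global regression.
        models.append(fit_line(members if members else complete))
    filled = []
    for x, y in records:
        if y is None:
            a, b = models[nearest(centroids, x)]
            y = a + b * x
        filled.append((x, y))
    return filled


# Two clusters that follow different linear relations between x and y.
records = [(1, 2), (2, 4), (3, 6), (4, 8), (2.5, None),
           (101, 399), (102, 398), (103, 397), (104, 396), (102.5, None)]
filled = impute(records, k=2)
```

In this toy data, a single regression fitted to all records would mix two opposite trends, whereas fitting one line per cluster lets each missing value be predicted from records that behave similarly. That is the intuition behind restricting the regression to data with cluster properties.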

Full Text