Cooperative Clustering Missing Data Imputation

Daoming Wan, Mehrdad Saif, Roozbeh Razavi-Far

doi:10.1109/smc42975.2020.9283484

Abstract

Missing data imputation is a critical part of data cleaning and is vital for learning from incomplete data. This paper proposes a novel cooperative clustering imputation (CCI) method to estimate missing values. The proposed method aims to find a better clustering model and donor for imputation than individual clustering algorithms provide. It uses the agreements among different clustering algorithms to generate a set of sub-clusters and then merges these sub-clusters based on a matrix of sub-cluster performance measures. The proposed method is evaluated on ten public datasets from the UCI data repository and on V2X communication data with induced missing samples, and is compared with three standard clustering-based imputation methods: k-means imputation, fuzzy c-means imputation, and partitioning around medoids imputation. Missing values are induced in each dataset under different missing mechanisms, missing rates, and missing distributions, generating a variety of incomplete datasets. The performance of these methods is measured using the normalized root mean square error (NRMSE). The experimental results indicate the effectiveness of the proposed imputation method.
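The evaluation protocol summarized above (induce missing values, impute with a clustering-based method, score with NRMSE) can be illustrated with a minimal sketch. It is not the CCI method itself. It shows only a plain k-means imputation baseline under several assumptions: missingness is induced completely at random, NRMSE is the RMSE over the imputed entries divided by the standard deviation of the true values at those entries (the abstract does not state the exact normalization), and all function names and the synthetic data are hypothetical.

```python
import numpy as np

def induce_mcar(X, rate, rng):
    """Return a copy of X with roughly `rate` of its entries set to NaN,
    chosen completely at random, plus the boolean mask of removed entries."""
    X_miss = X.astype(float).copy()
    mask = rng.random(X.shape) < rate
    X_miss[mask] = np.nan
    return X_miss, mask

def kmeans_impute(X_miss, k, rng, n_iter=20):
    """Baseline k-means imputation (assumed form): each missing entry is
    replaced by the matching coordinate of its row's cluster centroid."""
    missing = np.isnan(X_miss)
    # Start from column-mean imputation so distances can be computed.
    col_means = np.nanmean(X_miss, axis=0)
    X_filled = np.where(missing, col_means, X_miss)
    # Initialize centroids from k distinct rows (fancy indexing copies).
    centroids = X_filled[rng.choice(len(X_filled), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centroid.
        dists = ((X_filled[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X_filled[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
        # Refresh only the missing entries from the assigned centroids.
        X_filled = np.where(missing, centroids[labels], X_miss)
    return X_filled

def nrmse(X_true, X_imputed, mask):
    """RMSE over the imputed entries, normalized by the standard deviation
    of the true values at those entries (one common convention)."""
    diff = X_true[mask] - X_imputed[mask]
    return np.sqrt(np.mean(diff ** 2)) / np.std(X_true[mask])

# Hypothetical synthetic data: two Gaussian blobs in four dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 4)), rng.normal(10.0, 1.0, (100, 4))])
X_miss, mask = induce_mcar(X, rate=0.1, rng=rng)
X_hat = kmeans_impute(X_miss, k=2, rng=rng)
score = nrmse(X, X_hat, mask)
print(f"NRMSE of k-means imputation: {score:.3f}")
```

Lower NRMSE means the imputed values are closer to the values that were removed. Repeating the run over several missing rates and random seeds would mirror the way the abstract describes generating multiple incomplete datasets.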

Full Text