RESI: A Region-Splitting Imputation method for different types of missing data

Dunlu Peng, Mengping Zou, Cong Liu, Jing Lu

doi:10.1016/j.eswa.2020.114425

Abstract

Missing data, even to a limited degree, seriously affects the accuracy and availability of data, and especially the results of subsequent in-depth data analysis and mining. A data imputation model that can complete different types of missing data, including numerical-only, categorical-only and mixed-type data, and that generalizes well is therefore of great practical value. To address this issue, this paper defines a new metric, the mean integrity rate, to measure the degree of missingness of a dataset, and proposes RESI, a novel tuple-based REgion-Splitting Imputation model for imputing different types of missing data. We first select features and assign a weight to each attribute using the entropy weight method. We then partition the tuples into one subset of complete tuples and several subsets of incomplete tuples according to their integrity rate, which is computed from the attribute weights and the missing degree of each tuple. Training starts on the complete-tuple subset and proceeds iteratively: in each iteration, the trained model imputes the next incomplete subset, and the imputed subset is merged into the complete subset to train the next model. To improve imputation accuracy, we use k-fold cross-validation to correct errors. Extensive experiments show that, besides handling diverse types of missing data, RESI significantly outperforms state-of-the-art methods in both sensitivity to the missing rate and accuracy of the imputed data.
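The abstract does not state the exact formulas, so the following Python sketch shows one plausible reading of the first two steps: a common variant of the entropy weight method (min-max scaling, then column proportions) for the attribute weights, followed by a weighted integrity rate per tuple and its average over the dataset. The names `entropy_weights`, `integrity_rates` and `mean_integrity_rate` are hypothetical, and defining a tuple's integrity rate as the total weight of its observed attributes is an assumption, not the authors' definition. The sketch also assumes numeric attributes; categorical attributes would first need an encoding.

```python
import numpy as np


def entropy_weights(X):
    """Entropy weight method over the columns of a numeric matrix X.

    Rows are tuples, columns are attributes, missing cells are np.nan.
    Each column is scored on its observed values only (an assumption of
    this sketch); attributes whose values are more dispersed get larger weights.
    """
    X = np.asarray(X, dtype=float)
    divergence = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        col = X[~np.isnan(X[:, j]), j]
        if col.size < 2:
            continue  # too few observations to measure dispersion; divergence stays 0
        span = col.max() - col.min()
        # Min-max scaling keeps proportions non-negative; a constant column becomes uniform.
        scaled = (col - col.min()) / span if span > 0 else np.ones_like(col)
        p = scaled / scaled.sum()
        # p * ln(p), using the usual convention 0 * ln(0) = 0.
        plogp = p * np.log(p, out=np.zeros_like(p), where=p > 0)
        entropy = -plogp.sum() / np.log(col.size)
        divergence[j] = max(0.0, 1.0 - entropy)
    total = divergence.sum()
    if total == 0:
        return np.full(X.shape[1], 1.0 / X.shape[1])  # fall back to equal weights
    return divergence / total


def integrity_rates(X, weights):
    """Hypothetical integrity rate: total weight of the attributes a tuple actually has."""
    observed = ~np.isnan(np.asarray(X, dtype=float))
    return observed.astype(float) @ np.asarray(weights, dtype=float)


def mean_integrity_rate(X, weights):
    """Hypothetical mean integrity rate: average integrity rate over all tuples."""
    return float(integrity_rates(X, weights).mean())


X = np.array([[1.0, 2.0, 3.0],
              [2.0, np.nan, 1.0],
              [3.0, 4.0, np.nan],
              [4.0, 8.0, 2.0]])
w = entropy_weights(X)
print(w, integrity_rates(X, w), mean_integrity_rate(X, w))
```

Under this reading a complete tuple has an integrity rate of 1, and losing a heavily weighted attribute lowers a tuple's rate more than losing a nearly constant one, which is what lets the weights drive the later partitioning.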
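The iterative region-splitting loop can also be sketched in Python. This is a minimal illustration under stated assumptions, not RESI itself: the hypothetical `region_split_impute` function orders incomplete tuples by a weighted integrity rate, cuts them into `n_regions` groups of roughly equal size, and uses ordinary least squares fitted on the current complete subset as a stand-in for the paper's trained model. Feature selection and the k-fold cross-validation error correction described in the abstract are omitted.

```python
import numpy as np


def region_split_impute(X, weights, n_regions=3):
    """Sketch of region-splitting imputation for a numeric matrix with np.nan gaps.

    Assumptions (not taken from the paper): integrity rate = total weight of a
    tuple's observed attributes; incomplete tuples are split into n_regions
    groups by descending integrity rate; the per-region model is least squares.
    """
    X = np.array(X, dtype=float)  # copy, so the caller's array is left untouched
    missing = np.isnan(X)
    integrity = (~missing).astype(float) @ np.asarray(weights, dtype=float)
    complete_idx = np.flatnonzero(~missing.any(axis=1))
    incomplete_idx = np.flatnonzero(missing.any(axis=1))
    if complete_idx.size == 0:
        raise ValueError("at least one complete tuple is needed to train the first model")
    # Most complete tuples first, so sparser regions are imputed from larger training sets.
    order = incomplete_idx[np.argsort(-integrity[incomplete_idx], kind="stable")]
    regions = [r for r in np.array_split(order, n_regions) if r.size > 0]
    train = X[complete_idx]
    for region in regions:
        for i in region:
            miss, obs = missing[i], ~missing[i]
            # Regress this tuple's missing attributes on its observed ones (with intercept).
            A = np.column_stack([np.ones(len(train)), train[:, obs]])
            coef, *_ = np.linalg.lstsq(A, train[:, miss], rcond=None)
            X[i, miss] = np.concatenate(([1.0], X[i, obs])) @ coef
        # Merge the imputed region into the training set before the next iteration.
        train = np.vstack([train, X[region]])
    return X


# Toy data where the third attribute equals 2 * first + second + 1.
data = np.array([[0, 0, 1], [1, 0, 3], [0, 1, 2], [2, 3, 8], [3, 1, 8],
                 [4, 2, np.nan], [np.nan, 1, 5], [1, np.nan, 6]], dtype=float)
print(region_split_impute(data, np.full(3, 1 / 3), n_regions=2))
```

Tuples within one region are all imputed by models trained before that region is merged, so a region never learns from its own guesses; only later, sparser regions benefit from the earlier imputations.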

Journal: Expert Systems with Applications
Publication Date: Dec 5, 2020
Citations: 10

Similar Papers

How to avoid missing data and the problems they pose: design considerations. ... Xin Tu. Shanghai Archives of Psychiatry, Vol. 24, 01 Jun 2012.

Missing Data in Longitudinal Studies of Aging: The Good, the Bad, and the Ugly. S Karunananthan ... C Wolfson. Innovation in Aging, Vol. 1, 30 Jun 2017.

Building Cross-National, Longitudinal Data Sets: Issues and Strategies for Implementation. Nicholas E Reith ... Melanie M Hughes. International Journal of Sociology, Vol. 46, 02 Jan 2015.

How to deal with missing data? Multiple imputation by chained equations: recommendations and explanations for clinical practice. Bruno Legendre ... Damiano Cerasuolo. Néphrologie & Thérapeutique, Vol. 19, 01 Jun 2023.
