An extensive analysis of the interaction between missing data types, imputation methods, and supervised classifiers

Unai Garciarena, Roberto Santana

doi:10.1016/j.eswa.2017.07.026

Abstract

When applying data-mining techniques to real-world data, we often face observations that have no value recorded for some attributes. This can have several causes, such as a machine's inability to record certain characteristics or a person refusing to answer a question in a poll. Depending on the cause, missing values may follow one kind of pattern or another, or show no regularity at all. One approach to mitigating the effect of missing data on machine learning tasks is to replace the missing observations. Imputation algorithms attempt to calculate a value for a missing entry using information associated with it, i.e., the attribute and/or other values in the same observation. While several imputation methods have been proposed in the literature, few works have addressed the relationship between the type of missing data, the choice of imputation method, and the effectiveness of classification algorithms that use the imputed data. In this paper we address the relationship among these three factors. By constructing a benchmark of hundreds of databases containing different types of missing data, and applying several imputation methods and classification algorithms, we show empirically that an interaction between imputation methods and supervised classification can be deduced. We also find differences in classification performance for the same imputation method across different missing data patterns. This suggests that the imputation method and the classifier should be chosen jointly, according to the missing data type.
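The three factors discussed in the abstract (missingness type, imputation method, classifier) can be illustrated with a small sketch. Everything in it is an assumption made for illustration, not the paper's benchmark: the toy data, the choice of values missing completely at random (MCAR) as the missingness type, column-mean imputation as the imputation method, a 1-nearest-neighbour rule as the classifier, and the helper names `inject_mcar`, `impute_column_mean`, `predict_1nn`, and `accuracy`.

```python
import random
from statistics import mean

def inject_mcar(rows, rate, rng):
    """Copy rows, replacing each value by None with probability `rate`.
    Every cell is dropped independently of the data (an MCAR assumption)."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]

def impute_column_mean(rows):
    """Fill each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Fall back to 0.0 if a column has no observed values at all.
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

def predict_1nn(train_X, train_y, x):
    """Label x with the class of its nearest training row
    (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[dists.index(min(dists))]

def accuracy(train_X, train_y, test_X, test_y):
    """Fraction of test rows that the 1-NN rule labels correctly."""
    hits = sum(predict_1nn(train_X, train_y, x) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)

rng = random.Random(0)
# Hypothetical two-class toy data: class 0 near (0, 0), class 1 near (3, 3).
X = [[rng.gauss(3 * c, 1), rng.gauss(3 * c, 1)]
     for c in (0, 1) for _ in range(50)]
y = [c for c in (0, 1) for _ in range(50)]

# Even-indexed rows train the classifier, odd-indexed rows test it.
train_X, train_y = X[0::2], y[0::2]
test_X, test_y = X[1::2], y[1::2]

# Remove 20% of the training values, then impute them.
train_missing = inject_mcar(train_X, rate=0.2, rng=rng)
train_imputed = impute_column_mean(train_missing)

print("accuracy, complete training data:",
      accuracy(train_X, train_y, test_X, test_y))
print("accuracy, imputed training data: ",
      accuracy(train_imputed, train_y, test_X, test_y))
```

Swapping `inject_mcar` for a mechanism that depends on the data, or `impute_column_mean` for a method that uses other values in the same observation, would change only one factor at a time, which is the kind of comparison the abstract describes.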

Journal: Expert Systems with Applications
Publication Date: Jul 17, 2017
Citations: 81
