Do We Really Need Imputation in AutoML Predictive Modeling?

George Paterakis, Ioannis Tsamardinos, Vassilis Christophides, Stefanos Fafalios, Paulos Charonyktakis

doi:10.1145/3643643

Open Access

https://doi.org/10.1145/3643643

Abstract

Many real-world datasets contain missing values, whereas most Machine Learning (ML) algorithms assume complete datasets. For this reason, several imputation algorithms have been proposed to predict and fill in the missing values. Given the advances in predictive modeling algorithms tuned in an Automated Machine Learning (AutoML) setting, a question that naturally arises is to what extent sophisticated imputation algorithms (e.g., Neural-Network-based) are really needed, or whether we can obtain decent performance using simple methods like Mean/Mode (MM). In this article, we experimentally compare six state-of-the-art representatives of different imputation algorithmic families from an AutoML predictive modeling perspective, including a feature selection step and combined algorithm and hyper-parameter selection. We used a commercial AutoML tool for our experiments, in which we included the selected imputation methods. Experiments were run on 25 real-world binary classification datasets with missing values and 10 complete binary classification datasets into which synthetic missing values were introduced according to different missingness mechanisms, at varying missing frequencies. The main conclusion drawn from our experiments is that the best method on average is the Denoise AutoEncoder on real-world datasets and MissForest on simulated datasets, followed closely by MM. In addition, binary indicator variables encoding missingness patterns improve predictive performance, on average. Last, although there are cases where Neural-Network-based imputation significantly improves predictive performance, this comes at a great computational cost and requires measuring all feature values to impute new samples.
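
To make the two simplest ingredients named in the abstract concrete, the following is a minimal sketch of Mean/Mode imputation combined with binary missingness indicators. It is not the authors' implementation or the commercial AutoML tool's code. The function names (`fit_mean_mode`, `transform`), the column names, the `_missing` suffix, and the toy rows are all hypothetical and chosen only for illustration. The sketch assumes missing values are encoded as `None` and that every column has at least one observed value in the training rows.

```python
from collections import Counter
from statistics import mean


def fit_mean_mode(rows, categorical):
    """Learn one fill value per column from the training rows only.

    Numeric columns get their mean, categorical columns their mode.
    Assumes each column has at least one observed (non-None) value.
    """
    fills = {}
    for col in rows[0].keys():
        observed = [r[col] for r in rows if r[col] is not None]
        if col in categorical:
            fills[col] = Counter(observed).most_common(1)[0][0]
        else:
            fills[col] = mean(observed)
    return fills


def transform(rows, fills, suffix="_missing"):
    """Fill missing entries and add a 0/1 indicator column per feature."""
    out = []
    for r in rows:
        new = {}
        for col, fill in fills.items():
            is_missing = r[col] is None
            new[col] = fill if is_missing else r[col]
            new[col + suffix] = int(is_missing)  # records the missingness pattern
        out.append(new)
    return out


# Hypothetical toy data: one numeric and one categorical feature.
train = [
    {"age": 30.0, "smoker": "no"},
    {"age": None, "smoker": "yes"},
    {"age": 50.0, "smoker": None},
    {"age": 40.0, "smoker": "no"},
]

fills = fit_mean_mode(train, categorical={"smoker"})
imputed = transform(train, fills)

# A new sample is imputed with the stored training statistics alone.
new_sample = transform([{"age": None, "smoker": None}], fills)[0]
```

In this sketch the fill values are computed once from the training rows and reused for any new sample, so a new sample can be imputed even when every one of its features is missing. The indicator columns let the downstream classifier use the missingness pattern itself as a signal.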

Journal: ACM Transactions on Knowledge Discovery from Data	Publication Date: Apr 12, 2024
License type: cc-by

Similar Papers

Handling missing values and imbalanced classes in machine learning to predict consumer preference: Demonstrations and comparisons to prominent methods
Yahui Liu ... Zhen Li
Expert Systems with Applications | VOL. 237
20 Sep 2023

Propensity score analysis with missing data using a multi-task neural network
Shu Yang ... Peipei Du
BMC Medical Research Methodology | VOL. 23
15 Feb 2023

Mixed Data Imputation Using Generative Adversarial Networks
Wasif Khan ... Mohammad Mehedy Masud
IEEE Access | VOL. 10
01 Jan 2021

The performance of prognostic models depended on the choice of missing value imputation algorithm: a simulation study
Manja Deforth ... Ulrike Held
Journal of Clinical Epidemiology | VOL. -
01 Sep 2024
