Abstract

Experimental assessments of missing data imputation methods often compute error rates between the original values and the estimated ones. This experimental setup relies on complete datasets into which missing values are injected. The injection process is straightforward for the Missing Completely At Random and Missing At Random mechanisms; however, the Missing Not At Random mechanism poses a major challenge, since the available artificial generation strategies are limited. Furthermore, studies focusing on this latter mechanism tend to disregard a comprehensive baseline of state-of-the-art imputation methods. In this work, both challenges are addressed: four new Missing Not At Random generation strategies are introduced and a benchmark study is conducted to compare six imputation methods in an experimental setup that covers 10 datasets and five missingness levels (10% to 80%). The overall finding is that, for most missing rates and datasets, the best imputation method for Missing Not At Random values is Multiple Imputation by Chained Equations (MICE), whereas for higher missingness rates autoencoders show promising results.
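
To make the evaluation protocol described above concrete, the sketch below illustrates one common self-masking style of MNAR injection (an illustrative assumption, not one of the four strategies proposed in the paper) followed by a MICE-style imputation using scikit-learn's IterativeImputer. The synthetic data, column choice, and 30% missing rate are assumptions for demonstration only.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer


def inject_mnar_self_masking(X, col, missing_rate):
    """Self-masking MNAR sketch: hide the largest `missing_rate` fraction of
    values in column `col`, so missingness depends on the unobserved value."""
    X = np.asarray(X, dtype=float).copy()
    threshold = np.quantile(X[:, col], 1.0 - missing_rate)
    X[X[:, col] >= threshold, col] = np.nan
    return X


rng = np.random.default_rng(0)
X_complete = rng.normal(size=(500, 4))  # stand-in for a complete benchmark dataset
X_missing = inject_mnar_self_masking(X_complete, col=0, missing_rate=0.3)

# MICE-style imputation: IterativeImputer models each feature with missing
# values as a function of the remaining features, iterating until convergence.
imputer = IterativeImputer(max_iter=10, random_state=0)
X_imputed = imputer.fit_transform(X_missing)

# Error between the original and imputed entries, as in the evaluation setup.
mask = np.isnan(X_missing)
rmse = np.sqrt(np.mean((X_imputed[mask] - X_complete[mask]) ** 2))
print(f"RMSE on imputed entries: {rmse:.3f}")
```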
