Analyzing the Effect of Imputation on Classification Performance under MCAR and MAR Missing Mechanisms

Philip Buczak, Markus Pauly, Jian-Jia Chen

doi:10.3390/e25030521

Abstract

Many datasets in statistical analyses contain missing values. As omitting observations containing missing entries may lead to information loss or greatly reduce the sample size, imputation is usually preferable. However, imputation can also introduce bias and impact the quality and validity of subsequent analysis. Focusing on binary classification problems, we analyzed how missing value imputation under MCAR as well as MAR missingness with different missing patterns affects the predictive performance of subsequent classification. To this end, we compared imputation methods such as several MICE variants, missForest, Hot Deck, and mean imputation with regard to the classification performance achieved with commonly used classifiers such as Random Forest, Extreme Gradient Boosting, Support Vector Machine, and regularized logistic regression. Our simulation results showed that Random Forest-based imputation (i.e., MICE Random Forest and missForest) performed particularly well in most scenarios studied. In addition to these two methods, simple mean imputation also proved to be useful, especially when many features (covariates) contained missing values.
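
As a rough illustration of the setting the abstract describes, the sketch below masks one feature under MCAR and under MAR and then applies mean imputation. It is not the authors' simulation code: the function names (`make_data`, `ampute_mcar`, `ampute_mar`, `impute_mean`), the two-feature toy data, the 30% missing rate, and the particular MAR rule (missingness in the second feature driven by whether the first feature exceeds its median) are all assumptions chosen for illustration.

```python
import random
import statistics


def make_data(n, seed=0):
    """Toy data (assumption): two correlated Gaussian features per row."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = 0.8 * x1 + rng.gauss(0.0, 0.6)
        rows.append([x1, x2])
    return rows


def ampute_mcar(rows, col, rate, seed=1):
    """MCAR: every entry of `col` is masked with the same probability."""
    rng = random.Random(seed)
    out = [list(r) for r in rows]
    for r in out:
        if rng.random() < rate:
            r[col] = None
    return out


def ampute_mar(rows, col, driver, rate, seed=2):
    """MAR (one possible rule): `col` is masked only where the fully
    observed `driver` feature lies above its median, with probability
    min(1, 2 * rate), so the overall missing rate is roughly `rate`."""
    rng = random.Random(seed)
    cutoff = statistics.median(r[driver] for r in rows)
    out = [list(r) for r in rows]
    for r in out:
        if r[driver] > cutoff and rng.random() < min(1.0, 2 * rate):
            r[col] = None
    return out


def impute_mean(rows, col):
    """Replace missing entries of `col` with the mean of the observed ones."""
    mean = statistics.fmean(r[col] for r in rows if r[col] is not None)
    return [[mean if (j == col and v is None) else v
             for j, v in enumerate(r)] for r in rows]


data = make_data(2000)
mcar = ampute_mcar(data, col=1, rate=0.3)
mar = ampute_mar(data, col=1, driver=0, rate=0.3)
mcar_imputed = impute_mean(mcar, col=1)
mar_imputed = impute_mean(mar, col=1)

# Column means of the second feature: complete data vs. imputed data.
true_mean = statistics.fmean(r[1] for r in data)
mcar_mean = statistics.fmean(r[1] for r in mcar_imputed)
mar_mean = statistics.fmean(r[1] for r in mar_imputed)
print(true_mean, mcar_mean, mar_mean)
```

Under this MAR rule the observed values of the second feature are no longer representative of the whole column, so the mean-imputed column mean drifts away from the complete-data mean. That is one simple form of the bias the abstract mentions. A comparison in the spirit of the paper would go further and train a classifier on each imputed dataset, which this sketch does not do.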


Journal: Entropy
Publication Date: Mar 17, 2023
Citations: 4
License type: CC BY 4.0


Similar Papers

Simulation study on missing data imputation methods for longitudinal data in cohort studies
Y M Li ... F Y Chen
Zhonghua liu xing bing xue za zhi = Zhonghua liuxingbingxue zazhi | VOL. 42
10 Oct 2021

Quantifying the impact of addressing data challenges in prediction of length of stay
Amin Naemi ... Uffe Kock Wiil
BMC Medical Informatics and Decision Making | VOL. 21
30 Oct 2021

A machine learning-based multiple imputation method for the health and aging brain among latino elders study
Fan Zhang ... Sid E O'Bryant
Alzheimer's & Dementia | VOL. 19
01 Jun 2023

On mining incomplete medical datasets: Ordering imputation and classification.
Chih-Wen Chen ... Shih-Wen Ke
Technology and health care : official journal of the European Society for Engineering and Medicine | VOL. 23
22 Sep 2015
