Abstract
Decision trees (DTs) are a machine learning technique that searches the predictor space for the variable and observed value whose split of the data into two nodes yields the best prediction. The algorithm repeats its search within each partition of the data until a stopping rule ends the search. Missing data are problematic in DTs because an observation with a missing value on the chosen splitting variable cannot be placed into a node, and this inability can also alter the variable selection process itself. Simple missing data approaches (e.g., listwise deletion, majority rule, and surrogate splits) have been implemented in DT algorithms; however, more sophisticated missing data techniques have not been thoroughly examined. We propose a modified multiple imputation approach to handling missing data in DTs and, via Monte Carlo simulation, compare it with the simple approaches as well as with single imputation and multiple imputation with prediction averaging. The study evaluated the performance of each missing data approach when data were missing at random (MAR) or missing completely at random (MCAR). The proposed multiple imputation approach and surrogate splits had superior performance, with the proposed multiple imputation approach performing best in the more severe missing data conditions. We conclude with recommendations for handling missing data in DTs.
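To illustrate the general idea of multiple imputation with prediction averaging described above, the following is a minimal sketch, not the authors' exact procedure. It assumes scikit-learn's `IterativeImputer` and `DecisionTreeClassifier`, uses MCAR missingness on synthetic data, generates m imputed datasets, fits one tree per dataset, and pools predictions by majority vote.

```python
# Sketch of multiple imputation with prediction averaging for decision trees.
# Hypothetical example: synthetic data, MCAR missingness, m = 5 imputations.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Introduce MCAR missingness: each cell is missing with probability 0.2.
X_miss = X.copy()
X_miss[rng.random(X.shape) < 0.2] = np.nan

m = 5  # number of imputed datasets
preds = []
for seed in range(m):
    # sample_posterior=True draws from the predictive distribution,
    # so each of the m imputed datasets differs (proper multiple imputation).
    imputer = IterativeImputer(sample_posterior=True, random_state=seed)
    X_imp = imputer.fit_transform(X_miss)
    tree = DecisionTreeClassifier(max_depth=3, random_state=seed)
    tree.fit(X_imp, y)
    preds.append(tree.predict(X_imp))

# Pool the m trees' class predictions by averaging (majority vote).
pooled = (np.mean(preds, axis=0) >= 0.5).astype(int)
```

Pooling at the prediction stage (rather than building a single tree on pooled data) is what distinguishes prediction averaging from the modified multiple imputation approach the abstract proposes.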