An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data.

Yuzhe Liu, Vanathi Gopalakrishnan

doi:10.3390/data2010008

Abstract

Many clinical research datasets have a large percentage of missing values, which directly limits their usefulness for training high-accuracy classifiers with supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem-specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine whether increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models with that of unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models.
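Two of the four methods named above, mean imputation and k-nearest neighbors, are simple enough to sketch directly. This is a minimal illustration and not the authors' implementation. It assumes purely numeric features, marks missing cells with `None`, and measures neighbor distance as a root-mean-square difference over the features both rows observe. The function names `mean_impute` and `knn_impute` and the default `k=2` are hypothetical choices made for illustration.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


def mean_impute(data: List[Row]) -> List[List[float]]:
    """Replace each missing value with the mean of the observed values in its column.
    Assumes every column has at least one observed value."""
    n_cols = len(data[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def knn_impute(data: List[Row], k: int = 2) -> List[List[float]]:
    """Fill each missing cell with the mean of that column over the k nearest rows
    that observe it. Distance uses only features observed in both rows."""

    def distance(a: Row, b: Row) -> float:
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # no overlap: treat as infinitely far away
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    result = []
    for i, row in enumerate(data):
        filled = []
        for j, value in enumerate(row):
            if value is not None:
                filled.append(value)
                continue
            # Candidate donors: other rows where column j is observed.
            donors = [other for m, other in enumerate(data) if m != i and other[j] is not None]
            donors.sort(key=lambda other: distance(row, other))
            nearest = donors[:k]  # assumes at least one donor exists
            filled.append(sum(d[j] for d in nearest) / len(nearest))
        result.append(filled)
    return result


# Toy usage: the second row is missing its second feature.
table = [[1.0, 10.0], [2.0, None], [3.0, 30.0], [10.0, 100.0]]
print(mean_impute(table)[1])
print(knn_impute(table, k=2)[1])
```

On this toy table, mean imputation fills the missing cell with the column mean (about 46.7), while k-nearest-neighbor imputation with `k=2` averages only the two rows closest on the first feature and returns 20.0. Neighbor-based methods can therefore preserve local structure that a single column mean discards.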

Highlights

In biomedical research, samples with missing values are typically discarded to obtain a complete dataset
Missing data mechanisms can be categorized into three types: missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR) [6,7]
We reviewed 12 papers that compared the performance of different imputation methods; they are summarized in Supplementary Table S1, with information on methods and their evaluation, along with types of datasets used and performance results reported
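The three mechanisms differ only in what the probability of a value going missing depends on. The simulation below is a hypothetical illustration and does not use data from this study. It draws a covariate `x` that is always observed and a variable `y = x + noise` with true mean 0, deletes `y` under each mechanism, and reports the mean of the surviving `y` values. That complete-case mean is what listwise deletion estimates and what mean imputation fills in. The deletion probabilities (0.2 and 0.8), the sample size, and the function name `observed_mean` are arbitrary assumptions.

```python
import random


def observed_mean(mechanism: str, n: int = 20_000, seed: int = 0) -> float:
    """Simulate y = x + noise (true mean 0), delete y under the given mechanism,
    and return the mean of the y values that remain observed."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)      # covariate that is always observed
        y = x + rng.gauss(0.0, 1.0)  # variable subject to missingness
        if mechanism == "MCAR":
            p_missing = 0.5                    # independent of all data
        elif mechanism == "MAR":
            p_missing = 0.8 if x > 0 else 0.2  # depends only on the observed x
        elif mechanism == "MNAR":
            p_missing = 0.8 if y > 0 else 0.2  # depends on the missing y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if rng.random() >= p_missing:
            kept.append(y)
    return sum(kept) / len(kept)


for mech in ("MCAR", "MAR", "MNAR"):
    print(mech, round(observed_mean(mech), 3))
```

Under MCAR the observed mean stays close to 0, whereas under MAR and MNAR it is pulled clearly below 0 because large values are deleted more often. Under MAR the bias can be removed by conditioning on `x` (for example, by imputing `y` from a model fitted on `x`), since missingness depends only on what is observed. Under MNAR the observed data alone are not enough to correct it without extra assumptions about how values went missing.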

Summary

Introduction

In biomedical research, samples with missing values are typically discarded to obtain a complete dataset. Since the early 2000s, a new paradigm has emerged in which missing values are treated as unknown values to be learned through a machine learning model. In this framework, data samples with observed values for a particular variable are used as a training set for a machine learning model, which is then applied to the data samples with missing values to impute them.

Missing data mechanisms can be categorized into three types: missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR) [6,7]. Simple methods such as listwise deletion or mean imputation are unbiased only when data are MCAR. When data are MNAR, in the best-case scenario the pattern of missingness can be modeled using prior knowledge to bring the data closer to MAR and improve the quality of imputations obtained through methods that assume MAR.
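The train-on-observed, predict-on-missing framework described above can be sketched for a single variable. As a hedged stand-in for the decision-tree imputer mentioned in the abstract, the sketch uses a depth-1 regression tree (a decision stump) with one fully observed predictor. The column names `x` and `y`, the helper names `fit_stump` and `impute_column`, and the stump itself are illustrative assumptions rather than the model used in the study.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]
Predictor = Callable[[Row], float]


def fit_stump(train: List[Row], feature: str, target: str) -> Predictor:
    """Fit a depth-1 regression tree: split `feature` at the threshold that
    minimizes the summed squared error of `target` in the two leaves."""
    values = sorted({r[feature] for r in train})
    best = None  # (sse, threshold, left_mean, right_mean)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r[target] for r in train if r[feature] <= threshold]
        right = [r[target] for r in train if r[feature] > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((v - left_mean) ** 2 for v in left) + sum((v - right_mean) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # a single distinct feature value: fall back to the target mean
        mean = sum(r[target] for r in train) / len(train)
        return lambda row: mean
    _, threshold, left_mean, right_mean = best
    return lambda row: left_mean if row[feature] <= threshold else right_mean


def impute_column(rows: List[Row], target: str, feature: str) -> List[Row]:
    """Train on rows where `target` is observed, then predict it where it is missing.
    Assumes `feature` is observed in every row; the input rows are not modified."""
    train = [r for r in rows if r[target] is not None]
    model = fit_stump(train, feature, target)
    return [{**r, target: model(r)} if r[target] is None else dict(r) for r in rows]


rows = [
    {"x": 1.0, "y": 10.0}, {"x": 2.0, "y": 12.0},
    {"x": 8.0, "y": 50.0}, {"x": 9.0, "y": 54.0},
    {"x": 1.5, "y": None}, {"x": 8.5, "y": None},
]
print(impute_column(rows, target="y", feature="x"))
```

Here the stump splits at `x = 5.0`, so the missing `y` at `x = 1.5` becomes 11.0 (the mean of 10 and 12) and the one at `x = 8.5` becomes 52.0. Replacing `fit_stump` with any other regressor, such as a deeper tree or a nearest-neighbor model, keeps the same framework; only the model changes.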

Journal: Data
Publication Date: Jan 25, 2017
Citations: 57
License type: CC BY 4.0

Similar Papers

Machine learning in pain research.
Jörn Lötsch ... Alfred Ultsch
Pain | VOL. 159 | 24 Nov 2017

Simulation study on missing data imputation methods for longitudinal data in cohort studies
Y M Li ... F Y Chen
Zhonghua liu xing bing xue za zhi = Zhonghua liuxingbingxue zazhi | VOL. 42 | 10 Oct 2021

Self-Training With Quantile Errors for Multivariate Missing Data Imputation for Regression Problems in Electronic Medical Records: Algorithm Development Study.
Hansle Gwon ... Yunha Kim
JMIR Public Health and Surveillance | VOL. 7 | 13 Oct 2021

Deep learning based decision tree ensembles for incomplete medical datasets.
Chien-Hung Chiu ... Shih-Wen Ke
Technology and health care : official journal of the European Society for Engineering and Medicine | VOL. 32 | 05 Jan 2024
