Variable importance and prediction methods for longitudinal problems with missing variables.

Iván Díaz, Alan Hubbard, Mitchell Cohen, Anna Decker, Kewei Chen

doi:10.1371/journal.pone.0120031

Abstract

We present prediction and variable importance (VIM) methods for longitudinal data sets containing continuous and binary exposures subject to missingness. We demonstrate the use of these methods for prognosis of medical outcomes of severe trauma patients, a field in which current medical practice involves rules of thumb and scoring methods that use only a few variables and ignore the dynamic and high-dimensional nature of trauma recovery. Well-principled prediction and VIM methods can provide a tool to make care decisions informed by the patient’s high-dimensional physiological and clinical history. Our VIM parameters are analogous to slope coefficients in adjusted regressions, but they do not depend on a specific statistical model, nor do they require a certain functional form of the prediction regression to be estimated. In addition, they can be causally interpreted under causal and statistical assumptions as the expected outcome under time-specific clinical interventions, related to changes in the mean of the outcome if each individual experiences a specified change in the variable (keeping other variables in the model fixed). Better yet, the targeted maximum likelihood estimator (targeted MLE) used is doubly robust and locally efficient. Because the proposed VIM does not constrain the prediction model fit, we use a very flexible ensemble learner (the SuperLearner), which returns a linear combination of a list of user-given algorithms. Not only is such a prediction algorithm intuitively appealing, it also has theoretical justification as being asymptotically equivalent to the oracle selector. The results of the analysis show effects whose size and significance would not have been found using a parametric approach (such as stepwise regression or LASSO). In addition, the procedure is even more compelling because the predictor on which it is based showed significant improvements in cross-validated fit, for instance in the area under the curve (AUC) of the receiver operating characteristic (ROC) curve.
Thus, given that our VIM 1) applies to any model-fitting procedure, 2) has meaningful clinical (causal) interpretations under assumptions, and 3) has robust asymptotic inference based on the influence curve, it provides a compelling alternative to existing methods for estimating variable importance in high-dimensional clinical (or other) data.
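
To make the idea of a model-agnostic VIM concrete, here is a minimal sketch of a plug-in (substitution) estimate of the change in the mean predicted outcome when every individual's exposure is shifted by a fixed amount while the other covariates stay fixed. It is an illustration under stated assumptions, not the targeted MLE described in the abstract: the function name `shift_vim`, the toy data, and the two stand-in prediction functions are all hypothetical, and missingness and inference are ignored.

```python
from statistics import mean


def shift_vim(predict, rows, delta):
    """Plug-in estimate of mean[Q(A + delta, W)] - mean[Q(A, W)].

    predict : any fitted prediction function Q(a, w) (hypothetical stand-in
              for an outcome regression such as an ensemble learner)
    rows    : list of (exposure a, covariate w) pairs
    delta   : size of the shift applied to every individual's exposure
    """
    shifted = mean([predict(a + delta, w) for a, w in rows])
    observed = mean([predict(a, w) for a, w in rows])
    return shifted - observed


# Toy data, invented for illustration: (exposure, covariate) pairs.
rows = [(1.0, 0.5), (2.0, -1.0), (3.0, 2.0), (4.0, 0.0)]


def linear_fit(a, w):
    # A linear prediction function: the VIM reduces to slope * delta.
    return 2.0 * a + w


def nonlinear_fit(a, w):
    # A nonlinear prediction function: no single slope exists,
    # but the same VIM definition still applies.
    return a * a + w


vim_linear = shift_vim(linear_fit, rows, delta=1.0)
vim_nonlinear = shift_vim(nonlinear_fit, rows, delta=1.0)
print(vim_linear, vim_nonlinear)
```

With the linear stand-in the estimate equals the slope times the shift, which mirrors the analogy to adjusted regression coefficients; with the nonlinear stand-in the same function still returns a well-defined quantity, which is the sense in which the parameter does not depend on a particular functional form.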

Highlights

Modern medical care is awash in a sea of data.
In this paper we address the problem of estimating variable importance parameters for longitudinal data that are subject to missingness.
We present variable importance parameters that have a clear interpretation either as purely statistical parameters or as causal effects, depending on the assumptions about the data generating mechanism that the researcher is willing to make.

Summary

Introduction

Modern medical care is awash in a sea of data. The advent of new monitors, better diagnostics, electronic medical record keeping and the ideal of the quantified self has resulted in patients who are more completely measured than at any other time in medical history. Different variables are important and drive future outcomes in the first few minutes after injury than at 24 hours, when a patient has survived long enough to receive large-volume resuscitation, operative intervention and ICU care. While these dynamics are intuitive, most practitioners do not have the ability to know which variables are important at any given time point. As a result, practitioners are often left making care decisions without knowing the patient's current physiologic state or which parameters are important at that moment. Left with this uncertainty and awash in constantly evolving multivariate data, practitioners make decisions based on clinical gestalt, a few favorite variables, and rules of thumb developed from clinical experience. This would mimic the implicit understanding a clinician brings to a patient, where it is clear that the necessary focus of care must change over time.

Journal: PloS one
Publication Date: Mar 27, 2015
Citations: 32
License type: CC BY 4.0

