Selective Imputation of Covariates in High Dimensional Censored Data

Caroline Svahn,Oleg Sysoev

doi:10.1080/10618600.2022.2035233

Caroline Svahn, Oleg Sysoev

Open Access

https://doi.org/10.1080/10618600.2022.2035233

Copy DOI

Abstract

Efficient modeling of censored data, that is, data which are restricted by some detection limit or truncation, is important for many applications. Ignoring the censoring can be problematic as valuable information may be missing and restoration of these censored values may significantly improve the quality of models. There are many scenarios where one may encounter censored data: survival data, interval-censored data or data with a lower limit of detection. Strategies to handle censored data are plenty, however, little effort has been made to handle censored data of high dimension. In this article, we present a selective multiple imputation approach for predictive modeling when a larger number of covariates are subject to censoring. Our method allows for iterative, subject-wise selection of covariates to impute in order to achieve a fast and accurate predictive model. The algorithm furthermore selects values for imputation which are likely to provide important information if imputed. In contrast to previously proposed methods, our approach is fully nonparametric and therefore, very flexible. We demonstrate that, in comparison to previous work, our model achieves faster execution and often comparable accuracy in a simulated example as well as predicting signal strength in radio network data. Supplementary materials for this article are available online.

Full Text