Abstract

Missing data is a critical problem in the development of predictive models, and it is traditionally handled with a two-step analysis. Data scientists first analyze the patterns of missing values, select variables, and impute missing values on the basis of domain knowledge; they then train a model. Models typically have hardcoded input sizes, which limits their ability to handle data with high missing rates or changes in the available variables. We propose an attention-based neural network combined with a novel real-number representation that requires little manual variable selection and allows missing values to be ignored, making imputation unnecessary. With the proposed model, the analysis reduces to a single step because the imputation step is omitted. The study included data on 32,709 intensive care unit (ICU) admissions and 60 healthcare variables from the Medical Information Mart for Intensive Care (MIMIC)-IV. The proposed algorithm yielded an area under the receiver operating characteristic curve (AUC) of 0.842 (95% CI: 0.828–0.856) when predicting prolonged length of stay in the ICU, outperforming current approaches that rely on imputation. Because it handles incomplete data and selects variables automatically, the proposed algorithm can be applied to a range of problems in data science.
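
The idea that an attention mechanism can skip missing inputs, so that the input size need not be fixed, can be illustrated with a minimal sketch. This is not the authors' architecture: the function names (`make_embeddings`, `attention_pool`), the random embeddings, the single fixed query vector, and the choice to encode a real value by scaling its variable's embedding are all assumptions made for illustration. The abstract does not specify the real-number representation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_embeddings(num_variables, dim, seed=0):
    """Hypothetical per-variable embeddings (random stand-ins for learned ones)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(num_variables)]


def attention_pool(record, embeddings, query):
    """Pool the observed variables of one record with dot-product attention.

    record maps a variable index to its observed value, or to None if missing.
    Missing variables produce no token, so nothing is imputed and the number
    of inputs may differ from record to record.
    """
    tokens = []
    for idx, value in record.items():
        if value is None:
            continue  # missing value: simply left out of the attention set
        # Assumed placeholder encoding: scale the variable embedding by the value.
        tokens.append([value * e for e in embeddings[idx]])

    dim = len(query)
    if not tokens:
        return [0.0] * dim  # nothing observed: return a neutral vector

    scale = math.sqrt(dim)
    scores = [sum(q * t for q, t in zip(query, tok)) / scale for tok in tokens]
    weights = softmax(scores)  # normalised over observed variables only
    return [sum(w * tok[d] for w, tok in zip(weights, tokens))
            for d in range(dim)]
```

In this sketch, a record with two observed variables and a record that lists the same two values plus explicitly missing ones produce identical pooled vectors, since the softmax is normalised only over the tokens that are present.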
