Abstract
In the development of predictive models, missing data is a critical problem that has traditionally required a two-step analysis. Data scientists first analyze the patterns of missing values, select variables, and impute missing values on the basis of domain knowledge. Only then do they train a model. Models typically have hardcoded input sizes and are limited in handling data with high missing rates or changes in the set of available variables. We propose an attention-based neural network combined with a novel real number representation. It requires little manual variable selection, and missing data can simply be ignored, which makes imputation unnecessary. With the proposed model, data analysis becomes a single step, because the first step of imputing missing values is omitted. The study included data on 32,709 intensive care unit (ICU) admissions and 60 healthcare variables from the Medical Information Mart for Intensive Care (MIMIC)-IV. The proposed algorithm yielded an area under the receiver operating characteristic curve (AUC) of 0.842 (95% CI: 0.828–0.856) when predicting prolonged length of stay in the ICU, outperforming current approaches that rely on imputation methods. Because it handles incomplete data and selects variables automatically, the proposed algorithm can be applied to a range of problems in data science.
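The abstract does not specify the network architecture or the real number representation. The sketch below is only a hedged illustration of the general idea that attention can skip missing inputs instead of imputing them. Each observed variable becomes a token, and missing variables are masked out before the softmax, so they receive exactly zero weight. The linear value encoding, the single query vector, and the names `masked_attention_pool`, `var_emb`, `w_value`, and `query` are hypothetical assumptions. This is not the authors' model.

```python
import numpy as np

def masked_attention_pool(x, var_emb, w_value, query):
    """Pool one record of possibly-missing variables into a single vector.

    Hypothetical sketch, not the paper's method.
    x       : (n_vars,) float array; np.nan marks a missing value
    var_emb : (n_vars, d) per-variable embedding (would be learned)
    w_value : (d,) linear encoding of the scalar value (a simple stand-in,
              NOT the paper's real number representation)
    query   : (d,) attention query (would be learned)
    """
    observed = ~np.isnan(x)
    if not observed.any():
        raise ValueError("record has no observed variables")
    # Zero-fill NaNs only so the arithmetic is defined; these tokens are masked below.
    values = np.where(observed, x, 0.0)
    tokens = var_emb + values[:, None] * w_value[None, :]   # (n_vars, d)
    scores = tokens @ query / np.sqrt(tokens.shape[1])       # scaled dot-product scores
    scores = np.where(observed, scores, -np.inf)             # exclude missing variables
    scores = scores - scores[observed].max()                 # numerical stability
    weights = np.exp(scores)                                 # exp(-inf) == 0
    weights /= weights.sum()
    return weights @ tokens, weights

# Usage with random (untrained) parameters: 5 variables, 2 of them missing.
rng = np.random.default_rng(0)
var_emb = rng.normal(size=(5, 8))
w_value = rng.normal(size=8)
query = rng.normal(size=8)
record = np.array([1.2, np.nan, -0.3, np.nan, 0.7])
pooled, weights = masked_attention_pool(record, var_emb, w_value, query)
print(weights.round(3))  # positions 1 and 3 (missing) get exactly 0 weight
```

Because missing variables contribute nothing, the result is identical to running the pooling on only the observed variables. For this reason the input size need not be fixed, and no imputed value ever enters the model.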