Dealing With Missing, Imbalanced, and Sparse Features During the Development of a Prediction Model for Sudden Death Using Emergency Medicine Data: Machine Learning Approach.

Xiaojie Chen, Huilong Duan, Xiangtian Kong, Shan Nan, Han Chen, Haiyan Zhu

doi:10.2196/38590

Abstract

In emergency departments (EDs), early diagnosis and timely rescue, which are supported by prediction models using ED data, can increase patients' chances of survival. Unfortunately, ED data usually contain missing, imbalanced, and sparse features, which makes it challenging to build early identification models for diseases. This study aims to propose a systematic approach to deal with the problems of missing, imbalanced, and sparse features for developing sudden-death prediction models using emergency medicine (or ED) data. We proposed a 3-step approach to deal with data quality issues: a random forest (RF) for missing values, k-means for imbalanced data, and principal component analysis (PCA) for sparse features. To evaluate the imputation, the coefficient of determination (R²) was used for continuous variables and the κ coefficient for discrete variables. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) were used to estimate the model's performance. To further evaluate the proposed approach, we carried out a case study using an ED data set obtained from the Hainan Hospital of Chinese PLA General Hospital. A logistic regression (LR) prediction model for patient condition worsening was built. A total of 1085 patients with rescue records and 17,959 patients without rescue records were selected, so the two classes were substantially imbalanced. We extracted 275, 402, and 891 variables from laboratory tests, medications, and diagnosis, respectively. After data preprocessing, the median R² of the RF continuous variable interpolation was 0.623 (IQR 0.647), and the median κ coefficient for discrete variable interpolation was 0.444 (IQR 0.285).
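As an illustration of the two imputation-quality metrics named above, the following minimal sketch computes the coefficient of determination and Cohen's κ from scratch. The function names and toy values are hypothetical and are not taken from the study; the idea is that known values are masked, imputed, and then compared with the originals.

```python
from collections import Counter


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot is zero).
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cohen_kappa(labels_true, labels_pred):
    """Cohen's kappa: agreement corrected for chance agreement.

    Assumes chance agreement is below 1 (more than one category in use).
    """
    n = len(labels_true)
    observed = sum(t == p for t, p in zip(labels_true, labels_pred)) / n
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    # Chance agreement: sum over categories of the product of marginal rates.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Toy check: held-out true values versus hypothetical imputed values.
continuous_true = [1.0, 2.0, 3.0, 4.0]
continuous_imputed = [1.0, 2.0, 3.0, 4.0]
discrete_true = ["a", "a", "b", "b"]
discrete_imputed = ["a", "a", "b", "a"]

print(r_squared(continuous_true, continuous_imputed))
print(cohen_kappa(discrete_true, discrete_imputed))  # 0.5
```

A value of 1 for either metric means the imputed values match the held-out values exactly, while a value near 0 means the imputation does no better than predicting the mean (R²) or agreeing by chance (κ).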
The LR model constructed using the initial diagnostic data showed poor performance and variable separation, as reflected in the abnormally high odds ratio (OR) values of 2 variables, cardiac arrest and respiratory arrest (201568034532 and 1211118945, respectively), and in an abnormal 95% CI. Using the processed data, the recall of the model reached 0.746, the F1-score was 0.73, and the AUROC was 0.708. The proposed systematic approach is valid for building a prediction model for emergency patients.
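The 3-step approach described in the abstract can be sketched roughly as follows. This is a minimal illustration on synthetic data, assuming scikit-learn and NumPy are available; it is not the authors' implementation. In particular, the helper names `rf_impute_column` and `kmeans_undersample`, the cluster-centroid style of undersampling, and all parameter values are assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for ED data: 600 patients, 12 features, rare positive class.
n, d = 600, 12
X = rng.normal(size=(n, d))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n) > 2.2).astype(int)

# Inject missing values into one column (hypothetical missingness pattern).
X_missing = X.copy()
X_missing[rng.random(n) < 0.2, 2] = np.nan


def rf_impute_column(X, col):
    """Step 1: fill NaNs in one column using a random forest trained on
    the other columns (assumes the other columns are complete)."""
    X = X.copy()
    missing = np.isnan(X[:, col])
    if not missing.any():
        return X
    others = np.delete(X, col, axis=1)
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(others[~missing], X[~missing, col])
    X[missing, col] = model.predict(others[missing])
    return X


def kmeans_undersample(X, y, random_state=0):
    """Step 2: cluster the majority class into as many clusters as there
    are minority samples, then keep the majority sample nearest to each
    centroid (one assumed variant of k-means-based balancing)."""
    majority = np.flatnonzero(y == 0)
    minority = np.flatnonzero(y == 1)
    km = KMeans(n_clusters=len(minority), n_init=10, random_state=random_state)
    distances = km.fit(X[majority]).transform(X[majority])  # (n_major, k)
    keep = majority[np.unique(distances.argmin(axis=0))]
    idx = np.concatenate([keep, minority])
    return X[idx], y[idx]


# For brevity the imputer sees all rows; in practice fit it on training data only.
X_imp = rf_impute_column(X_missing, col=2)
X_train, X_test, y_train, y_test = train_test_split(
    X_imp, y, test_size=0.3, stratify=y, random_state=0
)

# Balance only the training split; the test split keeps its natural imbalance.
X_bal, y_bal = kmeans_undersample(X_train, y_train)

# Step 3: PCA to compress the feature space, fitted on the balanced training data.
pca = PCA(n_components=5).fit(X_bal)
clf = LogisticRegression(max_iter=1000).fit(pca.transform(X_bal), y_bal)

proba = clf.predict_proba(pca.transform(X_test))[:, 1]
auroc = roc_auc_score(y_test, proba)
auprc = average_precision_score(y_test, proba)
print(f"AUROC={auroc:.3f}  AUPRC={auprc:.3f}")
```

Balancing is applied after the train/test split so that the reported AUROC and AUPRC reflect the class distribution the model would face in use. The synthetic features here are dense; with sparse indicator variables such as diagnosis codes, the PCA step would serve to reduce many rarely populated columns to a few components.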

Journal: JMIR Medical Informatics
Publication Date: Jan 20, 2023
Citations: 6
License type: CC BY

Similar Papers

Pediatric ECG-Based Deep Learning to Predict Left Ventricular Dysfunction and Remodeling.
Akhil Vaid ... William G La Cava
Circulation | VOL. 149
05 Feb 2024

The performance of VCS(volume, conductivity, light scatter) parameters in distinguishing latent tuberculosis and active tuberculosis by using machine learning algorithm
Lijiao Chen ... Shaoli Deng
BMC Infectious Diseases | VOL. 23
16 Dec 2023

Establishment of noninvasive diabetes risk prediction model based on tongue features and machine learning techniques
Jun Li ... Jiatuo Xu
International Journal of Medical Informatics | VOL. 149
22 Feb 2021

Predicting Postoperative Mortality With Deep Neural Networks and Natural Language Processing: Model Development and Validation.
Pei-Fu Chen ... Kuan-Chih Chen
JMIR Medical Informatics | VOL. 10
10 May 2022
