Abstract

Traditional risk evaluation models have been used in many studies to guide public health and clinical practice. However, applying existing methods to data sets with missing and censored data, as is common in electronic health records, requires additional considerations. We aimed to develop and validate a predictive model that performs well on data sets containing missing and censored data. This retrospective cohort study of coronary heart disease included unique patients aged 18 to 96 years seen at Weihai Municipal Hospital between 2013 and 2021. A total of 169 692 participants formed the study population, of whom 10 895 were diagnosed with coronary heart disease. Models for the risk of coronary heart disease were built from demographic, laboratory, and medical history variables. All complete samples were assigned to the training set (n=110 325), and the remaining samples were assigned to the validation set (n=59 367). In the derivation cohort, the area under the receiver operating characteristic curve was 0.800 (95% CI, 0.794-0.805) and the C statistic was 0.796 (95% CI, 0.791-0.801); the corresponding values in the validation cohort were 0.837 (95% CI, 0.821-0.853) and 0.838 (95% CI, 0.822-0.854). The calibration curve indicated good calibration, and decision curve analysis indicated clinical usefulness. The proposed risk prediction model performed well on electronic health record data, which often involve extensive missing data and censoring. This approach may help in using electronic health records to improve patient outcomes.
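
For readers less familiar with the discrimination measure reported above, the following is a minimal sketch of how an area under the receiver operating characteristic curve can be computed for a binary outcome, using its pairwise-concordance definition: the proportion of case/non-case pairs in which the case receives the higher predicted risk, with ties counted as one half. The function name `auc` and the toy scores and labels are hypothetical and are not taken from the study; the sketch also ignores censoring, which a C statistic for time-to-event data has to account for.

```python
def auc(scores, labels):
    """Pairwise-concordance AUC for a binary outcome (1 = case, 0 = non-case)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                concordant += 1.0   # case ranked above non-case
            elif c == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(cases) * len(non_cases))


# Hypothetical toy data, for illustration only (not study data).
toy_scores = [0.10, 0.35, 0.30, 0.80, 0.65, 0.20]
toy_labels = [0, 0, 1, 1, 1, 0]
toy_auc = auc(toy_scores, toy_labels)  # 8 of 9 pairs are concordant
```

The double loop is quadratic in the sample size, so for a cohort of the size described here a rank-based implementation from an established statistics library would be the more practical choice.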
