Abstract 15002: Cardiovascular Disease Risk Prediction by Random Survival Forest: The Korean National Health Insurance Service-National Health Screening Cohort

Sooyoung Park, Sarah Ratcliffe, Connie M Ulrich, Kathryn Bowles

doi:10.1161/circ.148.suppl_1.15002

Abstract

Introduction: Risk prediction is critical for cardiovascular disease (CVD) prevention and management. Existing CVD risk prediction models, including the Framingham Risk Score (FRS), are based on Cox proportional hazards regression. Adopting a data mining approach could improve the accuracy of CVD risk prediction because it does not rely on the linearity assumption. The aim of this study was to construct Korean population-based CVD risk prediction models using random survival forest and to compare their predictive accuracy with that of the FRS in a 14-year Korean cohort.

Methods: This was a secondary data analysis using a Korean population-based cohort. Of 463,783 eligible participants, 70% were selected for the training dataset using stratified random sampling, and the remaining 30% served as the test dataset. Random survival forest was used to build risk prediction models on the training dataset. The test dataset was used to assess the validity of the new models and the FRS.

Results: Three random forest-based models were developed. They showed good discrimination in both men and women, with c-statistics ranging from 0.698 to 0.747, whereas the c-statistics for the FRS were 0.692 and 0.733 in men and women, respectively. All random forest-based models showed good calibration, whereas the FRS tended to overestimate CVD risk in men.

Conclusions: The random survival forest-based models showed performance comparable to that of the FRS in a Korean population-based cohort. Continued efforts are needed to improve the accuracy of CVD risk prediction for the Korean population and to put CVD risk prediction to use in clinical practice.
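
The evaluation design described in the Methods and Results (a 70/30 stratified split, then discrimination measured by a c-statistic) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes the c-statistic is Harrell's concordance index and that sampling was stratified on event status, neither of which the abstract specifies. The function names and the toy records are hypothetical, and the random survival forest itself is not fitted here.

```python
import random
from collections import defaultdict


def stratified_split(records, key, train_frac=0.7, seed=0):
    """Split records into train/test, sampling train_frac within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    train, test = [], []
    for members in strata.values():
        members = members[:]  # copy so the caller's list is not shuffled
        rng.shuffle(members)
        cut = round(train_frac * len(members))
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


def harrell_c(times, events, risks):
    """Harrell's concordance index for right-censored data.

    A pair is comparable when the subject with the strictly shorter
    follow-up time had an event. It is concordant when that subject also
    has the higher predicted risk; tied risks count as half. Pairs with
    tied follow-up times are skipped (a simplification).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: (follow-up years, event flag, predicted risk).
toy = [
    (1.5, 1, 0.90), (3.0, 1, 0.70), (4.0, 0, 0.40), (6.0, 1, 0.75),
    (8.0, 1, 0.50), (9.5, 0, 0.20), (11.0, 1, 0.35), (12.0, 1, 0.30),
    (14.0, 0, 0.10), (14.0, 0, 0.15),
]

# Assumption: stratify on the event flag so both sets keep the event rate.
train, test = stratified_split(toy, key=lambda rec: rec[1])

# Score the whole toy cohort; a real analysis would score the test set only.
c_toy = harrell_c([r[0] for r in toy], [r[1] for r in toy], [r[2] for r in toy])
print(len(train), len(test), round(c_toy, 3))
```

A c-statistic of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. Values such as those reported above are read on that scale.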
