Abstract

Background/Introduction: Despite the recent increase in the availability of data sources that can be used in prediction models for cardiovascular disease (CVD), it remains unclear to what extent such data improve model performance in data-driven cardiovascular research.

Purpose: To compare how three sets of input features, all derived from a large-scale medical claims database, contribute to the CVD prediction performance of artificial neural networks (ANN): basic clinical factors, the variables of the European Society of Cardiology Systematic Coronary Risk Evaluation (ESC SCORE), and multidimensional risk factors.

Methods: We abstracted data through the National Health Insurance Sharing Service and collected information on 258,896 middle-aged individuals free of CVD at baseline (2009–2010) who were followed up for incident CVD until 2013. Multidimensional risk factors identifiable from the database were chosen from a systematic review of published articles. Input features for the ANN were classified as follows: basic clinical factors (age, sex, and body mass index), ESC SCORE (age, sex, total cholesterol, systolic blood pressure, and cigarette smoking), and multidimensional risk factors (sociodemographic factors, lifestyle behaviors, underlying medical conditions, dental health, medication use, and others). The data were partitioned into training and test sets at a 7:3 ratio, and the performance of each ANN model was evaluated with the area under the curve (AUC).

Results: The ANN model with multidimensional risk factors had higher prediction performance (AUC: 0.692) than the models with basic clinical factors (AUC: 0.671) and ESC SCORE (AUC: 0.684). Among the multidimensional risk factors, atrial fibrillation, family history, chronic kidney disease, retinal vein occlusion, dental caries, and use of antipsychotics and corticosteroids were some of the strong predictors. However, adding multidimensional risk factors yielded only a marginal improvement over the ESC SCORE model: a relative increase of 1.17% in AUC (from 0.684 to 0.692).

Conclusions: Adding multidimensional risk factors as input features to the ANN improved CVD prediction performance only marginally. When assessing cardiovascular risk from large-scale healthcare data, the variables included in the ESC SCORE should be considered first in the model.

Funding Acknowledgement: Type of funding source: Public grant(s) – National budget only. Main funding source(s): Kyuwoong Kim received a scholarship from the BK21-plus education program provided by the National Research Foundation of the Republic of Korea. This work is a part of Kyuwoong Kim's PhD dissertation.
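
The evaluation steps described in the Methods and Results (a 7:3 partition, AUC as the performance measure, and the 1.17% improvement) can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which the abstract does not describe. The function names `split_train_test`, `roc_auc`, and `relative_gain_percent` are hypothetical, the random shuffle is an assumed partitioning method, and the sketch assumes that AUC refers to the area under the receiver operating characteristic curve. Only the reported AUC values (0.684 and 0.692) come from the abstract.

```python
import random


def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle rows and partition them into training and test sets.

    Assumption: a simple random 7:3 split; the abstract does not say
    how the partition was drawn.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    labels are 0/1 outcomes, scores are predicted risks. Tied scores
    receive their average rank, so a tie counts as half a concordant pair.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j

    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def relative_gain_percent(auc_new, auc_reference):
    """Relative change in AUC, in percent of the reference AUC."""
    return 100.0 * (auc_new - auc_reference) / auc_reference


if __name__ == "__main__":
    # AUC values reported in the abstract.
    auc_esc_score = 0.684
    auc_multidimensional = 0.692
    gain = relative_gain_percent(auc_multidimensional, auc_esc_score)
    print(f"Relative AUC gain over ESC SCORE: {gain:.2f}%")
```

The last function makes explicit that the 1.17% figure is a relative change: (0.692 - 0.684) / 0.684 is about 0.0117, while the absolute difference in AUC is 0.008.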
