Abstract

The purpose of this study was to evaluate the utility of machine-learning (ML) algorithms, and their ability to integrate multiple data inputs, for predicting the future development of cardiovascular disease (CVD) in a nationwide sample of middle-aged Korean adults. The study population was randomly sampled from the Korea National Health Insurance (NHI) database. We identified 143,453 people (79,584 men and 63,869 women) who were aged 40 years or older, were free from CVD or cancer, and completed health screening tests between 2002 and 2003. CVD was defined as a composite of cardiovascular death, myocardial infarction, stroke, and coronary artery disease requiring coronary artery intervention or bypass surgery, occurring up to 2013; events were identified from the NHI claims database. We employed multiple supervised ML algorithms to build CVD prediction models using clinical and laboratory data collected during the follow-up period. The prediction performance of the ML algorithm was compared with that of a traditional scoring system and of logistic prediction models. In men, the ML algorithm [area under the curve (AUC) 0.895, 95% confidence interval (CI) 0.889-0.902] outperformed the traditional scoring system (AUC 0.721, 95% CI 0.713-0.730), a logistic regression model with baseline characteristics (AUC 0.731, 95% CI 0.723-0.739), and a logistic model with time-series analysis (AUC 0.804, 95% CI 0.797-0.811) in predicting future CVD. In women, the ML algorithm (AUC 0.908, 95% CI 0.901-0.915) also showed the highest predictive accuracy, compared with the traditional scoring system (AUC 0.749, 95% CI 0.740-0.758), the logistic regression model with baseline characteristics (AUC 0.775, 95% CI 0.767-0.784), and the logistic model with time-series analysis (AUC 0.842, 95% CI 0.834-0.849). Our findings suggest that an ML algorithm using time-series data can improve the performance of CVD risk prediction.
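For readers unfamiliar with the metric, the sketch below illustrates how an AUC and an accompanying 95% CI can be obtained for a single model. It is an illustration only: the abstract does not state how the reported intervals were computed, so the percentile bootstrap used here is an assumption, and the function names (`auc`, `bootstrap_ci`) and the synthetic labels and scores are hypothetical rather than taken from the study.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (Mann-Whitney U; ties count as 0.5)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    i, n = 0, len(pairs)
    while i < n:
        # Find the run of tied scores and give each member the average rank.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method,
    not necessarily the one used in the study)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[k] for k in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[k] for k in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Synthetic data for illustration only: cases tend to score higher.
    rng = random.Random(42)
    y = [1 if rng.random() < 0.2 else 0 for _ in range(500)]
    s = [rng.gauss(1.0 if label else 0.0, 1.0) for label in y]
    print("AUC:", auc(y, s))
    print("95% CI:", bootstrap_ci(y, s))
```

Non-overlapping intervals between two models, as in the figures reported above, indicate that the difference in discrimination is unlikely to be due to sampling variation alone; a formal comparison would test the paired difference in AUC directly.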
