Abstract

Background
Cardiovascular disease (CVD) remains a significant cause of mortality and morbidity within the UK. Different models predict CVD events and all-cause mortality with varying levels of certainty.

Purpose
This study aimed to explore the incremental effect of risk factors and their respective biomarkers on models that estimate the risk of CVD events.

Methods
The National Survey of Health and Development (NSHD) birth cohort study, including 2547 women and 2815 men, was used to model relationships between conventional and emerging risk factors and cardiometabolic and vascular outcomes, including myocardial infarction and stroke, between 1999 and 2009. Logistic regression or XGBoost (eXtreme Gradient Boosting) models were used to predict the outcomes (myocardial infarction and stroke). Model fit was assessed by comparing predicted and known incident events in a two-by-two error matrix for binary outcomes.

Results
In a model including both sex and smoking, alcohol intake (β = -0.029 g; p < 0.05), body mass index (BMI) (β = 0.095 kg/m2; p < 0.05), glycated haemoglobin (HbA1c) (β = 0.033 mmol/mol; p < 0.01), total cholesterol/high density lipoprotein (TC/HDL) ratio (β = 0.459 mg/dL; p < 0.0001), waist-to-hip ratio (β = 5.632 cm; p = 0.05) and waist circumference (β = 0.031 cm; p < 0.01) were significant predictors of myocardial infarction. A lower risk was observed among those with higher average alcohol intake, whereas a higher risk was observed among those with higher BMI, HbA1c levels, total cholesterol/HDL ratio, waist-to-hip ratio and waist circumference. For stroke, exercising 5+ times (β = -1.47; p < 0.01), systolic blood pressure (SBP) (β = 0.040 mmHg; p < 0.001), diastolic blood pressure (DBP) (β = 0.027 mmHg; p < 0.001), pulse pressure (β = 0.033 mmHg; p < 0.01), triglycerides (β = 0.388; p < 0.05) and total cholesterol/HDL ratio (β = 0.419; p < 0.01) were all significant predictors.
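If the β values reported above are read as logistic-regression log-odds coefficients (an assumption; the abstract does not state the scale on which they are reported), each one can be converted to an odds ratio by exponentiation. The sketch below illustrates that conversion for two of the reported myocardial infarction coefficients. The helper name `odds_ratio` and the 5-unit increment are illustrative choices, not part of the study.

```python
import math


def odds_ratio(beta: float, delta: float = 1.0) -> float:
    """Odds ratio implied by a log-odds coefficient for a `delta`-unit increase.

    Assumes `beta` is a logistic-regression coefficient on the log-odds scale.
    """
    return math.exp(beta * delta)


# Coefficients quoted in the Results for myocardial infarction.
# Treating them as log-odds is an assumption made for this illustration.
BETA_BMI = 0.095       # per unit of BMI
BETA_ALCOHOL = -0.029  # per unit of alcohol intake

print("BMI, +1 unit:     ", round(odds_ratio(BETA_BMI), 3))
print("BMI, +5 units:    ", round(odds_ratio(BETA_BMI, 5.0), 3))
print("Alcohol, +1 unit: ", round(odds_ratio(BETA_ALCOHOL), 3))
```

A positive coefficient gives an odds ratio above 1 (higher odds of the event) and a negative coefficient gives one below 1, which matches the direction of the associations described in the Results.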
We found that higher values of SBP, DBP, pulse pressure, triglycerides and total cholesterol/HDL ratio were associated with a higher risk of stroke, whereas higher levels of physical activity were associated with a lower risk. Additionally, adding BMI, HbA1c and total cholesterol/HDL ratio to the baseline model gave the greatest F1 score of all models (0.123) and the greatest precision (6.7%), with a recall of 72.4%. With XGBoost, the F1 score fell from 0.123 to 0.079; adding alcohol intake and waist-to-hip ratio to this model, which the previous analysis had indicated greatly improved prediction, increased the XGBoost F1 score from 0.079 to 0.103.

Conclusion
This research shows that clinical and laboratory measurements play a potentially powerful role in predicting health outcomes when using evidence-based risk calculators. Large cohorts allow real-life, long-term associations to be explored and quantified with greater certainty that they will replicate in other populations.
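The precision, recall and F1 figures quoted in the Results are all derived from a two-by-two error matrix. The following minimal sketch shows how those metrics relate to one another and checks that the reported precision (6.7%) and recall (72.4%) are consistent with the reported F1 score of 0.123. The class and function names are hypothetical, and the counts in the toy matrix are invented for illustration only; they are not study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Two-by-two error matrix for a binary outcome."""
    tp: int  # predicted event, event occurred
    fp: int  # predicted event, no event
    fn: int  # predicted no event, event occurred
    tn: int  # predicted no event, no event

    def precision(self) -> float:
        predicted_pos = self.tp + self.fp
        return self.tp / predicted_pos if predicted_pos else 0.0

    def recall(self) -> float:
        actual_pos = self.tp + self.fn
        return self.tp / actual_pos if actual_pos else 0.0

    def f1(self) -> float:
        return f1_from_rates(self.precision(), self.recall())


def f1_from_rates(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Consistency check using the precision and recall quoted in the abstract.
print(round(f1_from_rates(0.067, 0.724), 3))  # 0.123

# Toy matrix with invented counts (NOT study data), to show the class in use.
toy = ConfusionMatrix(tp=3, fp=1, fn=2, tn=4)
print(round(toy.precision(), 3), round(toy.recall(), 3), round(toy.f1(), 3))
```

Because F1 is a harmonic mean, it is pulled toward the smaller of the two rates, which is why a model with 72.4% recall but 6.7% precision still has an F1 score close to 0.12.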