Improved identification of individuals at high risk of developing cardiovascular disease would enable targeted interventions and could reduce mortality and morbidity. Our aim was to determine whether large-scale proteomics improves prediction of cardiovascular events beyond traditional risk factors (TRFs). Using proximity extension assays, 2919 plasma proteins were measured in 38 380 participants of the UK Biobank. We used both data- and literature-based feature selection and trained extreme gradient boosting models to predict the risk of major cardiovascular events (MACE: fatal and non-fatal myocardial infarction, stroke, and coronary artery revascularization) during a 10-year follow-up. Area under the receiver operating characteristic curve (AUC) and net reclassification index (NRI) were used to evaluate the value that selected protein panels added to MACE prediction by Systematic COronary Risk Evaluation 2 (SCORE2) or by the 10 TRFs used in SCORE2. SCORE2 and SCORE2 refitted to UK Biobank data predicted MACE with AUCs of 0.740 and 0.749, respectively. Data-driven selection identified 114 proteins of greatest relevance for prediction. Prediction of MACE was not improved by using these proteins alone (AUC = 0.758) but was significantly improved by combining these proteins with SCORE2 or the 10 TRFs (AUC = 0.771, P < 0.001, NRI = 0.140, and AUC = 0.767, P = 0.03, NRI = 0.053, respectively). Literature-based protein selection (113 proteins from five previous studies) also improved risk prediction beyond TRFs, whereas a random selection of 114 proteins did not. Large-scale plasma proteomics with data-driven or literature-based protein selection modestly improves prediction of future MACE beyond TRFs.
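The two evaluation metrics named above can be illustrated with a minimal sketch. It assumes the category-free (continuous) form of the NRI, which the abstract does not specify, and it uses a handful of made-up risk values rather than any study data. The function names `auc` and `continuous_nri` and the variables `baseline` and `extended` are hypothetical and chosen only for this illustration.

```python
from typing import Sequence


def auc(scores: Sequence[float], events: Sequence[int]) -> float:
    """Probability that a randomly chosen event case receives a higher
    predicted risk than a randomly chosen non-event case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, events) if y == 1]
    neg = [s for s, y in zip(scores, events) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def continuous_nri(old: Sequence[float], new: Sequence[float],
                   events: Sequence[int]) -> float:
    """Category-free NRI: net share of events whose risk moved up plus
    net share of non-events whose risk moved down under the new model."""
    n_events = sum(events)
    n_nonevents = len(events) - n_events
    if n_events == 0 or n_nonevents == 0:
        raise ValueError("need at least one event and one non-event")
    up_e = down_e = up_n = down_n = 0
    for o, w, y in zip(old, new, events):
        if y == 1:
            up_e += w > o
            down_e += w < o
        else:
            up_n += w > o
            down_n += w < o
    return (up_e - down_e) / n_events + (down_n - up_n) / n_nonevents


# Toy data (invented for illustration only): 1 = MACE during follow-up.
events = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical predicted risks from a baseline model (e.g. risk factors only)
baseline = [0.30, 0.12, 0.08, 0.10, 0.20, 0.05, 0.04, 0.15]
# Hypothetical predicted risks after adding a protein panel
extended = [0.35, 0.18, 0.07, 0.08, 0.22, 0.03, 0.04, 0.10]

print(f"AUC baseline: {auc(baseline, events):.3f}")
print(f"AUC extended: {auc(extended, events):.3f}")
print(f"Continuous NRI: {continuous_nri(baseline, extended, events):.3f}")
```

A positive NRI means the extended model moved predicted risk in the right direction more often than in the wrong one, which complements the AUC because a small AUC gain can still correspond to meaningful reclassification.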