Accurate prediction of cardiovascular disease (CVD) mortality is essential for treatment decisions and risk management. Current models often do not integrate key biomarkers comprehensively, which limits their predictive power. This study aimed to develop a predictive model for CVD-related mortality using a machine learning-based feature selection algorithm and to compare its performance with that of existing models. We analyzed data from a cohort of 4,882 adults recruited between 1999 and 2004 and followed for up to 20 years. The Boruta feature selection algorithm identified key biomarkers, including NT-proBNP, cardiac troponins, and homocysteine, as significant predictors of CVD mortality. Predictive models were built using these biomarkers alongside demographic and clinical variables. Model performance was evaluated using the concordance index (C-index), sensitivity, specificity, and accuracy, with internal validation by bootstrap sampling. Decision curve analysis (DCA) was also performed to assess clinical benefit. The combined model, incorporating both biomarkers and demographic variables, showed the best predictive performance, with a C-index of 0.9205 (95% CI: 0.9129–0.9319), compared with models using demographic variables alone (C-index 0.9030; 95% CI: 0.8938–0.9147) or biomarkers alone (C-index 0.8659; 95% CI: 0.8519–0.8826). Cox regression analysis further identified predictors of CVD mortality: an elevated aspartate aminotransferase to alanine aminotransferase ratio (AST/ALT), triglyceride-glucose index (TyG), blood urea nitrogen (BUN), and systolic blood pressure were associated with higher risk, whereas higher chloride and iron levels were associated with lower risk. Nomogram construction and DCA confirmed that the combined model provided substantial net benefit across various time points. Integrating cardiac biomarkers, lipid profiles, and inflammatory markers significantly improves the accuracy of predictive models for CVD-related mortality.
This novel approach offers enhanced prognostication, with potential for further optimization through the inclusion of additional clinical and lifestyle data.
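The two headline evaluation measures above, the C-index (discrimination) and DCA net benefit (clinical utility), can be illustrated with a minimal sketch. It assumes Harrell's definition of the C-index for right-censored data, skipping pairs with tied follow-up times (one of several conventions), and the binary-outcome form of net benefit. Time-specific survival DCA, as used for the nomogram, typically replaces the simple true/false positive counts with Kaplan–Meier-based estimates; that refinement is omitted here. The function names and toy data are hypothetical and are not taken from the study.

```python
from typing import Sequence


def harrell_c_index(time: Sequence[float], event: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    A pair (i, j) is usable when subject i had the event (event[i] == 1) and
    was observed strictly earlier than subject j (time[i] < time[j]). The pair
    is concordant when the model assigns the higher risk score to subject i.
    Tied risk scores count as half-concordant; tied times are skipped.
    """
    concordant = 0.0
    usable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored subject cannot anchor a usable pair
        for j in range(n):
            if time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


def net_benefit(y_true: Sequence[int], p_hat: Sequence[float],
                threshold: float) -> float:
    """Net benefit of treating everyone whose predicted risk >= threshold.

    Binary-outcome form: NB = TP/n - (FP/n) * pt / (1 - pt), where pt is the
    threshold probability at which treatment is considered worthwhile.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, p_hat) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, p_hat) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Hypothetical toy cohort: follow-up times, event indicators, risk scores.
    c = harrell_c_index([2, 5, 3, 8, 6], [1, 0, 1, 1, 0],
                        [0.9, 0.2, 0.7, 0.8, 0.3])
    print(c)  # 6 of 7 usable pairs are concordant, so C = 6/7 ≈ 0.857

    # Hypothetical predicted probabilities evaluated at a 25% threshold.
    nb = net_benefit([1, 0, 1, 0, 0], [0.8, 0.6, 0.4, 0.1, 0.3], 0.25)
    print(nb)  # 2/5 - (2/5)(0.25/0.75) = 4/15 ≈ 0.267
```

In a decision curve, net_benefit would be evaluated over a grid of thresholds and plotted against the "treat all" and "treat none" (net benefit of zero) strategies; a model adds clinical value where its curve lies above both.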