Abstract

Objective: To compare Cox models, machine learning (ML), and ensemble models combining both approaches, for prediction of stroke risk in a prospective study of Chinese adults.

Materials and Methods: We evaluated models for stroke risk at varying intervals of follow-up (<9 years, 0–3 years, 3–6 years, 6–9 years) in 503 842 adults without prior history of stroke recruited from 10 areas in China in 2004–2008. Inputs included sociodemographic factors, diet, medical history, physical activity, and physical measurements. We compared discrimination and calibration of Cox regression, logistic regression, support vector machines, random survival forests, gradient boosted trees (GBT), and multilayer perceptrons, benchmarking performance against the 2017 Framingham Stroke Risk Profile. We then developed an ensemble approach to identify individuals at high risk of stroke (>10% predicted 9-yr stroke risk) by selectively applying either a GBT or Cox model based on individual-level characteristics.

Results: For 9-yr stroke risk prediction, GBT provided the best discrimination (AUROC: 0.833 in men, 0.836 in women) and calibration, with consistent results in each interval of follow-up. The ensemble approach yielded incrementally higher accuracy (men: 76%, women: 80%), specificity (men: 76%, women: 81%), and positive predictive value (men: 26%, women: 24%) compared to any of the single-model approaches.

Discussion and Conclusion: Among several approaches, an ensemble model combining both GBT and Cox models achieved the best performance for identifying individuals at high risk of stroke in a contemporary study of Chinese adults. The results highlight the potential value of expanding the use of ML in clinical practice.
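The abstract reports accuracy, specificity, and positive predictive value for flagging individuals whose predicted 9-yr stroke risk exceeds 10%. The following is a minimal sketch of how such metrics can be computed from predicted risks and observed outcomes. It is not the authors' evaluation code: the function name `screening_metrics`, the toy data, and the treatment of the outcome as a simple binary indicator (ignoring censoring during follow-up) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScreeningMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    ppv: float


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a denominator is empty.
    return numerator / denominator if denominator else float("nan")


def screening_metrics(predicted_risk, had_stroke, threshold=0.10):
    """Classify each person as high risk if predicted risk > threshold,
    then compare against the observed binary outcome.

    Assumption: outcomes are fully observed (no censoring), which is a
    simplification of a real survival analysis.
    """
    tp = fp = tn = fn = 0
    for risk, event in zip(predicted_risk, had_stroke):
        flagged = risk > threshold
        if flagged and event:
            tp += 1
        elif flagged:
            fp += 1
        elif event:
            fn += 1
        else:
            tn += 1
    return ScreeningMetrics(
        accuracy=_ratio(tp + tn, tp + fp + tn + fn),
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
    )


# Hypothetical toy data, not taken from the study.
toy_risk = [0.02, 0.15, 0.30, 0.08, 0.12, 0.05, 0.40, 0.01, 0.03]
toy_event = [0, 0, 1, 1, 1, 0, 1, 0, 0]
toy_metrics = screening_metrics(toy_risk, toy_event)
print(toy_metrics)
```

Positive predictive value depends on how common the outcome is, so a model with fairly high specificity can still have a modest PPV when few people in the population go on to have a stroke.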

Highlights

  • Stroke is a leading cause of death and disability worldwide, with about three-quarters of all stroke cases occurring in low- and middle-income countries (LMICs).[1]

  • We developed an ensemble approach to identify individuals at high risk of stroke (>10% predicted 9-yr stroke risk) by selectively applying either a gradient boosted trees (GBT) or Cox model based on individual-level characteristics.

  • Consistent with findings for cardiovascular risk prediction,[20] we demonstrated that machine learning (ML) techniques improved 9-yr risk prediction of stroke over Cox models, with GBT providing the best discrimination and calibration performance.
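The idea of selectively applying one of two models per individual can be sketched as a small dispatcher. This excerpt does not state which individual-level characteristics drive the choice, so the selection rule below (an age cut-off), the constant toy models, and all names (`make_selective_ensemble`, `use_gbt`, `toy_gbt`, `toy_cox`) are hypothetical placeholders rather than the authors' method.

```python
from typing import Callable, Mapping

Person = Mapping[str, float]
RiskModel = Callable[[Person], float]


def make_selective_ensemble(
    gbt_model: RiskModel,
    cox_model: RiskModel,
    use_gbt: Callable[[Person], bool],
) -> RiskModel:
    """Build a predictor that routes each person to exactly one model."""

    def predict(person: Person) -> float:
        model = gbt_model if use_gbt(person) else cox_model
        return model(person)

    return predict


# Toy stand-ins for fitted models; real models would use many inputs.
def toy_gbt(person: Person) -> float:
    return 0.15


def toy_cox(person: Person) -> float:
    return 0.05


# HYPOTHETICAL selection rule, chosen only to make the sketch runnable.
def use_gbt(person: Person) -> bool:
    return person["age"] < 60


ensemble = make_selective_ensemble(toy_gbt, toy_cox, use_gbt)

HIGH_RISK_THRESHOLD = 0.10  # >10% predicted 9-yr risk, as in the text


def is_high_risk(person: Person) -> bool:
    return ensemble(person) > HIGH_RISK_THRESHOLD


print(is_high_risk({"age": 50.0}), is_high_risk({"age": 70.0}))
```

In practice the selection rule would itself be learned or validated on held-out data; the sketch only shows the routing structure.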


Summary

Introduction

Stroke is a leading cause of death and disability worldwide, with about three-quarters of all stroke cases occurring in low- and middle-income countries (LMICs).[1] Risk scores in use include the Pooled Cohort Equations[8] and QRISK[9–11] for CVD, as well as the Framingham Stroke Risk Profile[12,13] for stroke. Such risk scores are typically derived using Cox proportional hazards models and have been validated mainly in high-income countries (HICs).[14,15,16] The clinical utility of such models for risk prediction of stroke in contemporary populations of LMICs such as China is uncertain, and novel risk scores should be developed for use in such populations.[17,18,19]
