Abstract
Background: This population-based study investigated the potential of machine learning algorithms to predict stroke incidence and identify important risk factors, and aimed to evaluate the accuracy of these algorithms in constructing a stroke prediction model.
Methods: Participants from the Suita study were included, and baseline measurements were used to predict stroke outcomes over a 15-year follow-up period. In total, 7,389 participants and 51 variables were investigated, including demographics, medical history, medical imaging, laboratory data, and lifestyle habits. First, unsupervised K-prototype clustering was used to group participants based on their stroke risk. Next, five supervised models (logistic regression, random forest, support vector machine, extreme gradient boosting, and light gradient boosting machine) were applied to predict stroke outcomes. The Shapley Additive Explanations (SHAP) method was used to identify the most important variables.
Results: Unsupervised clustering identified three risk clusters with significantly different stroke incidence (9.1%, 6.6%, and 3.2%), which were categorized as high-, medium-, and low-risk groups, respectively. Among the supervised models, the random forest algorithm demonstrated the best performance. The ten most important variables for predicting stroke incidence were identified using SHAP, with age being the most influential. Other significant risk markers included systolic blood pressure, hypertension, estimated glomerular filtration rate, metabolic syndrome, and blood sugar level. Additionally, elbow joint thickness and fructosamine, hemoglobin, and calcium levels were found to be potential predictors of stroke risk. Notably, the variables identified by SHAP were consistent with those that characterized the high-risk group in the unsupervised clustering.
Conclusion: Machine learning algorithms provide accurate predictions of stroke incidence and offer valuable insights into subclinical markers without the need for prior assumptions of causality. This study presents a data-driven machine-learning framework for stroke risk prediction and biomarker identification.
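To illustrate the idea behind the SHAP ranking described above, the following minimal sketch computes exact Shapley values for a single participant against a single reference participant. It is not the study's implementation: the function `shapley_values`, the toy model `toy_risk`, its coefficients, and the example values are all hypothetical and chosen only for illustration. Practical SHAP tooling typically averages over a background dataset and uses faster approximations, whereas this sketch enumerates every feature subset.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance relative to one baseline.

    `predict` takes a dict of feature values and returns a number.
    The cost grows as 2^n, so this is only practical for a few features.
    """
    names = list(instance)
    n = len(names)

    def value(subset):
        # Features in `subset` take the instance's values;
        # all other features stay at their baseline values.
        mixed = {k: (instance[k] if k in subset else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                # Marginal contribution of `name` when added to the coalition.
                total += weight * (value(coalition | {name}) - value(coalition))
        phi[name] = total
    return phi


# Hypothetical toy risk score; the coefficients are invented for illustration.
def toy_risk(x):
    score = 0.04 * x["age"] + 0.02 * x["sbp"]
    if x["hypertension"] and x["age"] > 65:
        score += 0.5  # interaction between age and hypertension
    return score


baseline = {"age": 50, "sbp": 120, "hypertension": 0}     # reference participant
participant = {"age": 72, "sbp": 150, "hypertension": 1}  # participant to explain

phi = shapley_values(toy_risk, participant, baseline)
for feature, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{feature}: {contribution:+.3f}")
```

The contributions sum to the difference between the participant's score and the baseline score, and the interaction term is split equally between the two features involved. Ranking features by the mean absolute contribution across many participants is the usual way such values are turned into a global importance list.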