Stroke Prediction with Machine Learning Methods among Older Chinese.

Yafei Wu, Ya Fang

doi:10.3390/ijerph17061828

Open Access

https://doi.org/10.3390/ijerph17061828

Abstract

Timely stroke diagnosis and intervention are necessary considering its high prevalence. Previous studies have mainly focused on stroke prediction with balanced data. Thus, this study aimed to develop machine learning models for predicting stroke with imbalanced data in an elderly population in China. Data were obtained from a prospective cohort that included 1131 participants (56 stroke patients and 1075 non-stroke participants) in 2012 and 2014, respectively. Data balancing techniques including random over-sampling (ROS), random under-sampling (RUS), and synthetic minority over-sampling technique (SMOTE) were used to process the imbalanced data in this study. Machine learning methods such as regularized logistic regression (RLR), support vector machine (SVM), and random forest (RF) were used to predict stroke with demographic, lifestyle, and clinical variables. Accuracy, sensitivity, specificity, and areas under the receiver operating characteristic curves (AUCs) were used for performance comparison. The top five variables for stroke prediction were selected for each machine learning method based on the SMOTE-balanced data set. The total prevalence of stroke was high in 2014 (4.95%), with men experiencing much higher prevalence than women (6.76% vs. 3.25%). The three machine learning methods performed poorly in the imbalanced data set with extremely low sensitivity (approximately 0.00) and AUC (approximately 0.50). After using data balancing techniques, the sensitivity and AUC considerably improved with moderate accuracy and specificity, and the maximum values for sensitivity and AUC reached 0.78 (95% CI, 0.73–0.83) for RF and 0.72 (95% CI, 0.71–0.73) for RLR. Using AUCs for RLR, SVM, and RF in the imbalanced data set as references, a significant improvement was observed in the AUCs of all three machine learning methods (p < 0.05) in the balanced data sets. 
Considering RLR in each data set as a reference, only RF in the imbalanced data set and SVM in the ROS-balanced data set were superior to RLR in terms of AUC. Sex, hypertension, and uric acid were common predictors in all three machine learning methods. Blood glucose level was included in both RLR and RF. In addition, drinking was included in RLR, age and high-sensitivity C-reactive protein level in SVM, and low-density lipoprotein cholesterol level in RF. Our study suggests that machine learning methods with data balancing techniques are effective tools for stroke prediction with imbalanced data.
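The three data balancing techniques named in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation, which the abstract does not describe. The feature matrix is synthetic, the function names are hypothetical, and `smote_like` is a simplified version of SMOTE that interpolates between a minority sample and one of its `k` nearest minority neighbours. Only the class sizes (56 stroke, 1075 non-stroke) come from the abstract.

```python
import numpy as np

def random_over_sample(X, y, rng):
    """ROS: duplicate minority rows (with replacement) until classes match."""
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.concatenate([majority, minority, extra])
    return X[idx], y[idx]

def random_under_sample(X, y, rng):
    """RUS: keep a random subset of majority rows the size of the minority."""
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    kept = rng.choice(majority, size=len(minority), replace=False)
    idx = np.concatenate([kept, minority])
    return X[idx], y[idx]

def smote_like(X, y, rng, k=5):
    """Simplified SMOTE: new minority points on segments between a minority
    sample and one of its k nearest minority neighbours."""
    X_min = X[y == 1]
    n_new = int((y == 0).sum() - (y == 1).sum())
    # Pairwise Euclidean distances within the minority class.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    synthetic = X_min[base] + gap * (X_min[chosen] - X_min[base])
    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])
    return X_out, y_out

# Synthetic stand-in data with the class sizes reported in the abstract.
rng = np.random.default_rng(0)
y = np.array([1] * 56 + [0] * 1075)
X = rng.normal(size=(len(y), 4)) + y[:, None]  # assumed features, not study data

X_ros, y_ros = random_over_sample(X, y, rng)
X_rus, y_rus = random_under_sample(X, y, rng)
X_smote, y_smote = smote_like(X, y, rng)

print(np.bincount(y_ros))    # [1075 1075]
print(np.bincount(y_rus))    # [56 56]
print(np.bincount(y_smote))  # [1075 1075]
```

In practice, balancing is usually applied to the training split only, so that the held-out data keep the original class distribution.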

Highlights

Stroke, accounting for 10% [1] of total deaths and 5% [2] of all disability-adjusted life-years worldwide, has posed a serious threat to population health, especially in developing countries with a low or moderate income [3]
Using areas under the receiver operating characteristic curves (AUCs) for regularized logistic regression (RLR), support vector machine (SVM), and random forest (RF) in the imbalanced data set as references, a significant improvement was observed in the AUCs of all three machine learning methods (p < 0.05) in the balanced data sets
Considering RLR in each data set as a reference, only RF in the imbalanced data set and SVM in the random over-sampling (ROS)-balanced data set were superior to RLR in terms of AUC
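The performance measures used in these comparisons (accuracy, sensitivity, specificity, and AUC) can be computed directly from labels and scores. The sketch below is illustrative only: the function names are hypothetical, and the "model" is an assumed degenerate classifier that predicts non-stroke for everyone. With the class sizes from the abstract (56 stroke, 1075 non-stroke), it shows how high accuracy can coexist with a sensitivity of 0 and an AUC of 0.5, the pattern the abstract reports for the imbalanced data set.

```python
import numpy as np

def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def auc_score(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

# Class sizes from the abstract; predictions are an assumed worst case.
y_true = np.array([1] * 56 + [0] * 1075)
y_pred = np.zeros_like(y_true)      # always predicts "non-stroke"
scores = np.zeros(len(y_true))      # constant score for every participant

metrics = confusion_metrics(y_true, y_pred)
auc = auc_score(y_true, scores)
print({name: round(value, 4) for name, value in metrics.items()}, auc)
```

Accuracy here is simply the share of non-stroke participants (1075/1131), which is why sensitivity and AUC are the more informative measures on imbalanced data.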

Summary

Introduction

Stroke, accounting for 10% [1] of total deaths and 5% [2] of all disability-adjusted life-years worldwide, has posed a serious threat to population health, especially in developing countries with a low or moderate income [3]. With the acceleration of population aging, China has faced the biggest ...

Int. J. Environ. Res. Public Health 2020, 17, 1828; doi:10.3390/ijerph17061828; www.mdpi.com/journal/ijerph

Journal: International Journal of Environmental Research and Public Health
Publication Date: Mar 1, 2020
Citations: 79
License type: CC BY 4.0
