Abstract

Stroke is one of the leading causes of death worldwide. The main causes of stroke mortality are the failure to take preventive measures early and a poor understanding of the disease. As a result, stroke mortality remains high all over the world, especially in developing countries such as Bangladesh, and steps must be taken to identify strokes as early as possible. Machine learning offers one possible solution. This study aims to find the machine learning algorithms best suited to predicting stroke early and accurately, and to identify the main risk factors for stroke. A real dataset was collected from the Kaggle website and split into training and test data, and seven machine learning algorithms were applied to the training data: Random Forest, Decision Tree, K-Nearest Neighbor, Adaptive Boosting, Gradient Boosting, Logistic Regression, and Support Vector Machine. Performance was evaluated with six metrics: accuracy, precision, recall, F1-score, ROC curve, and precision-recall curve. The performance of the algorithms was then compared to determine which was most appropriate for stroke prediction. Random Forest proved the most effective, scoring 0.99 on accuracy, precision, recall, and F1-score, with an AUC of 0.9925 for the ROC curve and an AUC of 0.9874 for the precision-recall curve. Finally, feature importance scores for each algorithm were calculated and ranked in descending order to identify the top risk factors for stroke, such as ‘age’, ‘average glucose level’, ‘body mass index’, ‘hypertension’, and ‘smoking status’. The developed model can be used in health institutions to predict stroke with high accuracy.
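
As a minimal illustration of how four of the reported metrics relate to one another, the sketch below computes accuracy, precision, recall, and F1-score from binary predictions. It is not the study's code: the function name `classification_metrics` and the toy labels are hypothetical, invented only for this example, and the values it produces have no connection to the results reported above.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with `positive` treated as the stroke class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Precision: share of predicted strokes that were real strokes.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of real strokes that the model caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, invented for illustration only (1 = stroke, 0 = no stroke).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On a dataset where stroke cases are rare, accuracy alone can look high even when most strokes are missed, which is one reason to report precision, recall, and the precision-recall curve alongside it.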
