Stroke Prediction Model Based on XGBoost Algorithm

Wenwen He, Pengcheng Du, Hongli Le

doi:10.37394/232029.2022.1.2

Abstract

In this paper, randomly measured individual sample data are preprocessed: outlier values are deleted and the sample features are normalized to the range 0 to 1. Correlation analysis is then used to determine and rank the relevance of the stroke features, and features with poor correlation are discarded. The samples are randomly split into a 70% training set and a 30% testing set. Finally, a random forest model and the XGBoost algorithm, combined with cross-validation and grid search, are used to learn the stroke features. The XGBoost algorithm reaches an accuracy of 0.9257 on the testing set, which is better than the 0.8991 of the random forest model. The XGBoost model is therefore selected to predict stroke for ten people, and it predicts that two of them have a stroke and eight do not.
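
The preprocessing steps named in the abstract (normalization to between 0 and 1, correlation-based ranking of features, and a random 70%/30% split) can be illustrated with a minimal sketch. It uses only the Python standard library and is not the authors' code: the helper names are hypothetical, and the use of the Pearson coefficient is an assumption, since the abstract does not say which correlation measure was applied. Model fitting with random forest or XGBoost is left out.

```python
import math
import random


def min_max_normalize(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def pearson(xs, ys):
    """Pearson correlation coefficient (assumed measure, see lead-in)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, target):
    """Return (name, |r|) pairs sorted from most to least correlated."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into training and testing parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]
```

Features whose absolute correlation with the stroke label falls below a chosen threshold would then be dropped before training; the threshold itself is not given in the abstract.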

Full Text