Abstract
Early detection of stroke, a severe disease, is a key step toward effective treatment. Stroke data are imbalanced: they typically contain a large majority of negative cases (no stroke) and a small minority of positive cases (stroke). Previous work has used SMOTE to handle this imbalance, but most researchers applied it to the entire dataset before splitting it, so information about the test set (the "answer") was silently passed to the model during training, causing data leakage. Moreover, previous work used accuracy as the only metric, which makes the results less reliable on imbalanced data. We propose a method that applies SMOTE to the training set only and uses 13 machine learning classifiers to predict stroke. We combine AUC with accuracy as evaluation metrics for the stroke prediction task, which raises the confidence level of the assessment. The experiments show that misused SMOTE and standardization can cause data leakage and that the combined metrics evaluate models with higher trustworthiness. We conclude that our method avoids data leakage and assesses models with higher trustworthiness.
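The core procedural point is ordering. The data are split first, and then both the standardization statistics and the SMOTE oversampling are fitted on the training split only, so the test split never influences training. The pure-Python sketch below illustrates that order under several assumptions that are not taken from the paper. The hand-written `smote` function is a simplified version of the SMOTE interpolation idea, not the imbalanced-learn implementation. `leak_free_evaluation` and the nearest-centroid scorer are hypothetical stand-ins for the paper's 13 classifiers. The demo data are randomly generated, not the stroke dataset. Accuracy and AUC are reported together on the untouched test split.

```python
import math
import random


def stratified_split(y, test_frac, rng):
    """Return (train_idx, test_idx); each class contributes at least one test sample."""
    train, test = [], []
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_frac))
        test += idx[:n_test]
        train += idx[n_test:]
    return train, test


def fit_scaler(rows):
    """Per-feature mean and standard deviation, estimated only from the rows passed in."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def transform(rows, means, stds):
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def smote(minority, n_new, rng, k=5):
    """Simplified SMOTE (assumption): interpolate between a random minority point
    and one of its k nearest minority neighbours (Euclidean distance)."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        nb = rng.choice(sorted(others, key=lambda p: math.dist(x, p))[:k])
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """ROC AUC: probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid(rows):
    return [sum(c) / len(c) for c in zip(*rows)]


def leak_free_evaluation(X, y, test_frac=0.3, seed=0):
    rng = random.Random(seed)
    tr, te = stratified_split(y, test_frac, rng)
    X_tr, y_tr = [X[i] for i in tr], [y[i] for i in tr]
    X_te, y_te = [X[i] for i in te], [y[i] for i in te]

    # 1) Fit the scaler on the training split only, then apply it to both splits.
    means, stds = fit_scaler(X_tr)
    X_tr, X_te = transform(X_tr, means, stds), transform(X_te, means, stds)

    # 2) Oversample the minority (stroke) class inside the training split only.
    minority = [x for x, t in zip(X_tr, y_tr) if t == 1]
    n_new = y_tr.count(0) - len(minority)
    X_tr += smote(minority, n_new, rng)
    y_tr += [1] * n_new

    # 3) Hypothetical stand-in classifier: nearest centroid, score = d(neg) - d(pos).
    c_pos = centroid([x for x, t in zip(X_tr, y_tr) if t == 1])
    c_neg = centroid([x for x, t in zip(X_tr, y_tr) if t == 0])
    scores = [math.dist(x, c_neg) - math.dist(x, c_pos) for x in X_te]
    preds = [1 if s > 0 else 0 for s in scores]

    # 4) Report accuracy and AUC together on the untouched test split.
    return {"accuracy": accuracy(y_te, preds), "auc": auc(y_te, scores),
            "train_pos": y_tr.count(1), "train_neg": y_tr.count(0)}


# Demo on made-up imbalanced data: 190 negatives, 10 positives.
data_rng = random.Random(42)
X = ([[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(190)]
     + [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(10)])
y = [0] * 190 + [1] * 10
print(leak_free_evaluation(X, y))
```

The leaky variant would call `fit_scaler` and `smote` on all of `X` before `stratified_split`. Synthetic stroke cases interpolated from test patients would then sit in the training set, and the scaler would encode test-set statistics. In practice, imbalanced-learn's `Pipeline` combined with its `SMOTE` sampler gives the same guarantee, because samplers in that pipeline run only during `fit`.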