Predicting Mortality in Patients with Stroke Using Data Mining Techniques

Zahra Hadianfard, Toomas Timpka, Hadi Lotfnezhad Afshar, Surena Nazarbaghi, Bahlol Rahimi

doi:10.18267/j.aip.163

Open Access


Abstract

Mortality due to stroke is increasing, and accurate prediction of stroke-caused death is very important for healthcare. Data mining methods offer novel ways to predict these mortality risks. The aim of this study is to employ popular data mining algorithms to predict the survival of stroke patients and to extract decision rules. Data on stroke patients (n=4149) were collected from paper medical records. Missing data were handled using multiple imputation, and the target variable was balanced using methods such as over-sampling, under-sampling and the Synthetic Minority Over-sampling Technique (SMOTE). Support vector machine (SVM), decision tree and logistic regression (LR) algorithms were employed to predict the survival of stroke patients, and the Repeated Incremental Pruning to Produce Error Reduction (RIPPER) algorithm was used to extract decision rules from the main dataset. LR outperformed the other algorithms in accuracy (76.96%), sensitivity (79.06%) and kappa (33.34); however, its specificity (65.35%) and AUC (0.77) were lower than those of the other algorithms. An independent dataset of 234 records was then selected to test the LR model, which had performed best on the main dataset. On this external validation dataset, the model's accuracy (79.91%), sensitivity (83.94%), kappa (39.26) and AUC (0.8) improved, but its specificity (60.98%) did not. The constructed model predicted the survival of stroke patients with high scores, and useful rules were extracted for clinical use.
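The abstract reports accuracy, sensitivity, specificity, Cohen's kappa and AUC. As a minimal sketch (not the authors' code), these standard binary-classification metrics can be computed from a confusion matrix and from predicted scores as shown below. The names `ConfusionMatrix`, `binary_metrics` and `auc_mann_whitney` are hypothetical, the counts and scores in the usage example are made up, and which outcome (survival or death) counts as the positive class is an assumption, since the abstract does not state it. Kappa is returned on its usual -1 to 1 scale; the abstract's kappa values (e.g. 33.34) appear to be expressed on a 0 to 100 scale.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ConfusionMatrix:
    """Counts for a binary outcome; which class is 'positive' is an assumption."""
    tp: int  # actual positives predicted positive
    fn: int  # actual positives predicted negative
    fp: int  # actual negatives predicted positive
    tn: int  # actual negatives predicted negative


def binary_metrics(cm: ConfusionMatrix) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and Cohen's kappa from a confusion matrix."""
    n = cm.tp + cm.fn + cm.fp + cm.tn
    accuracy = (cm.tp + cm.tn) / n
    sensitivity = cm.tp / (cm.tp + cm.fn)  # true positive rate (recall)
    specificity = cm.tn / (cm.tn + cm.fp)  # true negative rate
    # Chance agreement: product of actual and predicted marginals for each class.
    p_expected = (
        (cm.tp + cm.fn) * (cm.tp + cm.fp) + (cm.fp + cm.tn) * (cm.fn + cm.tn)
    ) / (n * n)
    # Cohen's kappa: observed agreement (accuracy) corrected for chance agreement.
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


def auc_mann_whitney(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive scores above a random negative.

    Ties count as half a win; O(len(pos) * len(neg)), fine for illustration only.
    """
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Made-up counts, not the study's data.
    metrics = binary_metrics(ConfusionMatrix(tp=80, fn=20, fp=10, tn=90))
    print({name: round(value, 4) for name, value in metrics.items()})
    print(round(auc_mann_whitney([0.9, 0.8, 0.4], [0.7, 0.3]), 4))
```

With these illustrative counts, accuracy is 0.85, sensitivity 0.80, specificity 0.90 and kappa 0.70. Kappa is lower than accuracy because it discounts the agreement expected by chance, which is why a model can show a fairly high accuracy alongside a modest kappa, as in the figures reported above.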

Highlights

While stroke was the fifth most common cause of death worldwide in 1990, it was ranked third in 2017
Considering the rapid increase of stroke incidence in low- and middle-income countries, this study aims to employ the data mining techniques of support vector machine (SVM), decision trees, logistic regression (LR) and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) to predict the survival of Iranian stroke patients and to extract corresponding decision rules
The variables used for prediction were age, sex, smoking history, diabetes mellitus (DM) history, hypertension history, congestive heart disease history, coronary artery problem history, atrial fibrillation (AF) history, cerebrovascular accident (CVA) history, hospital-acquired complications (HACs), cholesterol history, and triglycerides history

Summary

Introduction

While stroke was the fifth most common cause of death worldwide in 1990, it was ranked third in 2017. The incidence of stroke in Iran has been estimated at up to 140 cases per 100,000 people (Azarpazhooh et al, 2010; Hosseini et al, 2010). Almost 60% of all new cases of ischemic stroke occur in people under 70 years of age (Lindsay et al, 2019). In Iran, stroke occurs approximately one decade earlier than in other countries (Ghandehari, 2016). Survival analyses support clinical prognosis by using historical data to estimate the mortality risk among patients with a specific illness. Accurate predictions of stroke-caused death risk can help clinicians and hospital administrators take the necessary hospital management measures (Smith et al, 2013).

Objectives

Methods

Results

Discussion

Conclusion