Abstract

Cardiovascular disease is a substantial cause of mortality and morbidity worldwide. In clinical data analytics, predicting the survival of heart disease patients is a major challenge. Data mining transforms the large volumes of raw data generated by the health industry into useful information that can support informed decisions. Various studies have shown that significant features play a key role in improving the performance of machine learning models. This study analyzes heart failure survival using a dataset of 299 patients admitted to hospital. The aim is to find significant features and effective data mining techniques that can boost the accuracy of survival prediction for cardiovascular patients. To predict patient survival, this study employs nine classification models: Decision Tree (DT), Adaptive Boosting classifier (AdaBoost), Logistic Regression (LR), Stochastic Gradient Descent classifier (SGD), Random Forest (RF), Gradient Boosting classifier (GBM), Extra Tree Classifier (ETC), Gaussian Naive Bayes classifier (G-NB) and Support Vector Machine (SVM). The class imbalance problem is handled with the Synthetic Minority Oversampling Technique (SMOTE). Furthermore, the machine learning models are trained on the highest-ranked features selected by RF, and the results are compared with those obtained by the same algorithms using the full set of features. Experimental results demonstrate that ETC outperforms the other models, achieving an accuracy of 0.9262 with SMOTE in predicting heart patients’ survival.
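To illustrate the oversampling step mentioned above, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority-class neighbours. It is not the study's implementation. The function name `smote`, its parameters, and the toy data are assumptions made for illustration, and the sketch uses only the Python standard library.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority samples.

    Illustrative sketch only: `minority` is a list of numeric feature
    vectors belonging to the under-represented class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours (Euclidean distance).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Interpolate: the new point lies on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy minority class with two features (not data from the study).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region already occupied by the minority class. In practice, oversampling of this kind is normally applied to the training split only, so that no synthetic points leak into the evaluation data.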

Highlights

  • According to the WHO, heart diseases are a leading cause of death worldwide [1]

  • One group of authors used a bagging C4.5 ensemble learning approach for cardiovascular disease (CVD) prediction. They achieved 68.96% accuracy for the diagnosis of stenosis in the Right Coronary Artery (RCA), 61.46% accuracy in the Left Circumflex (LCX), and 79.54% accuracy in the Left Anterior Descending (LAD) artery. Another group of researchers improved on these results by applying a Support Vector Machine (SVM) model, achieving 80.50% accuracy for RCA, 86.14% accuracy for LAD and 83.17% accuracy for LCX [32]

  • Results showed that tree-based algorithms outperformed the other models when trained on the nine features identified by Random Forest (RF) and combined with the Synthetic Minority Oversampling Technique (SMOTE)


Summary

INTRODUCTION

According to the WHO, heart diseases are a leading cause of death worldwide [1]. Cardiovascular disease (CVD) is difficult to identify because many factors contribute to it, such as high blood pressure, cholesterol level, diabetes, and abnormal pulse rate [2]. Different classification algorithms are used to predict CVD in patients and to predict death due to heart attack [14]. Even though the aforementioned researchers showed interesting results by applying standard statistical techniques, such methods are inefficient for large-scale datasets, leaving room for other machine learning algorithms. This motivated our attempt to help healthcare professionals by developing machine learning techniques for predicting the survival of CVD patients. The performance of tree-based, regression-based, and statistical-based models, each used with SMOTE, is compared in predicting the survival of heart patients.

RELATED WORK
ANALYSIS AND DISCUSSION OF RESULTS
EXPERIMENTAL DESIGN
Findings
CONCLUSION