Abstract

Predicting survival after heart failure is a complex and challenging problem. Modern data science and machine learning techniques can estimate such probabilities with high accuracy and reliability by learning patterns in patient data. This study uses supervised machine learning classification algorithms to predict death following heart failure. Both strong learners and ensembles of weak learners are analyzed. Class imbalance is handled using the class weights method. Logistic Regression, Support Vector Machine, Decision Tree, Extra Trees, Random Forest, XGBoost and CatBoost classifiers are optimized and used to infer the probability of death due to heart failure. Features are selected using statistical tests (Chi-square, ANOVA and phi-k correlation) together with Random Forest-based feature selection. A vote bank method that combines the results of all feature selection techniques is used to select the most medically relevant features that classify with the highest accuracy. Among the classification models evaluated, Random Forest, XGBoost and CatBoost achieved the highest accuracy. This study reports a classification accuracy of 96.67%, an improvement of 8% over previously published work.
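
The class weighting and vote bank steps described above can be illustrated with a minimal sketch. It is not the study's implementation. It assumes the common "balanced" weighting heuristic, n_samples / (n_classes * class_count), and a simple vote threshold. The function names, labels, feature names and the `min_votes` parameter are all hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumption: the 'balanced' heuristic; the abstract does not say
    which weighting formula the study used.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def vote_bank(selections, min_votes):
    """Keep features chosen by at least `min_votes` selection methods.

    `selections` maps a method name to the features that method picked.
    The threshold rule is an assumption made for illustration.
    """
    votes = Counter(f for chosen in selections.values() for f in set(chosen))
    return sorted(f for f, v in votes.items() if v >= min_votes)


# Hypothetical imbalanced labels: 1 = death, 0 = survival.
labels = [0, 0, 0, 1]
weights = balanced_class_weights(labels)  # the minority class gets the larger weight

# Hypothetical feature picks from each selection technique.
selections = {
    "chi_square": ["a", "b", "c"],
    "anova": ["a", "d"],
    "phi_k": ["a", "c"],
    "random_forest": ["a", "d", "e"],
}
selected = vote_bank(selections, min_votes=2)
```

In this sketch a feature survives only when at least two techniques agree on it, which reduces the influence of any single selection method.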
