Background: Cardiovascular disease (CVD) is widespread and has become a leading contributor to global mortality. According to the World Heart Federation, the death toll due to CVD increased from 12.1 million in 1990 to around 19 million in 2019. Myocardial Infarction (MI) is a condition in which heart muscle dies because the flow of oxygenated blood is reduced or blocked. It has affected approximately 3 million people worldwide, with more than 1 million deaths in the United States annually. This rise in the global death toll due to CVD could be reduced to a great extent by predicting the risk of CVD at an early stage.

Method: In this paper, several feature selection techniques, including Variance-based, Mutual Information (MI), Maximum Relevance Minimum Redundancy (MRMR), Boruta, and Recursive Feature Elimination (RFE) algorithms, are used for feature optimization. For class prediction, the Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), and AdaBoost algorithms were implemented in their ordinary, One-vs-Rest (OVR), and One-vs-One (OVO) forms.

Result: Feature selection significantly improved the performance of the AdaBoost model: accuracy rose from 74% without any feature selection (5.3 seconds) to 85% with Boruta feature selection (2.17 seconds training time) and 88% with MRMR feature selection (1.6 seconds training time). Similarly, the DT-OVO model’s performance improved from 84% without any feature selection (1.48 seconds training time) to 86% with Boruta feature selection (0.58 seconds training time). For the other models, performance was maintained with reduced training times.

Conclusion: This paper prioritizes feature selection in developing machine learning models for CVD prediction. This emphasis is justified by the significant reduction in training times across the 72 models generated, with predictive performance maintained or even improved.
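The two quantitative claims in this abstract, the count of 72 models and the reduction in training time, can be checked with a little arithmetic. The sketch below is illustrative only. It assumes that the 72 models come from crossing six feature selection settings (the five named techniques plus a no-selection baseline) with four classifiers and three multiclass strategies. The abstract does not state this breakdown, and all variable and function names are hypothetical.

```python
from itertools import product

# Settings named in the abstract. Counting "no feature selection" as a sixth
# setting is an ASSUMPTION made so the grid matches the reported 72 models.
FEATURE_SELECTORS = ["None", "Variance", "Mutual Information", "MRMR", "Boruta", "RFE"]
CLASSIFIERS = ["LR", "SVM", "DT", "AdaBoost"]
STRATEGIES = ["ordinary", "OVR", "OVO"]


def model_grid():
    """Enumerate every (selector, classifier, strategy) combination."""
    return list(product(FEATURE_SELECTORS, CLASSIFIERS, STRATEGIES))


def time_reduction_pct(baseline_s, reduced_s):
    """Relative reduction in training time, as a percentage of the baseline."""
    return (baseline_s - reduced_s) / baseline_s * 100.0


# (baseline seconds, seconds with feature selection), copied from the Result
# section. The 0.58 figure is ASSUMED to be in seconds; the abstract omits
# its unit.
REPORTED_TIMES = {
    "AdaBoost + Boruta": (5.3, 2.17),
    "AdaBoost + MRMR": (5.3, 1.6),
    "DT-OVO + Boruta": (1.48, 0.58),
}

grid = model_grid()
reductions = {
    name: round(time_reduction_pct(base, reduced), 1)
    for name, (base, reduced) in REPORTED_TIMES.items()
}

print(f"model configurations: {len(grid)}")
for name, pct in reductions.items():
    print(f"{name}: {pct}% less training time")
```

Under these assumptions the grid has 6 × 4 × 3 = 72 entries, and the reported timings correspond to training-time reductions of roughly 59%, 70%, and 61% respectively.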