One of the critical issues in medical data analysis is accurately predicting a patient’s risk of heart disease, which is vital for early intervention and reducing mortality rates. Early detection allows for timely treatment and continuous monitoring by healthcare providers, which is essential but often limited by the inability of medical professionals to provide constant patient supervision. Early detection of cardiac problems and continuous patient monitoring by physicians can help reduce death rates. Doctors cannot constantly have contact with patients, and heart disease detection is not always accurate. By offering a more solid foundation for prediction and decision-making based on data provided by healthcare sectors worldwide, machine learning (ML) could help physicians with the prediction and detection of HD. This study aims to use different feature selection strategies to produce an accurate ML algorithm for early heart disease prediction. We have chosen features using chi-square, ANOVA, and mutual information methods. The three feature groups chosen were SF-1, SF-2, and SF-3. The study employed ten machine learning algorithms to determine the most accurate technique and feature subset fit. The classification algorithms used include support vector machines (SVM), XGBoost, bagging, decision trees (DT), and random forests (RF). We evaluated the proposed heart disease prediction technique using a private dataset, a public dataset, and different cross-validation methods. We used the Synthetic Minority Oversampling Technique (SMOTE) to eliminate inconsistent data and discover the machine learning algorithm that achieves the most accurate heart disease predictions. Healthcare providers might identify early-stage heart disease quickly and cheaply with the proposed method. We have used the most effective ML algorithm to create a mobile app that instantly predicts heart disease based on the input symptoms. The experimental results demonstrated that the XGBoost algorithm performed optimally when applied to the combined datasets and the SF-2 feature subset. It had 97.57% accuracy, 96.61% sensitivity, 90.48% specificity, 95.00% precision, a 92.68% F1 score, and a 98% AUC. We have developed an explainable AI method based on SHAP approaches to understand how the system makes its final predictions.
Read full abstract