Abstract. Heart disease remains a major global health threat, making early identification crucial. Using the Heart Disease Dataset from Kaggle, this study applies a Random Forest model to 13 clinical variables from 1,025 samples. The dataset was split into training and test sets, and Z-scores, SMOTE, and feature selection were used to improve accuracy and address class imbalance. The Random Forest model, which combines multiple decision trees, achieved an accuracy of 98.54% and identified chest pain type, maximum heart rate, and thalassemia as key predictors. Cholesterol level, resting blood pressure, and exercise-induced angina were also considered. Compared with a single decision tree, the Random Forest model reduces overfitting, improves generalization, and increases predictive accuracy. By averaging results from multiple trees, the model offers reliable and stable predictions, highlighting its potential in clinical settings for early detection and personalized treatment. This study aims to help healthcare providers allocate resources, plan preventive measures, and tailor treatment plans to individual patients.
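To make two of the preprocessing steps named above concrete, the following is a minimal sketch of computing Z-scores for one feature and of SMOTE-style oversampling, in which synthetic minority samples are interpolated between a sample and one of its nearest minority neighbours. It is an illustration under stated assumptions, not the study's implementation: the abstract does not say how these methods were applied, the function names `z_scores` and `smote_like` are hypothetical, and the toy values are made up rather than drawn from the Kaggle dataset.

```python
import math
import random


def z_scores(values):
    """Return (v - mean) / std for each value, using the population std."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mean) / std for v in values]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples, SMOTE-style.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Assumes at least two minority samples. Simplified sketch only.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data, invented for illustration only.
cholesterol = [200.0, 240.0, 180.0, 260.0]
scaled = z_scores(cholesterol)  # zero mean, unit variance

minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_like(minority_class, n_new=4)
```

In common practice, oversampling of this kind is applied to the training split only, so that the test set contains no synthetic samples; whether the study followed this order is not stated in the abstract.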