Population aging and lifestyle changes have made cardiovascular disease the leading cause of death worldwide, creating serious public health and economic burdens. Early and accurate prediction of cardiovascular disease is crucial to reducing morbidity and mortality, but traditional prediction methods often lack robustness. This study integrates swarm intelligence feature selection algorithms (the whale optimization algorithm, cuckoo search algorithm, flower pollination algorithm, Harris hawks optimization algorithm, particle swarm optimization algorithm, and genetic algorithm) with machine learning to improve the early diagnosis of cardiovascular disease. Each feature selection algorithm was systematically evaluated under different population sizes by comparing average running time and objective function values, in order to identify the optimal feature subset. The selected feature subsets were then used to train ten classification models, and a comprehensive weighted evaluation based on each model's accuracy, precision, recall, F1 score, and AUC was performed to determine the optimal model configuration. On the combined dataset, the random forest, extreme gradient boosting, adaptive boosting, and k-nearest neighbor models performed best (weighted score of 1), using the 9 key features selected by the cuckoo search algorithm with a population size of 25. On the Framingham dataset, the k-nearest neighbor model performed best (weighted score of 0.92), using the 10 features selected by the whale optimization algorithm with a population size of 50.
These results indicate that swarm intelligence algorithms can effectively identify compact, informative feature subsets, improve the classification accuracy of the downstream models, and support the early diagnosis of cardiovascular disease.
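The abstract does not state how the comprehensive weighted score is computed. As an illustration only, the sketch below assumes one plausible scheme: each of the five metrics is min-max normalized across the candidate models and the normalized values are combined with equal weights, so a model that is best on every metric scores 1. The function name `weighted_scores`, the equal weights, and the example numbers are all hypothetical and are not taken from the study.

```python
# Hypothetical sketch of a weighted multi-metric model comparison.
# Assumption: min-max normalization per metric, equal weights by default.
METRICS = ("accuracy", "precision", "recall", "f1", "auc")


def weighted_scores(results, weights=None):
    """Map {model: {metric: value}} to {model: weighted score in [0, 1]}."""
    if weights is None:
        weights = {m: 1.0 / len(METRICS) for m in METRICS}
    total = sum(weights.values())
    scores = {name: 0.0 for name in results}
    for m in METRICS:
        values = [r[m] for r in results.values()]
        lo, hi = min(values), max(values)
        for name, r in results.items():
            # If every model ties on this metric, give each full credit.
            norm = 1.0 if hi == lo else (r[m] - lo) / (hi - lo)
            scores[name] += weights[m] / total * norm
    return scores


# Made-up placeholder metrics, NOT results from the study.
example = {
    "model_a": {"accuracy": 0.90, "precision": 0.88, "recall": 0.91, "f1": 0.89, "auc": 0.95},
    "model_b": {"accuracy": 0.85, "precision": 0.84, "recall": 0.86, "f1": 0.85, "auc": 0.90},
    "model_c": {"accuracy": 0.80, "precision": 0.79, "recall": 0.78, "f1": 0.78, "auc": 0.84},
}
scores = weighted_scores(example)
best = max(scores, key=scores.get)
print(best, scores)
```

Under this assumed scheme, several models can share the top score when they tie on every metric, which is one way a weighted score of 1 could be reported for more than one classifier.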