Comparing different feature selection algorithms for cardiovascular disease prediction

Najmul Hasan, Yukun Bao

doi:10.1007/s12553-020-00499-2

Abstract

Determining the key features for the best model fit in machine learning is not an easy task. The main objective of this study is to predict cardiovascular disease accurately by comparing different feature selection algorithms. To achieve this goal, the study employs a two-stage feature subset retrieval technique. First, three well-established families of feature selection methods (filter, wrapper, and embedded) are applied. Then, a feature subset is extracted from these three methods using a Boolean, common “True” condition. To compare accuracy and identify the best predictive model, five well-known classifiers are considered: random forest, support vector classifier, k-nearest neighbors, Naive Bayes, and XGBoost. An artificial neural network (ANN) trained on all features serves as the benchmark for further comparison. The experimental results show that the XGBoost classifier combined with the wrapper method yields precise predictions of cardiovascular disease. The proposed approach can also be applied to domains beyond healthcare informatics, such as sports analytics, bioinformatics, and financial analysis. The novelty of this empirical study is that the common “True” condition-based feature selection and comparison technique is entirely new to medical informatics.
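The two-stage idea can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that the common “True” condition keeps a feature only when all three selectors mark it True, which is an element-wise Boolean AND of their support masks. The concrete selectors are also illustrative choices: ANOVA F-test for the filter method, recursive feature elimination (RFE) with logistic regression for the wrapper method, and random-forest importances for the embedded method. The data are synthetic and stand in for a cardiovascular dataset. The names `selector_masks`, `common_true`, and `compare_models`, as well as `k=5`, are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def selector_masks(X, y, k=5):
    """Stage 1 (hypothetical helper): one Boolean support mask per selector family."""
    # Filter: score each feature independently (ANOVA F-test) and keep the top k.
    filt = SelectKBest(score_func=f_classif, k=k).fit(X, y).get_support()
    # Wrapper: recursive feature elimination driven by a logistic-regression model.
    wrap = RFE(LogisticRegression(max_iter=1000), n_features_to_select=k).fit(X, y).get_support()
    # Embedded: keep features whose random-forest importance exceeds the mean importance
    # (the default threshold of SelectFromModel for tree ensembles).
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    emb = SelectFromModel(forest).fit(X, y).get_support()
    return {"filter": filt, "wrapper": wrap, "embedded": emb}


def common_true(masks):
    """Stage 2 (assumed reading): a feature survives only if every mask marks it True."""
    return np.logical_and.reduce(list(masks.values()))


def compare_models(X, y, cv=5):
    """Mean cross-validated accuracy of each candidate classifier on the given columns."""
    models = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "svc": make_pipeline(StandardScaler(), SVC()),
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "naive_bayes": GaussianNB(),
    }
    return {name: cross_val_score(m, X, y, cv=cv).mean() for name, m in models.items()}


# Synthetic stand-in data; substitute the real cardiovascular dataset here.
X, y = make_classification(n_samples=300, n_features=12, n_informative=5,
                           n_redundant=2, random_state=0)
masks = selector_masks(X, y, k=5)
common = common_true(masks)
print("features kept by all three selectors:", np.flatnonzero(common))

# Evaluate every classifier on each selector's subset and on the common subset.
for subset_name, mask in {**masks, "common": common}.items():
    if not mask.any():
        print(f"{subset_name}: no features selected, skipped")
        continue
    for model_name, acc in compare_models(X[:, mask], y).items():
        print(f"{subset_name} / {model_name}: {acc:.3f}")

# Benchmark: a neural network trained on all features.
ann = make_pipeline(StandardScaler(), MLPClassifier(max_iter=2000, random_state=0))
print("ann (all features):", round(cross_val_score(ann, X, y, cv=5).mean(), 3))
```

XGBoost is left out so that the sketch depends only on scikit-learn. If the `xgboost` package is installed, `xgboost.XGBClassifier()` can be added to the `models` dictionary. Evaluating each selector's own subset alongside the intersection mirrors the kind of comparison the abstract reports, such as XGBoost paired with the wrapper method. The intersection can be empty for some data, so the loop skips empty masks rather than fitting on zero columns.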

Full Text