Predicting cardiovascular disease by combining optimal feature selection methods with machine learning

Mauricio Rodriguez Segura, Juan Carrillo Azocar, Billy Peralta Marquez, Orietta Nicolis

doi:10.1109/sccc51225.2020.9281168

Abstract

Cardiovascular Disease (CVD) is one of the main causes of death in the world. Early detection could prevent deaths associated with cardiac problems. In this work, we propose a methodology based on data pre-processing and Machine Learning (ML) techniques for predicting cardiovascular disease, using the Sleep Heart Health Study (SHHS) dataset. First, principal component analysis and lowest p-value logistic regression are applied to select the optimal features that could be related to CVD. Then, the selected features are used to train four ML algorithms: Naive Bayes (NB), Feed Forward Neural Networks (NN), Support Vector Machine (SVM) and Random Forest (RF). A binary feature was used as the output of the proposed models, and SMOTE sampling was used to balance the training set. Among the proposed methods, NN provided the best accuracy (0.81) and AUC (0.76), outperforming the results obtained in other studies.
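
The SMOTE balancing step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the SMOTE settings, so the neighbour count k=5 (a common default), the function names smote and balance_training_set, and the toy data are all assumptions made for illustration. The sketch creates synthetic minority-class rows by interpolating between a minority sample and one of its nearest minority neighbours, and it is applied to the training set only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def balance_training_set(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal size.
    Returns new lists; the inputs are left unchanged."""
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_rows = smote(minority, n_new, k=k, seed=seed) if n_new else []
    return X + new_rows, y + [minority_label] * len(new_rows)


# Toy training set (hypothetical values): six negatives, two positives.
X_train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
           [0.1, 0.0], [0.2, 0.3], [1.0, 1.0], [1.2, 0.9]]
y_train = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = balance_training_set(X_train, y_train)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

Balancing only the training split keeps the held-out data at its natural class ratio, so metrics such as accuracy and AUC are measured on data that resembles the original distribution.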

Full Text