Nonclinical Features in Predictive Modeling of Cardiovascular Diseases: A Machine Learning Approach.

Mirza Rizwan Sajid, A Suresh, Seifedine Kadry, Syed Ahmad Chan Bukhari, Noryanti Muhammad, Ahmad Shahbaz, Roslinazairimah Zakaria

doi:10.1007/s12539-021-00423-w

Abstract

In the broader healthcare domain, prediction is often more valuable than explanation, given the cost of delays in service delivery. The literature offers various risk prediction models for the early assessment of cardiovascular diseases (CVDs). However, the substantial increase in CVD-related mortality is challenging global health systems, especially in developing countries. This situation motivates researchers to improve CVD prediction models using new features and risk computation methods. This study aims to assess how well nonclinical features, which are easily available in any healthcare system, predict CVDs when used with advanced and flexible machine learning (ML) algorithms. A gender-matched case-control study was conducted in the largest public sector cardiac hospital of Pakistan, and data on 460 subjects were collected. The dataset comprised eight nonclinical features. Four supervised ML algorithms were trained and tested to predict CVD status, with traditional logistic regression (LR) as the baseline model. The models were validated through a train-test split (70:30) and tenfold cross-validation. Random forest (RF), a nonlinear ML algorithm, performed better than the other ML algorithms and LR. The area under the curve (AUC) of RF was 0.851 with the train-test split and 0.853 with tenfold cross-validation. The nonclinical features yielded an acceptable accuracy (minimum 71%) across the LR and ML models, demonstrating their predictive capability in risk estimation. This satisfactory performance indicates that nonclinical features, combined with flexible computational methodologies, can reinforce existing risk prediction models for better healthcare services.
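The two validation schemes and the AUC metric named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code: the function names (`train_test_split_indices`, `kfold_indices`, `auc`), the random seeds, and the toy scores are assumptions made for illustration, and the sketch does not reproduce the study's data, models, or any stratification the authors may have used. It only shows how 460 subjects divide under a 70:30 split and tenfold cross-validation, and how AUC can be computed as the probability that a randomly chosen case receives a higher score than a randomly chosen control.

```python
import random


def train_test_split_indices(n, test_fraction=0.30, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, test) lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def kfold_indices(n, k=10, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def auc(labels, scores):
    """AUC as the fraction of (case, control) pairs in which the case
    scores higher than the control; ties count as half a win."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(
        1.0 if c > d else 0.5 if c == d else 0.0
        for c in cases
        for d in controls
    )
    return wins / (len(cases) * len(controls))


# 460 subjects, as in the study; everything else here is illustrative.
train_idx, test_idx = train_test_split_indices(460)
fold_sizes = [len(val) for _, val in kfold_indices(460)]

# Toy labels (1 = case, 0 = control) and hypothetical model scores.
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.35, 0.8])

print(len(train_idx), len(test_idx))  # 322 138
print(fold_sizes)                     # ten folds of 46 subjects each
print(toy_auc)                        # 0.75
```

Under these assumptions, the 70:30 split leaves 138 subjects for testing, while each cross-validation fold holds out 46 subjects, so every subject is used for validation exactly once across the ten folds.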

Full Text