Abstract

Many people suffer from heart disease, and many die from it. A range of health problems, such as being overweight, stroke, and high blood pressure, are commonly understood to cause heart disease directly or indirectly. This study uses the Heart Disease Health Indicators Dataset from Kaggle to identify significant indicators of heart disease or heart attack, and predicts heart disease with logistic regression, random forest, and LightGBM. The analysis finds that 10 predictor variables, covering health conditions, lifestyle habits, and age, are significantly associated with heart disease. A comparison of the models' evaluation results shows that random forest is the most suitable model for predicting heart disease when the indicators are multicollinear. The paper thus identifies important factors in heart disease and provides a well-fitting model for predicting it. Overall, these results can guide further exploration of heart disease indicators.
