Abstract

Many diseases affect human life every day and cause considerable distress. Cardiovascular disease is a broad category of conditions that affect the heart and blood vessels. It is increasingly common not only in elderly people but also in young people. Predicting this kind of disease in its early stages is essential, and many types of tests are used to diagnose it. This work was implemented with the big data tool Apache Spark, using its MLlib library through PySpark. Apache Spark is among the most widely used big data technologies and provides a stack of libraries that includes Spark SQL, Spark MLlib, and Spark Streaming. This research aims to build a model that predicts whether an individual has cardiovascular disease, using machine learning classification techniques: logistic regression, decision tree, support vector machine, random forest, and gradient-boosting tree classifier. Hyperparameter tuning and 5-fold cross-validation were applied to improve model performance. A comparison of all the applied models showed that the Gradient-Boosting Tree Classifier performed best, with an Accuracy of 73.20% and an Area Under ROC value of 0.8002.
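
The two metrics reported above, Accuracy and Area Under ROC, can be illustrated with a minimal pure-Python sketch. This is not the study's Spark code: the function names `accuracy` and `area_under_roc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration only.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of predictions that match the true 0/1 labels."""
    if len(labels) != len(predictions) or not labels:
        raise ValueError("labels and predictions must be non-empty and equal length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def area_under_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    This pairwise form is equivalent to the area under the ROC curve.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, not taken from the study.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_accuracy = accuracy(toy_labels, toy_predictions)
toy_auc = area_under_roc(toy_labels, toy_scores)
```

Accuracy depends on the chosen threshold, while Area Under ROC is computed from the raw scores and summarizes ranking quality across all thresholds, which is likely why the abstract reports both.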
