Abstract

Heart disease is one of the leading causes of death worldwide. This paper presents a real-time system for predicting heart disease from medical data streams that describe a patient's current health status. The main goal of the proposed system is to identify the machine learning algorithm that achieves the highest accuracy for heart disease prediction. Two feature selection algorithms, univariate feature selection and Relief, are used to select the important features from the dataset. We compare four machine learning algorithms: Decision Tree, Support Vector Machine, Random Forest Classifier, and Logistic Regression Classifier, each trained on the selected features as well as on the full feature set. We apply hyperparameter tuning and cross-validation to improve accuracy. One core merit of the proposed system is its ability to efficiently handle Twitter data streams that contain patients' data. This is done by integrating Apache Kafka with Apache Spark as the underlying infrastructure of the system. The results show that the Random Forest Classifier outperforms the other models, achieving the highest accuracy at 94.9%.
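To make the Relief step named in the abstract more concrete, the following is a minimal sketch of the basic binary-class Relief idea: a feature gains weight when it differs between a sample and its nearest neighbour of the other class (nearest miss), and loses weight when it differs from the nearest neighbour of the same class (nearest hit). The abstract does not say which Relief variant or settings the authors used, so everything here is an assumption: the function name `relief_weights`, the range-scaled Manhattan distance, the use of a single nearest hit and miss, and the toy data, which is invented for illustration and is not the paper's dataset.

```python
import random


def relief_weights(X, y, n_samples=None, seed=0):
    """Simplified binary Relief (illustrative, not the paper's code).

    X is a list of numeric feature rows, y a list of class labels.
    Returns one weight per feature; larger means more relevant.
    """
    n, d = len(X), len(X[0])

    # Per-feature range, used to scale differences into [0, 1].
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    # Visit every row once by default, or draw n_samples rows at random.
    rng = random.Random(seed)
    if n_samples is None:
        picks = list(range(n))
    else:
        picks = [rng.randrange(n) for _ in range(n_samples)]
    m = len(picks)

    w = [0.0] * d
    for i in picks:
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # cannot score a row without both neighbour types
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            w[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / m
    return w


# Invented toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.10, 0.9], [0.20, 0.1], [0.15, 0.5],
     [0.90, 0.8], [0.80, 0.2], [0.85, 0.5]]
y = [0, 0, 0, 1, 1, 1]

weights = relief_weights(X, y)
ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
print(weights, ranked)
```

In a pipeline like the one the abstract describes, the top-ranked features would then be passed to each classifier; how many features to keep is a tuning choice the abstract does not specify.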
