Abstract
A number of technologies enabled by the Internet of Things (IoT) have been used to help prevent various chronic diseases, and continuous, real-time tracking is a particularly important one. Wearable medical devices with sensors, health clouds and mobile applications continuously generate a huge amount of data, often called streaming big data. Because this data is generated at high speed, it is difficult to collect, process and analyze it in real time with traditional methods, which are limited and time-consuming. Real-time processing is needed to take immediate action in emergencies and to extract hidden value. There is therefore a significant need for effective and scalable real-time big data stream processing. To address this issue, this work proposes a new architecture for a real-time health status prediction and analytics system built on big data technologies. The system focuses on applying a distributed machine learning model to streaming health data events that are ingested into Spark Streaming through Kafka topics. First, we transform the standard decision tree (DT) algorithm (C4.5) into a parallel, distributed, scalable and fast DT using Spark instead of Hadoop MapReduce, which is limited for real-time computing. Second, this model is applied to streaming data from distributed sources covering various diseases to predict health status. Based on several input attributes, the system predicts health status, sends an alert message to care providers and stores the details in a distributed database for health data analytics and stream reporting. We compare the performance of Spark DT against traditional machine learning tools, including Weka. Finally, performance evaluation parameters such as throughput and execution time are calculated to show the effectiveness of the proposed architecture.
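To illustrate the split criterion behind a parallel C4.5, the sketch below computes C4.5's gain ratio from (attribute value, class) counts. These counts are first tallied separately for each data partition and then merged, which is the map/reduce shape a Spark job would distribute across executors. It is a minimal single-process Python sketch under stated assumptions. The partitions are plain lists, the attributes are categorical, and every name (partition_counts, gain_ratio, best_attribute, the bp and smoker attributes) is hypothetical rather than taken from the authors' implementation. Note that Spark MLlib's built-in decision tree offers Gini and entropy impurity rather than gain ratio, so this shows the C4.5 criterion, not MLlib internals.

```python
from collections import Counter
from functools import reduce
from math import log2

# Each record is a hypothetical (features: dict[str, str], label: str) pair.

def partition_counts(partition, attribute):
    """Map step: count (attribute value, class label) pairs in one partition."""
    return Counter((features[attribute], label) for features, label in partition)

def entropy(counts):
    """Shannon entropy (bits) of a list of positive class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gain_ratio(joint):
    """C4.5 gain ratio computed from merged (value, label) counts."""
    total = sum(joint.values())
    label_counts, value_counts, by_value = Counter(), Counter(), {}
    for (value, label), c in joint.items():
        label_counts[label] += c
        value_counts[value] += c
        by_value.setdefault(value, Counter())[label] += c
    # Information gain = H(class) - weighted H(class | attribute value).
    conditional = sum(
        value_counts[v] / total * entropy(list(lc.values()))
        for v, lc in by_value.items()
    )
    info_gain = entropy(list(label_counts.values())) - conditional
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(list(value_counts.values()))
    return info_gain / split_info if split_info > 0 else 0.0

def best_attribute(partitions, attributes):
    """Reduce step: merge per-partition counts, then pick the highest gain ratio."""
    scores = {}
    for attr in attributes:
        joint = reduce(lambda a, b: a + b,
                       (partition_counts(p, attr) for p in partitions), Counter())
        scores[attr] = gain_ratio(joint)
    return max(scores, key=scores.get), scores

# Two hypothetical partitions of labelled health records.
partitions = [
    [({"bp": "high", "smoker": "yes"}, "risk"),
     ({"bp": "high", "smoker": "no"}, "risk"),
     ({"bp": "normal", "smoker": "yes"}, "ok")],
    [({"bp": "normal", "smoker": "no"}, "ok"),
     ({"bp": "high", "smoker": "yes"}, "risk"),
     ({"bp": "normal", "smoker": "yes"}, "ok")],
]
best, scores = best_attribute(partitions, ["bp", "smoker"])
print(best, scores)  # bp separates the classes perfectly, so it is chosen
```

Only the small count tables cross partition boundaries here, never the raw records. That is the property that makes this kind of split search suitable for distribution.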
The experimental results show that the proposed system can effectively process massive amounts of real-time IoT medical data from distributed sources covering various diseases, and can make health status predictions on that data.
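The abstract describes a per-event path that predicts health status, alerts care providers, stores the record, and reports throughput and execution time. This path can be sketched without Kafka or Spark. In this minimal Python sketch, a plain list stands in for events read from a Kafka topic. A hand-written rule function (predict_status, with made-up, non-clinical thresholds) stands in for the trained Spark DT, and Python lists stand in for the alert channel and the distributed database. All of these names are hypothetical. Throughput is computed as processed events divided by wall-clock execution time.

```python
import time
from dataclasses import dataclass, field

def predict_status(event):
    """Hypothetical stand-in for the trained decision tree (illustrative, non-clinical thresholds)."""
    if event["heart_rate"] > 120 or event["systolic_bp"] > 180:
        return "critical"
    return "normal"

@dataclass
class StreamResult:
    processed: int = 0
    alerted: list = field(default_factory=list)  # patient ids that triggered an alert
    stored: list = field(default_factory=list)   # stand-in for the distributed database

def process_stream(events, notify):
    """Predict, alert and store each event; return the result, execution time and throughput."""
    result = StreamResult()
    start = time.perf_counter()
    for event in events:
        record = {**event, "status": predict_status(event)}
        result.stored.append(record)
        if record["status"] == "critical":
            notify(record)  # stand-in for the message sent to the care provider
            result.alerted.append(record["patient_id"])
        result.processed += 1
    elapsed = time.perf_counter() - start
    # Throughput in events per second; guard against a zero timer reading.
    throughput = result.processed / elapsed if elapsed > 0 else float("inf")
    return result, elapsed, throughput

events = [
    {"patient_id": "p1", "heart_rate": 80, "systolic_bp": 120},
    {"patient_id": "p2", "heart_rate": 135, "systolic_bp": 125},
    {"patient_id": "p3", "heart_rate": 70, "systolic_bp": 190},
]
outbox = []
result, elapsed, throughput = process_stream(events, outbox.append)
print(result.alerted, f"{throughput:.0f} events/s")
```

In an architecture like the one described, this loop body would presumably run over each micro-batch inside Spark Streaming rather than in a single Python process.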
Highlights
Over the past two decades, our era can be described as the big data era, in which digital data has become increasingly important in many domains such as healthcare, science, technology and society.
The rest of this paper is organized as follows: “Background” briefly introduces big data challenges in healthcare together with related work, and the “Methods” section gives a detailed description of the proposed system.
Performance evaluation of the machine learning model: the two datasets were randomly split into a training set and a test set, with 70% of the data used to train the model and 30% used for testing.
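A 70/30 random split like the one described can be sketched in a few lines of Python. This is an illustration under assumptions. The shuffle-then-cut approach, the fixed seed and the function name train_test_split are hypothetical choices that make the split exact and reproducible. The text does not specify the authors' actual splitting procedure. By contrast, Spark's DataFrame.randomSplit([0.7, 0.3]) samples rows independently, so its proportions are only approximately 70/30.

```python
import random

def train_test_split(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and cut it so train_fraction goes to training."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

records = list(range(100))  # hypothetical stand-in for 100 labelled records
train, test = train_test_split(records)
print(len(train), len(test))  # 70 30
```

Shuffling a copy leaves the caller's list untouched. Cutting at a rounded index guarantees that the two sets are disjoint and together cover every record.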
Summary
Over the past two decades, our era can be described as the big data era, in which digital data has become increasingly important in many domains such as healthcare, science, technology and society. The last challenge is related to big data analytics, more precisely to mining massive datasets in real time or near real time, which includes modeling, visualization, prediction and optimization [2]. These challenges require a new processing paradigm, because current data management systems are not efficient at handling either the heterogeneous nature of the data or real-time requirements. Healthcare data comes from distributed sources. These include electronic medical records, clinical images, diagnosis data, health claim data, streaming systems, and sensors attached to the patient’s bedside that continually track patient vitals. Together these sources produce huge volumes of data that traditional data processing systems cannot handle effectively [8].