Chapter 3 - Big data based hybrid machine learning model for improving performance of medical Internet of Things data in healthcare systems

Mamoon Rashid, Harjeet Singh, Vishal Goyal, Shabir Ahmad Parah, Aabid Rashid Wani

doi:10.1016/b978-0-12-819664-9.00003-x

Abstract

Internet of Things (IoT) devices in healthcare systems generate and capture large volumes of data. This data is real-time and unstructured, and storing and processing it in IoT applications remains a major challenge. In this chapter, the authors propose a new Big Data pipeline for storing and processing IoT medical data. The proposed platform uses Apache Flume to efficiently collect large amounts of IoT sensor medical data and transfer it from a Cloud-based server into the Hadoop Distributed File System (HDFS) for storage. Recursive Feature Elimination with Cross Validation (RFECV) is used to eliminate the less important features, and Apache Spark is to be used for processing the real-time data. Next, the authors propose a hybrid prediction model that uses density-based spatial clustering of applications with noise (DBSCAN) to remove sensor data outliers and the Random Forest machine learning classification technique to provide better accuracy in diabetes detection. The authors believe that this Big Data pipeline will greatly help in the efficient storage of medical data from IoT applications and will provide a viable solution for effectively processing medical IoT data and predicting disease from it.
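The modeling steps named in the abstract (RFECV feature elimination, DBSCAN outlier removal, Random Forest classification) can be sketched in a few lines. The following is a minimal single-machine illustration using scikit-learn, not the authors' implementation: it leaves out the Flume, HDFS, and Spark stages, it runs on synthetic data generated with `make_classification` rather than real sensor readings, and every hyperparameter (`eps`, `min_samples`, the tree counts, the number of folds) is an assumption chosen only for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for tabular sensor readings (assumption: NOT real medical data).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, n_redundant=2, random_state=0
)

# Step 1: RFECV drops the least important features, choosing how many
# to keep by cross-validated score.
selector = RFECV(
    RandomForestClassifier(n_estimators=50, random_state=0), step=1, cv=3
)
X_sel = selector.fit_transform(X, y)

# Step 2: DBSCAN labels low-density points as noise (label -1);
# those rows are treated as outliers and removed.
# eps and min_samples are illustrative values, not tuned ones.
X_scaled = StandardScaler().fit_transform(X_sel)
labels = DBSCAN(eps=2.0, min_samples=5).fit_predict(X_scaled)
keep = labels != -1
X_clean, y_clean = X_sel[keep], y[keep]

# Step 3: train and evaluate a Random Forest on the cleaned data.
X_train, X_test, y_train, y_test = train_test_split(
    X_clean, y_clean, test_size=0.25, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

print(f"features kept: {selector.n_features_} of {X.shape[1]}")
print(f"rows kept after outlier removal: {int(keep.sum())} of {len(y)}")
print(f"held-out accuracy: {accuracy:.3f}")
```

In this sketch the features are standardized before DBSCAN because the algorithm is distance-based, so a single `eps` only makes sense when all features share a comparable scale. The classifier itself is trained on the unscaled selected features, since tree-based models are insensitive to feature scaling.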

Full Text