Abstract

Personalized healthcare services use relational patient data and big data analytics to tailor medication recommendations. However, most healthcare data are unstructured, and converting them into relational form takes considerable time and effort. This study proposes a novel data lake architecture that reduces data ingestion time and improves the precision of healthcare analytics. The architecture also removes data silos and enhances analytics by allowing connectivity to third-party data providers (such as clinical laboratories, chemists, and insurance companies). It uses the Hadoop Distributed File System (HDFS) to store both structured and unstructured data. The study applies the K-means clustering algorithm to find clusters of patients with similar health conditions, and then employs a support vector machine to identify the most successful healthcare recommendations for each cluster. Our experimental results demonstrate that the data lake reduces the time needed to ingest data from various data vendors, regardless of format. Moreover, the data lake has the potential to generate patient clusters more precisely than existing approaches. By providing a unified storage location for data in its native format and removing data silos, the data lake can also improve personalized medication recommendations.
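
The clustering step named in the abstract can be illustrated with a minimal sketch of K-means (Lloyd's algorithm) in plain Python. The abstract does not give the study's features, number of clusters, or implementation, so everything here is an assumption for illustration: the function name `kmeans`, the two hypothetical features (systolic blood pressure and fasting glucose), and the made-up patient values. The support vector machine step is not shown.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical patient records: (systolic blood pressure, fasting glucose).
patients = [
    (118, 85), (122, 90), (120, 88),     # illustrative "lower-risk" values
    (160, 180), (158, 175), (165, 190),  # illustrative "higher-risk" values
]

centroids, labels = kmeans(patients, k=2)
```

In this toy data the first three and last three patients end up in separate clusters. In a pipeline like the one the abstract describes, each cluster's records would then be passed to a classifier that is trained per cluster.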
