Abstract

A new generation of health care IT systems is collecting and storing ever more patient data. Useful knowledge can be extracted from the data in electronic medical records (EMR) or personal health records (PHR) to provide medical advice to patients, and the statistics produced by data analysis can support scientific research. However, RDBMS-based frameworks cannot meet the requirements of massive health care data storage, management, and analysis. To solve this problem, this paper proposes a massive data management and analysis solution based on Hadoop to achieve better performance, scalability, and fault tolerance. The data management framework is presented, and two data analysis methods, one based on MapReduce and one based on Hive, are proposed. Experimental results for data upload, data query, and data analysis show that the proposed framework greatly improves performance. The paper also briefly summarizes the performance of the MapReduce and Hive methods and the differences between them.
