Abstract

The world is currently fighting the global COVID-19 pandemic, which has caused more than 5 million deaths and severely affected the world economy. The global spread of COVID-19 has triggered innovative research in distributed computing using big data management tools. Big data analytics tools are used to better understand how the virus spreads, to detect and track COVID-19 symptoms, to estimate risk factors, diagnostics, and other vital information, and to control the spread of the disease. This paper presents a review of big data solutions that have been adopted to address research problems in healthcare by performing distributed computing on massive datasets. In the proposed work, Apache Hadoop with the MapReduce framework and Apache Spark are used to perform analytics on COVID-19 datasets in a parallel and distributed manner. Both frameworks have configuration parameters that can be tuned to improve job performance and efficiency. This paper compares the performance of the two major big data platforms, Hadoop and Spark, by analyzing the execution time and throughput of both frameworks for different input data sizes. The results show that both platforms can effectively process large amounts of data in a parallel and distributed fashion, and that performance depends on the input data size and the configuration parameters. The results also show that Spark has significantly faster computation time than Hadoop for smaller datasets.
