Evaluation of distributed data processing frameworks in hybrid clouds

Faheem Ullah, Shagun Dhingra, Xiaoyu Xia, M Ali Babar

doi:10.1016/j.jnca.2024.103837

Abstract

Distributed data processing frameworks (e.g., Hadoop, Spark, and Flink) are widely used to distribute data among computing nodes of a cloud. Recently, there have been increasing efforts aimed at evaluating the performance of distributed data processing frameworks hosted in private and public clouds. However, there is a paucity of research on evaluating the performance of these frameworks hosted in a hybrid cloud, which is an emerging cloud model that integrates private and public clouds to use the best of both worlds. Therefore, in this paper, we evaluate the performance of Hadoop, Spark, and Flink in a hybrid cloud in terms of execution time, resource utilization, horizontal scalability, vertical scalability, and cost. For this study, our hybrid cloud consists of OpenStack (private cloud) and MS Azure (public cloud). We use both batch and iterative workloads for the evaluation. Our results show that in a hybrid cloud (i) the execution time increases as more nodes are borrowed by the private cloud from the public cloud, (ii) Flink outperforms Spark, which in turn outperforms Hadoop in terms of execution time, (iii) Hadoop transfers the largest amount of data among the nodes during workload execution, while Spark transfers the least, (iv) all three frameworks scale better horizontally than vertically, and (v) Spark is the least expensive in terms of monetary cost for data processing, while Hadoop is the most expensive.

More From: Journal of Network and Computer Applications

Similar Papers

Performance of Hadoop Application on Hybrid Cloud
Hayata Ohnaga ... Kento Aida
01 Oct 2015

A Framework for Fast MapReduce Processing Considering Sensitive Data on Hybrid Clouds
Shun Kawamoto ... Yoko Kamidoi
01 Jul 2020

Towards the Design of a System and a Workflow Model for Medical Big Data Processing in the Hybrid Cloud
Yong-Hyun Kim ... Eui-Nam Huh
01 Nov 2017

Job Scheduling in Hybrid Clouds With Privacy Constraints: A Deep Reinforcement Learning Approach
Haoyang He ... Long Cheng
Concurrency and Computation: Practice and Experience
15 Oct 2024
