Challenges in Resource Provisioning for the Execution of Data Wrangling Workflows on the Cloud: A Case Study

Abdullah Khalid A Almasaud, Rizos Sakellariou, Agresh Bharadwaj, Sandra Sampaio

doi:10.1007/978-3-030-59051-2_5

Abstract

Data Wrangling (DW) is an essential component of any big data analytics job, encompassing a large variety of complex operations to transform, integrate and clean sets of unrefined data. The inherent complexity and execution cost of DW workflows make provisioning resources from a cloud provider a sensible way to execute these workflows in a reasonable amount of time. However, without detailed profiles of the input data and of the operations composing these workflows, selecting cloud resources to run them is a hard task: the search space of candidate resources is large, and the interactions, dependencies, trade-offs and prices of those resources all need to be considered. In this paper, we investigate the complex problem of provisioning cloud resources for DW workflows by carrying out a case study on a specific Traffic DW workflow from the Smart Cities domain. We carry out a number of simulations in which we vary the resource provisioning, focusing on the factors that may affect the execution of the DW workflow most. The insights obtained from our results suggest that fine-grained cloud resource provisioning based on the workflow execution profile and input data properties has the potential to improve resource utilization and prevent significant over- and under-provisioning.

Publication Date: Jan 1, 2020
Citations: 1
License type: cc-by

