SimCost: cost-effective resource provision prediction and recommendation for spark workloads

Yuxing Chen, Mohammad A Hoque, Pengfei Xu, Jiaheng Lu, Sasu Tarkoma

doi:10.1007/s10619-023-07436-y

Open Access

Journal: Distributed and Parallel Databases
Publication Date: Jun 22, 2023
Citations: 1
License type: CC BY 4.0

Affiliation: University of Helsinki

Abstract

Spark is one of the most popular big data analytical platforms. To save time, achieve high resource utilization, and remain cost-effective for Spark jobs, it is challenging but imperative for data scientists to configure suitable resource allocations. In this paper, we investigate the parameter values that meet workloads’ performance requirements while minimizing resource cost and resource utilization time. We propose SimCost, a simulation-based cost model, to predict the performance of jobs accurately. We achieve low-cost training by taking advantage of a simulation framework, namely Monte Carlo simulation, which uses a small amount of data and resources to make a reliable prediction for larger datasets and clusters. Our method’s salient feature is that it keeps training costs low while still producing accurate predictions. Through empirical experiments with 12 benchmark workloads, we show that the cost model yields an average prediction error of less than 5%, and that the recommendation achieves up to 6x savings in resource cost.
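
The general idea of a Monte Carlo runtime prediction followed by a cost-based recommendation can be illustrated with a minimal sketch. This is not the SimCost model itself, which the abstract does not specify; it assumes a single stage of independent tasks, a greedy scheduler that hands each task to the next free slot, and a cost proportional to slots multiplied by runtime. All names (`simulate_stage`, `predict_runtime`, `recommend_slots`, `price_per_slot_second`) are hypothetical.

```python
import heapq
import random
import statistics


def simulate_stage(task_times, num_tasks, num_slots, rng):
    """One Monte Carlo trial: resample task durations from a small set of
    observed durations and schedule them greedily on parallel slots.
    Returns the simulated stage completion time (makespan)."""
    # Min-heap holding the time at which each slot becomes free.
    slots = [0.0] * num_slots
    heapq.heapify(slots)
    for _ in range(num_tasks):
        free_at = heapq.heappop(slots)
        heapq.heappush(slots, free_at + rng.choice(task_times))
    return max(slots)


def predict_runtime(task_times, num_tasks, num_slots, trials=1000, seed=0):
    """Average the makespan over many trials (assumed estimator: the mean)."""
    rng = random.Random(seed)
    runs = [simulate_stage(task_times, num_tasks, num_slots, rng)
            for _ in range(trials)]
    return statistics.mean(runs)


def recommend_slots(task_times, num_tasks, deadline,
                    price_per_slot_second, candidates):
    """Return (slots, predicted_runtime, cost) for the cheapest candidate
    whose predicted runtime meets the deadline, or None if none does."""
    best = None
    for slots in candidates:
        runtime = predict_runtime(task_times, num_tasks, slots)
        if runtime > deadline:
            continue
        # Assumed cost model: pay for every slot for the whole runtime.
        cost = runtime * slots * price_per_slot_second
        if best is None or cost < best[2]:
            best = (slots, runtime, cost)
    return best
```

In such a sketch, `task_times` would come from a short profiling run on a small data sample, and `num_tasks` would be scaled up to the size of the target dataset. A real Spark job has multiple stages, shuffles, and data skew, none of which this toy model captures.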
