To store or not: Online cost optimization for running big data jobs on the cloud

Xiankun Fu, Li Pan, Shijun Liu

doi:10.1016/j.future.2024.03.003

Abstract

As businesses increasingly rely on cloud-based big data analytics services to drive insights, reducing the cost of storing and analyzing large volumes of data in the cloud has become a major concern. During the execution of big data analysis jobs, some of the generated data can be reused by subsequent jobs. By storing such intermediate data, businesses using cloud services can greatly reduce the cost of running big data jobs. An important challenge is determining which data should be stored in order to save costs. Existing storage strategies do not differentiate between data with different usage frequencies, resulting in significant storage costs in practical applications. To address this challenge, in this paper we propose two online algorithms, one deterministic and the other randomized, which dynamically determine whether to store the data with the aim of saving cost. We show that our proposed deterministic (resp., randomized) algorithm incurs a cost within a factor of $2-\alpha'$ (resp., $\frac{2}{1+\alpha'}$) of the minimum cost obtained by an optimal offline algorithm, which is assumed to know the exact future a priori. Finally, through extensive experiments with a real-world workload of big data jobs in an Alibaba Cloud environment, we demonstrate that our proposed online algorithms can achieve significant cost savings under common cloud pricing schemes.
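The store-or-discard decision described in the abstract resembles the classic ski-rental problem. The sketch below is an illustration only and is not the deterministic or randomized algorithm proposed in the paper: it applies the textbook break-even rule, keeping a dataset stored after each access until the storage cost paid since that access equals the cost of regenerating it, and compares the result with an offline optimum that knows all future access times. The function names, the flat `storage_rate`, the fixed `recompute_cost`, and the assumption that the time horizon ends at the last access are all hypothetical simplifications.

```python
from typing import Sequence


def break_even_cost(access_times: Sequence[float],
                    storage_rate: float,
                    recompute_cost: float) -> float:
    """Cost of the textbook break-even (ski-rental style) rule.

    Assumptions (hypothetical, not taken from the paper):
    - storage_rate is a positive cost per unit time for keeping the data,
    - recompute_cost is a fixed cost for regenerating the data,
    - access_times is sorted and the horizon ends at the last access.
    """
    # Hold the data this long after an access before discarding it.
    threshold = recompute_cost / storage_rate
    total = recompute_cost  # the data must be generated once
    for prev, cur in zip(access_times, access_times[1:]):
        gap = cur - prev
        if gap <= threshold:
            # Data is still stored at the next access: pay storage only.
            total += storage_rate * gap
        else:
            # Data was discarded at the threshold: pay storage up to the
            # threshold, then regenerate at the next access.
            total += storage_rate * threshold + recompute_cost
    return total


def offline_optimal_cost(access_times: Sequence[float],
                         storage_rate: float,
                         recompute_cost: float) -> float:
    """Cost of an offline optimum that knows every gap in advance."""
    total = recompute_cost
    for prev, cur in zip(access_times, access_times[1:]):
        gap = cur - prev
        # Per gap, pick the cheaper of storing throughout or regenerating.
        total += min(storage_rate * gap, recompute_cost)
    return total


if __name__ == "__main__":
    times = [0.0, 1.0, 2.0, 10.0]
    online = break_even_cost(times, storage_rate=1.0, recompute_cost=3.0)
    offline = offline_optimal_cost(times, storage_rate=1.0, recompute_cost=3.0)
    print(online, offline, online / offline)
```

Under these simplifications the break-even rule never pays more than twice the offline optimum on any gap, which is the baseline guarantee that factors such as those quoted in the abstract improve upon.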


More From: Future Generation Computer Systems


