Improved Algorithm for Finding the Minimum Cost of Storing and Regenerating Datasets in Multiple Clouds

Yingying Wang, Kun Cheng, Zimao Li

doi:10.1007/978-3-319-94776-1_35

Abstract

This paper studies the intermediate dataset storage problem for linear dataflows in multiple clouds. The proliferation of cloud computing allows users to flexibly store, re-compute, or transfer large generated datasets across multiple cloud service providers. However, under the pay-as-you-go model, the total cost of using cloud services depends on the consumption of storage, computation, and bandwidth resources. Because cloud service providers price these resources differently, users can reduce the total cost of dataset storage and re-computation by choosing, for each generated dataset, whether to store it in a particular cloud service, delete it and regenerate it when needed, or transfer it to another cloud service. The current best algorithm for finding an optimal strategy for a linear dataflow in multiple clouds takes \(O\left( m^4n^3\right) \) time, where \(m\) is the number of clouds and \(n\) is the number of datasets in the dataflow. In this paper, we present an improved algorithm for the linear dataflow with time complexity \(O\left( m^3n^3\right) \).
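
The store, regenerate, or transfer trade-off can be illustrated for a single dataset with a small sketch. This is not the paper's algorithm: it ignores the dependencies between datasets in a dataflow, and the function `cheapest_option`, its parameters, and all prices are hypothetical values chosen only for illustration. It assumes each option can be summarized as a daily cost rate.

```python
def cheapest_option(storage_rate, transfer_cost, regen_cost, uses_per_day):
    """Return (label, daily_cost) for the cheapest way to keep one dataset usable.

    All inputs are hypothetical illustration values:
      storage_rate[c]  -- cost per day of storing the dataset in cloud c
      transfer_cost[c] -- cost of moving the dataset from cloud c to where it is used
      regen_cost       -- cost of regenerating the dataset once after deletion
      uses_per_day     -- how often the dataset is needed, on average
    """
    # Deleting the dataset means paying the regeneration cost on every use.
    options = {"delete and regenerate": uses_per_day * regen_cost}
    # Storing it in cloud c means paying storage plus a transfer on every use.
    for cloud, rate in storage_rate.items():
        options[f"store in {cloud}"] = rate + uses_per_day * transfer_cost[cloud]
    label = min(options, key=options.get)
    return label, options[label]


# Two made-up clouds: A is where the dataset is used, B stores more cheaply.
storage_rate = {"A": 0.10, "B": 0.04}
transfer_cost = {"A": 0.0, "B": 0.05}
regen_cost = 2.0

frequent = cheapest_option(storage_rate, transfer_cost, regen_cost, uses_per_day=0.5)
rare = cheapest_option(storage_rate, transfer_cost, regen_cost, uses_per_day=0.01)
print(frequent[0])
print(rare[0])
```

With these made-up prices, a frequently used dataset is cheapest to keep in the lower-priced cloud, while a rarely used one is cheapest to delete and regenerate on demand. In a real dataflow the regeneration cost of a deleted dataset depends on which of its predecessors are stored and where, which is why an optimal strategy has to be found over the whole dataflow rather than one dataset at a time.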

Full Text