Abstract

To run a dataflow at the lowest possible cost, one must often decide which data-sets in a data-set sequence should be stored and which should be regenerated when needed. The Intermediate Data-set Storage problem arises from this situation. The best existing algorithm for this problem takes O(n^4) time. In this paper, we present two improved algorithms for this problem: the first runs in O(n^2) time and the second in O(rn) time, where n is the number of data-sets in the dataflow and r is a number that reflects the ratio of the maximum storage cost to the minimum computation cost in the dataflow.
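
To make the store-or-regenerate trade-off concrete, the following is a minimal sketch of a quadratic-time dynamic program under an assumed cost model; it is not taken from the paper and should not be read as either of its algorithms. The assumptions are: the dataflow is a linear chain, `x[i]` is the cost of computing data-set i from its predecessor, `y[i]` is the cost rate of storing it, `v[i]` is how often it is used, the input to the first data-set is always available, and a deleted data-set costs `v[i]` times the computation needed to rebuild it from its nearest stored predecessor. The names `min_cost_rate` and `brute_force` are hypothetical.

```python
from itertools import accumulate, product


def min_cost_rate(x, y, v):
    """O(n^2) DP over the position of the nearest stored predecessor.

    Assumed model (not from the paper): linear chain, always-available source.
    """
    n = len(x)
    # 1-indexed prefix sums; index 0 stands for the always-available source.
    X = [0, *accumulate(x)]                                   # computation cost
    V = [0, *accumulate(v)]                                   # usage rates
    W = [0, *accumulate(vi * Xi for vi, Xi in zip(v, X[1:]))]  # v[k] * X[k]

    def seg(i, j):
        # Cost rate of deleting every data-set strictly between i and j,
        # each regenerated from stored data-set i.
        return (W[j - 1] - W[i]) - X[i] * (V[j - 1] - V[i])

    y_ext = [0, *y, 0]  # source and virtual sink cost nothing to keep
    best = [0] * (n + 2)  # best[j]: min cost of 1..j given j is stored
    for j in range(1, n + 2):
        best[j] = y_ext[j] + min(best[i] + seg(i, j) for i in range(j))
    return best[n + 1]


def brute_force(x, y, v):
    """Exponential-time check: try every store/delete assignment."""
    best = float("inf")
    for stored in product([False, True], repeat=len(x)):
        total, regen = 0, 0
        for xi, yi, vi, keep in zip(x, y, v, stored):
            if keep:
                total, regen = total + yi, 0
            else:
                regen += xi            # rebuild from nearest stored ancestor
                total += vi * regen
        best = min(best, total)
    return best


if __name__ == "__main__":
    x, y, v = [10, 20, 5], [3, 50, 1], [1, 1, 2]
    print(min_cost_rate(x, y, v), brute_force(x, y, v))
```

In this toy instance the cheapest plan stores the first and third data-sets and regenerates the second on demand. The prefix sums let each segment cost be evaluated in constant time, which is what keeps the sketch quadratic under these assumptions.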
