Abstract

Distributed stream big data analytics platforms have emerged to process continuously generated data streams. In stream big data analytics, the data processing workflow is abstracted as a directed graph referred to as a topology. Data are read from storage and processed tuple by tuple, and the processing results are updated dynamically. The performance of a topology is evaluated by its throughput. This paper proposes an efficient resource allocation scheme for a heterogeneous stream big data analytics cluster shared by multiple topologies, with the goal of achieving max-min fairness in the utilities of the throughput across all topologies. We first formulate a novel resource allocation problem as a mixed 0-1 integer program and rigorously prove that it is NP-hard. To tackle this problem, we transform the non-convex constraint into several linear constraints using linearization and reformulation techniques. Based on an analysis of the problem-specific structure and characteristics, we propose an approach that iterates between two steps: it optimally solves the continuous problem for a fixed set of discrete variables, and then heuristically updates the discrete variables. Simulations show that the proposed resource allocation scheme markedly improves the max-min fairness in the utilities of topology throughput while keeping computational complexity low.
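The alternating structure described above can be illustrated with a small toy sketch. It is not the paper's formulation: the model below is an assumption made purely for illustration, in which each topology is placed on exactly one machine (the discrete variable), receives a share of that machine's capacity (the continuous variable), and has utility equal to its throughput `rate[t][m] * share`. The names `solve_continuous`, `allocate`, `rate`, `capacity`, and `placement` are all hypothetical. For a fixed placement, the max-min share split on each machine has a closed form (equalize throughput), and the discrete step is a simple single-move local search.

```python
from typing import List, Tuple


def solve_continuous(
    placement: List[int], rate: List[List[float]], capacity: List[float]
) -> Tuple[float, List[float]]:
    """Optimal max-min shares for a FIXED placement (toy model, assumed).

    On one machine, maximizing the minimum of rate * share subject to
    sum(share) <= capacity is achieved by giving every topology the same
    throughput u, where u = capacity / sum(1 / rate).
    """
    shares = [0.0] * len(placement)
    worst = float("inf")
    for m, cap in enumerate(capacity):
        members = [t for t, pm in enumerate(placement) if pm == m]
        if not members:
            continue  # an empty machine does not constrain the minimum
        u = cap / sum(1.0 / rate[t][m] for t in members)
        for t in members:
            shares[t] = u / rate[t][m]
        worst = min(worst, u)
    return worst, shares


def allocate(
    rate: List[List[float]], capacity: List[float], placement: List[int]
) -> Tuple[List[int], List[float], float]:
    """Alternate between the continuous solve and a heuristic discrete update.

    The discrete step tries moving one topology to another machine and keeps
    the move only if the minimum utility strictly improves. This is a plain
    local search chosen for illustration; it can stop at a local optimum.
    """
    placement = list(placement)
    best, shares = solve_continuous(placement, rate, capacity)
    improved = True
    while improved:
        improved = False
        for t in range(len(placement)):
            for m in range(len(capacity)):
                if m == placement[t]:
                    continue
                trial = placement[:]
                trial[t] = m
                value, trial_shares = solve_continuous(trial, rate, capacity)
                if value > best + 1e-12:
                    placement, best, shares = trial, value, trial_shares
                    improved = True
    return placement, shares, best


if __name__ == "__main__":
    # Hypothetical data: 3 topologies, 2 heterogeneous machines.
    rate = [[4.0, 2.0], [4.0, 2.0], [2.0, 4.0]]
    capacity = [1.0, 1.0]
    placement, shares, best = allocate(rate, capacity, [0, 0, 0])
    print(placement, shares, best)
```

Because the minimum utility strictly increases with every accepted move and the number of placements is finite, the loop terminates. The closed-form continuous step stands in for the optimal continuous solve mentioned in the abstract; a richer model would typically need a linear programming solver at that point.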

