Abstract

Task allocation in Data Stream Processing Systems (DSPSs) has a significant impact on performance metrics such as data processing latency and system throughput. An application processed by DSPSs can be represented as a Directed Acyclic Graph (DAG), where each vertex represents a task and the edges show the dataflow between the tasks. Task allocation can be defined as the assignment of the vertices in the DAG to the physical compute nodes such that the data movement between the nodes is minimised. Finding an optimal task placement for DSPSs is NP-hard. Thus, approximate scheduling approaches are required to improve the performance of DSPSs. In this paper, we propose a heuristic scheduling algorithm which reliably and efficiently finds highly communicating tasks by exploiting graph partitioning algorithms and a mathematical optimisation software package. We evaluate the communication cost of our method using three micro-benchmarks, showing that we can achieve results that are close to optimal. We further compare our scheduler with two popular existing schedulers, R-Storm and Aniello et al.’s ‘Online scheduler’ using two real-world applications. Our experimental results show that our proposed scheduler outperforms R-Storm, increasing throughput by up to 30%, and improves on the Online scheduler by 20%–86% as a result of finding a more efficient schedule.11This work is an extension of I-Scheduler (Eskandari et al., 2018 [1]).

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call