Abstract

Distributed stream processing engines (DSPEs) deploy multiple tasks on distributed servers to process data streams in real time. Many DSPEs provide locality-aware stream partitioning (LSP) methods to reduce network communication costs. However, the even job scheduler provided by DSPEs places tasks far apart on the distributed servers, which prevents LSP from being used effectively. In this paper, we propose a Locality/Fairness-aware job scheduler (L/F job scheduler) that considers locality in addition to fairness, solving the problems of the even job scheduler, which considers fairness alone. First, the L/F job scheduler increases the cohesion of contiguous tasks that require message transmission, for locality. At the same time, it reduces the coupling of parallel tasks that do not require message transmission, for fairness. Next, we connect contiguous tasks into a stream pipeline and deploy the stream pipelines evenly across the distributed servers so that the L/F job scheduler achieves high cohesion and low coupling. Finally, we implement the proposed L/F job scheduler in Apache Storm, a representative DSPE, and evaluate it on both synthetic and real-world workloads. Experimental results show that the L/F job scheduler achieves throughput similar to that of the even job scheduler, while latency is significantly improved: by up to 139.2% for LSP applications and by up to 140.7% even for non-LSP applications. The L/F job scheduler also improves latency by 19.58% and 12.13%, respectively, on two real-world workloads. These results indicate that our L/F job scheduler provides superior processing performance for DSPE applications.
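To make the cohesion/fairness idea concrete, the following is a minimal sketch in Java (not the paper's actual Storm implementation): contiguous tasks are assumed to already be grouped into stream pipelines, each pipeline is kept whole on a single server for locality, and the pipelines themselves are spread round-robin over the servers for fairness. All class and method names (PipelineScheduler, Pipeline, schedule) are illustrative assumptions, not the authors' code.

import java.util.ArrayList;
import java.util.List;

public class PipelineScheduler {

    /** A chain of contiguous tasks, e.g., one source -> map -> sink path. */
    static class Pipeline {
        final List<Integer> taskIds = new ArrayList<>();
        Pipeline(List<Integer> tasks) { taskIds.addAll(tasks); }
    }

    /**
     * Place each pipeline on exactly one server (high cohesion: message-passing
     * neighbors stay local) and distribute the pipelines evenly across servers
     * (low coupling / fairness). Returns assignment[i] = pipelines on server i.
     */
    static List<List<Pipeline>> schedule(List<Pipeline> pipelines, int numServers) {
        List<List<Pipeline>> assignment = new ArrayList<>();
        for (int i = 0; i < numServers; i++) {
            assignment.add(new ArrayList<>());
        }
        // Round-robin over whole pipelines: a pipeline is never split across
        // servers, while the pipeline count per server stays balanced.
        for (int p = 0; p < pipelines.size(); p++) {
            assignment.get(p % numServers).add(pipelines.get(p));
        }
        return assignment;
    }
}

In the actual scheduler, the pipelines would be derived from the job's task graph and handed to the DSPE's scheduling interface; the sketch only shows the even, pipeline-granular placement step described in the abstract.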

Highlights

  • With the generation of large data streams and the demand for real-time response, research on distributed stream processing engines (DSPEs) is becoming very active

  • Experimental results show that the L/F job scheduler achieves throughput similar to that of the even job scheduler, while latency is significantly improved: by up to 139.2% for locality-aware stream partitioning (LSP) applications and by up to 140.7% even for non-LSP applications

  • The L/F job scheduler improves latency by 19.58% and 12.13%, respectively, on two real-world workloads. These results indicate that our L/F job scheduler provides superior processing performance for DSPE applications


Summary

Introduction

With the generation of large data streams and the demand for real-time response, research on distributed stream processing engines (DSPEs) is becoming very active. A DSPE application consists of one or more jobs, and a job consists of many tasks. A large amount of network communication occurs between tasks while DSPE applications process the data stream. At this point, the sender tasks (upstreams) must select one (or more) of the many receiver tasks (downstreams) to which to send messages. This downstream selection procedure is called stream partitioning [2,6,10]. In a DSPE, the stream partitioning method is the most important factor affecting performance because it determines whether or not network communication occurs.
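As an illustration of the downstream selection step, the sketch below contrasts a plain hash partitioner with a locality-aware one that prefers a downstream task running on the sender's own server and otherwise falls back to hashing. The class and method names (Partitioners, TaskInfo, hashPartition, localityAwarePartition) are hypothetical and do not correspond to any particular DSPE's API.

import java.util.List;

public class Partitioners {

    /** Downstream task descriptor: its id and the server it runs on. */
    static class TaskInfo {
        final int taskId;
        final String serverId;
        TaskInfo(int taskId, String serverId) { this.taskId = taskId; this.serverId = serverId; }
    }

    /** Hash partitioning: the downstream is chosen only by the message key. */
    static int hashPartition(Object key, List<TaskInfo> downstreams) {
        return downstreams.get(Math.floorMod(key.hashCode(), downstreams.size())).taskId;
    }

    /** Locality-aware partitioning: prefer a downstream on the local server. */
    static int localityAwarePartition(Object key, String localServer, List<TaskInfo> downstreams) {
        for (TaskInfo t : downstreams) {
            if (t.serverId.equals(localServer)) {
                return t.taskId;                     // local delivery, no network hop
            }
        }
        return hashPartition(key, downstreams);      // no co-located downstream; go remote
    }
}

The locality-aware variant only pays off when the job scheduler has actually placed a suitable downstream on the sender's server, which is the gap the L/F job scheduler targets.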

