Abstract

Distributed stream processing engines (DSPEs) provide stream partitioning methods that distribute messages to tasks deployed across a distributed environment for real-time stream processing. Among these methods, the original locality-aware stream partitioning (LSP) is a binary LSP: it sends messages only to downstream tasks on the same node as their upstream tasks. Under general configurations, the binary LSP degrades performance because it focuses solely on task locality and, unlike distributed batch processing engines, does not consider downstream status. In this paper, we propose a Stochastic LSP (SLSP) method that considers both task locality and downstream status by computing stream partitioning probabilities from the round-trip time (RTT) to each downstream. We also present coarse-grained and fine-grained methods that probe downstreams at the node level and the process level, respectively. We then optimize SLSP with a weighted closeness that prioritizes the partitioning probabilities and a parallel thread model that processes each stage of SLSP in parallel. Finally, we implement SLSP in Apache Storm, a representative DSPE, and empirically compare it with the binary LSP. Experimental results show that, under general configurations, SLSP reduces latency by up to 208% while maintaining throughput similar to that of the binary LSP. These results indicate that SLSP achieves better-optimized stream partitioning by reflecting downstream status as well as task locality.
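The abstract does not give SLSP's exact formulas, so the following is only a minimal Python sketch of the general idea under stated assumptions: each downstream's closeness is taken as the inverse of its measured RTT, and the "weighted closeness" is assumed to be that value raised to a hypothetical exponent `weight`. The probabilities are the normalized weighted closeness values, and each message is sent to a downstream drawn at random from them. The function names, the task names, and the exponent form are illustrative assumptions, not the paper's method or Apache Storm's API.

```python
import random
from typing import Dict, Optional


def partition_probabilities(rtt_ms: Dict[str, float], weight: float = 2.0) -> Dict[str, float]:
    """Map each downstream task to a partitioning probability.

    Hypothetical scheme (assumption, not the paper's formula):
    closeness = 1 / RTT, raised to `weight` so that nearer (lower-RTT)
    downstreams are prioritized more strongly as `weight` grows.
    """
    if not rtt_ms:
        raise ValueError("need at least one downstream task")
    if any(rtt <= 0 for rtt in rtt_ms.values()):
        raise ValueError("RTTs must be positive")
    # Weighted closeness per downstream task.
    closeness = {task: (1.0 / rtt) ** weight for task, rtt in rtt_ms.items()}
    total = sum(closeness.values())
    # Normalize so the probabilities sum to 1.
    return {task: c / total for task, c in closeness.items()}


def choose_downstream(probs: Dict[str, float], rng: Optional[random.Random] = None) -> str:
    """Pick one downstream task at random according to `probs`."""
    if rng is None:
        rng = random.Random()
    tasks = list(probs)
    return rng.choices(tasks, weights=[probs[t] for t in tasks], k=1)[0]


if __name__ == "__main__":
    # Hypothetical RTTs (ms): one co-located task and two remote tasks.
    rtts = {"local-task": 0.2, "remote-task-a": 1.0, "remote-task-b": 2.0}
    probs = partition_probabilities(rtts, weight=1.0)
    print(probs)
    print(choose_downstream(probs, random.Random(42)))
```

In this sketch, `weight=0` makes every downstream equally likely (plain shuffling), while a very large `weight` concentrates nearly all messages on the lowest-RTT task, approaching the binary LSP's behavior. Intermediate values trade locality against spreading load to other downstreams.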
