Abstract

The need for real-time, large-scale data processing has led to the development of frameworks for distributed stream processing in the cloud. To provide fast, scalable, and fault-tolerant stream processing, recent Distributed Stream Processing Systems (DSPS) treat streaming workloads as a series of batch jobs rather than a series of records. Batch-based stream processing systems can process data at high rates but incur large end-to-end latency. In this paper, we focus on minimizing the end-to-end latency of batched streaming systems by leveraging adaptive batch sizing and execution-parallelism tuning. We propose DyBBS, a heuristic algorithm integrated with isotonic regression that automatically learns and adjusts batch size and execution parallelism according to workloads and operating conditions. Our approach requires no workload-specific knowledge. Experimental results show that our algorithm significantly reduces end-to-end latency compared to the state of the art: i) for the Reduce workload, latency is reduced by 34.97% and 48.02% for sinusoidal and Markov-chain data input rates, respectively, and ii) for the Join workload, the latency reductions are 63.28% and 67.51% for sinusoidal and Markov-chain data input rates, respectively.
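The core idea the abstract references — fitting a monotone (isotonic) model of batch processing time versus batch size, then choosing a small batch that the system can still process stably — can be sketched as follows. This is an illustrative sketch only: the pool-adjacent-violators fitting routine, the sample data, and the selection rule below are assumptions for exposition, not the paper's actual DyBBS algorithm.

```python
def isotonic_fit(y):
    """Pool Adjacent Violators (PAV): least-squares nondecreasing fit to y."""
    merged = []  # list of [block_mean, block_weight]
    for v in y:
        merged.append([float(v), 1.0])
        # Merge backwards whenever monotonicity is violated.
        while len(merged) > 1 and merged[-2][0] > merged[-1][0]:
            m2, w2 = merged.pop()
            m1, w1 = merged.pop()
            merged.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    fit = []
    for mean, w in merged:
        fit.extend([mean] * int(w))
    return fit

# Hypothetical (batch size -> processing time) measurements, sorted by size.
batch_sizes = [100, 200, 300, 400, 500]   # records per batch
proc_times = [0.9, 1.4, 1.2, 2.1, 2.6]    # seconds (noisy observations)

model = isotonic_fit(proc_times)          # smoothed, nondecreasing estimate

# Illustrative stability rule: pick the smallest batch whose predicted
# processing time stays under the batch interval, so batches do not queue up.
batch_interval = 2.0                      # seconds
stable = [b for b, t in zip(batch_sizes, model) if t < batch_interval]
print(min(stable))
```

Isotonic regression fits naturally here because processing time can safely be assumed nondecreasing in batch size, so the model needs no parametric form of the workload.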
