Abstract

Current stream processing systems (SPSs) suffer from imbalanced load and limited parallelism due to skewed data distributions and imbalanced computational resources. We observe that the cause of these problems is that current SPSs partition their workloads statically. To address this, we design Marabunta, a distributed stream processing system for skewed stream processing. Marabunta performs dynamic scaling and load balancing automatically at runtime. Large partitions in a skewed data distribution can be processed in parallel or migrated to idle machines to achieve load balancing. Moreover, Marabunta uses a new execution model that accelerates execution by increasing parallelism and the utilization of computational resources. We implemented Marabunta in C++ and optimized it for modern hardware. Our evaluations on typical streaming workloads show that Marabunta achieves higher throughput and better elasticity on both uniform and skewed datasets than state-of-the-art SPSs such as Flink and Heron.
