Global Shuffle Grouping (GSG): A Load Balancing Strategy for Continuous Range Queries on Storm

Yuqi Zhang, Botao Wang, Xiao Tian, Hanhui Zhong, Jianpeng Zhou

doi:10.1109/sera.2018.8477194

Abstract

Apache Storm is a distributed stream processing framework that supports real-time processing of big data. Although many stream grouping strategies have been implemented in Storm to partition stream data and maximize resource utilization, they cannot efficiently support continuous range queries, which are the basis of location-based services in which both queries and objects are moving. The reason is that those strategies cannot express the spatial semantics of a query (its range and the data distribution), which easily leads to load imbalance. To address this problem, we propose a load-balancing strategy called global shuffle grouping (GSG) to support efficient continuous range queries on Storm. In GSG, the cost of a query is estimated from its range and the density of moving objects. Continuous range queries are grouped according to their costs in round-robin fashion, and queries belonging to the same group are distributed according to a counter array by a second round-robin. The double round-robin ensures that the load distributed to the downstream bolts is balanced. We implemented a continuous range query topology with GSG in Storm. Compared with shuffle grouping, the most practical built-in grouping strategy, GSG reduces the load imbalance degree and load standard deviation by 2–3 times, reduces load fluctuation by 1–2 times, and improves throughput by up to nearly 20%.
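The abstract describes the double round-robin only at a high level, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes a cost model of range area times object density, fixed cost boundaries that split queries into groups, and one counter per group with staggered starting positions. The names `RangeQuery`, `estimate_cost`, `GlobalShuffleSketch`, and `cost_bounds` are hypothetical, and no Storm API is used.

```python
import bisect
from dataclasses import dataclass


@dataclass
class RangeQuery:
    """A rectangular range query (hypothetical representation)."""
    width: float
    height: float


def estimate_cost(query: RangeQuery, density: float) -> float:
    # Assumed cost model: expected number of objects inside the range,
    # i.e. range area multiplied by the local density of moving objects.
    return query.width * query.height * density


class GlobalShuffleSketch:
    """Assigns queries to downstream bolts with one counter per cost group."""

    def __init__(self, num_bolts: int, cost_bounds: list[float]):
        self.num_bolts = num_bolts
        # Assumption: cost groups are defined by fixed, sorted boundaries.
        self.cost_bounds = sorted(cost_bounds)
        num_groups = len(self.cost_bounds) + 1
        # Counter array: one round-robin position per cost group.
        # Staggering the start positions is an assumption of this sketch,
        # so that different groups do not all begin at bolt 0.
        self.counters = [g % num_bolts for g in range(num_groups)]

    def group_of(self, cost: float) -> int:
        # Index of the cost group that contains this cost.
        return bisect.bisect_right(self.cost_bounds, cost)

    def choose_bolt(self, cost: float) -> int:
        # Round-robin within the query's cost group.
        g = self.group_of(cost)
        bolt = self.counters[g]
        self.counters[g] = (bolt + 1) % self.num_bolts
        return bolt


if __name__ == "__main__":
    grouping = GlobalShuffleSketch(num_bolts=3, cost_bounds=[10.0, 100.0])
    load = [0.0, 0.0, 0.0]
    queries = [RangeQuery(2.0, 3.0), RangeQuery(10.0, 10.0)] * 3
    for q in queries:
        cost = estimate_cost(q, density=1.5)
        load[grouping.choose_bolt(cost)] += cost
    print(load)
```

Because each cost group advances its own counter, every bolt receives the same number of queries from each group whenever the group size is a multiple of the bolt count, which is the intuition behind balancing by cost rather than by query count alone.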

Similar Papers

Load balancing scheme for supporting real-time processing of big data in distributed in-memory systems
Kyoungsoo Bok ... Kitae Choi
09 Oct 2018

Real-Time Big Data Processing and Analytics: Concepts, Technologies, and Domains
Uğur Kekevi ... Ahmet Arif Aydin
Computer Science
27 Nov 2022

Real-Time Processing of Big Data Streams: Lifecycle, Tools, Tasks, and Challenges
Fatih Gurcan ... Muhammet Berigel
01 Oct 2018

The Development of Real-time Large Data Processing Platform Based On Reactive Micro-Service Architecture
Haidong Lv ... Jiahui Xu
05 May 2020
