Abstract

Apache Storm is a distributed stream processing framework to support real-time processing of big data. Even if many stream grouping strategies have been implemented in Storm to partition stream data in order to maximize usability of resources, but they cannot efficiently support continuous range query. It is the basis of location based services, in which both queries and objects are moving. The reason is that the spatial semantics of the query (range and data distribution) cannot be expressed by those strategies, and this is easy to result in load imbalance. For this problem, we propose a load-balancing strategy called global shuffle grouping (GSG) to support efficient continuous range queries on Storm. There the cost of the query is estimated based on the range and density of moving objects. The continuous range queries are grouped according to their costs by the way of round-robin. For the queries belonging to the same group, they are distributed according to a counter array by another round-robin. Double round-robins ensure that the load distributions to multiple downstream bolts are balanced. We implemented continuous range query topology with GSG into Storm. Compared with the most practicable built-in grouping strategy shuffle grouping, our proposed grouping is able to reduce load imbalance degree and load standard deviation by 2–3 times and reduce load fluctuation by 1–2 times. The throughput can be improved up to nearly 20%.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.