Abstract

The era of big data has led to the emergence of new systems for real-time distributed stream processing, and Apache Storm is one of the most popular of them. However, Storm, like many other stream processing systems, lacks an intelligent scheduling mechanism. Its default round-robin scheduler disregards both inter-node traffic and load balancing across worker nodes, which can make it inefficient. This paper proposes a real-time scheduling algorithm for Storm based on inter-node traffic and worker-node load balancing. The algorithm has two steps. In the first step, executors are assigned to slots according to the topology structure and inter-node traffic, so that interaction traffic is minimized. In the second step, worker-node load is taken into account, and the node with the lowest load is chosen for slot assignment. Experiments show that, compared with the default scheduling algorithm, the proposed algorithm improves average latency and average inter-node traffic by more than 50%, and that, compared with the traffic-based scheduling algorithm, it improves them by about 10%.
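
The two steps can be illustrated with a minimal sketch. It is not the paper's implementation and does not use Storm's API. The greedy pair-merging heuristic, the function names (`group_executors`, `place_slots`, `inter_node_traffic`), the `slot_capacity` limit, and all numbers are assumptions made for illustration only.

```python
def group_executors(executors, traffic, num_slots, slot_capacity):
    """Step 1 (sketch): co-locate the executor pairs that exchange the most tuples.

    traffic maps (sender, receiver) -> tuple rate. Greedy merging is an
    assumed heuristic, not necessarily the one used in the paper.
    """
    group_of = {e: i for i, e in enumerate(executors)}
    members = {i: [e] for i, e in enumerate(executors)}
    # Visit pairs from heaviest to lightest traffic.
    for (a, b), _rate in sorted(traffic.items(), key=lambda kv: -kv[1]):
        if len(members) <= num_slots:
            break
        ga, gb = group_of[a], group_of[b]
        if ga == gb or len(members[ga]) + len(members[gb]) > slot_capacity:
            continue
        for e in members[gb]:
            group_of[e] = ga
        members[ga].extend(members.pop(gb))
    # Fallback: if there are still too many groups, merge the two smallest.
    while len(members) > num_slots:
        g1, g2 = sorted(members, key=lambda g: len(members[g]))[:2]
        if len(members[g1]) + len(members[g2]) > slot_capacity:
            raise ValueError("not enough slot capacity for all executors")
        members[g1].extend(members.pop(g2))
    return list(members.values())


def place_slots(groups, executor_load, node_load, free_slots):
    """Step 2 (sketch): put each slot on the least-loaded node with a free slot."""
    node_load = dict(node_load)    # copy so the caller's data is untouched
    free_slots = dict(free_slots)
    placement = {}
    # Place the heaviest groups first so lighter ones can even out the load.
    for group in sorted(groups, key=lambda g: -sum(executor_load[e] for e in g)):
        candidates = [n for n in node_load if free_slots[n] > 0]
        if not candidates:
            raise ValueError("no free slots left")
        node = min(candidates, key=lambda n: node_load[n])
        placement[tuple(group)] = node
        node_load[node] += sum(executor_load[e] for e in group)
        free_slots[node] -= 1
    return placement


def inter_node_traffic(placement, traffic):
    """Total tuple rate between executors placed on different nodes."""
    node_of = {e: node for group, node in placement.items() for e in group}
    return sum(r for (a, b), r in traffic.items() if node_of[a] != node_of[b])


# Hypothetical four-executor topology: spout -> bolt_a -> bolt_b -> sink.
executors = ["spout", "bolt_a", "bolt_b", "sink"]
traffic = {("spout", "bolt_a"): 100, ("bolt_a", "bolt_b"): 10, ("bolt_b", "sink"): 80}
executor_load = {"spout": 10, "bolt_a": 30, "bolt_b": 20, "sink": 5}

groups = group_executors(executors, traffic, num_slots=2, slot_capacity=2)
placement = place_slots(groups, executor_load,
                        node_load={"node1": 50, "node2": 20},
                        free_slots={"node1": 1, "node2": 1})
print(placement, inter_node_traffic(placement, traffic))
```

In this toy example the two heaviest links stay inside a slot, so only the lightest link (rate 10) crosses nodes, and the heavier slot lands on the node that started with the lower load.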
