Abstract

Real-time processing of stream data has become increasingly vital. Batched stream systems, which discretize stream data into micro-batches and use a batch system to process the resulting micro-batch jobs, have attracted wide attention from academia and industry. Such systems always run in heterogeneous environments, with heterogeneous resources and heterogeneous tasks. Unfortunately, current batched stream system implementations are designed and optimized for homogeneous environments and perform poorly in heterogeneous ones. We attribute this suboptimal performance to scheduling tasks according to data locality and free slots alone. On the one hand, data locality creates a barrier between the large tasks on slow nodes and the spare capacity of fast nodes, because slow nodes prefer local large tasks to remote small tasks. On the other hand, because the scheduler is blind to task size, there is a very high probability that large tasks are scheduled in the last few waves. These two factors hinder perfect load balancing and cause tail latencies for large tasks. To address these issues, we propose a blank scheduling framework called Radar. Aware of both node capacity and task size, Radar pre-steals large tasks from slow nodes and schedules tasks on a large-task-first basis. Radar then fills the remaining small free slots with small tasks matched to each node's capacity. We implement Radar in Spark-2.1.1. Experimental results on benchmarks show that Radar can reduce job completion time by 27.78% to 42.79% compared with Spark Streaming. Experimental results with a real Tencent production application show that Radar can reduce response time by 28.57%.
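
The intuition behind large-task-first, capacity-aware scheduling can be illustrated with a toy model. The sketch below is not Radar's implementation: the `Node` class, both scheduling functions, the task sizes, and the capacities are hypothetical, each node is assumed to have a single slot, and the cost of moving data to a remote node is ignored. It compares a locality-only baseline, in which each node runs only its local tasks, against a greedy policy that places tasks in descending size order on whichever node would finish them soonest.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical worker with one slot and a relative processing speed."""
    name: str
    capacity: float          # work units processed per second (assumed)
    busy_until: float = 0.0  # time at which the node's slot becomes free
    assigned: list = field(default_factory=list)


def schedule_locality_only(local_tasks, nodes):
    """Baseline: each node runs only its local tasks, ignoring task size."""
    for node in nodes:
        for size in local_tasks[node.name]:
            node.busy_until += size / node.capacity
            node.assigned.append(size)
    return max(n.busy_until for n in nodes)  # job completion time


def schedule_large_first(task_sizes, nodes):
    """Greedy sketch: largest task first, onto the node that finishes it soonest."""
    for size in sorted(task_sizes, reverse=True):
        best = min(nodes, key=lambda n: n.busy_until + size / n.capacity)
        best.busy_until += size / best.capacity
        best.assigned.append(size)
    return max(n.busy_until for n in nodes)  # job completion time


def make_nodes():
    # One fast node and one slow node (illustrative capacities).
    return [Node("fast", 2.0), Node("slow", 1.0)]


# The large task (size 8) is local to the slow node.
local_tasks = {"fast": [2, 2], "slow": [8, 2]}
all_sizes = [s for sizes in local_tasks.values() for s in sizes]

baseline = schedule_locality_only(local_tasks, make_nodes())
large_first = schedule_large_first(all_sizes, make_nodes())
print(baseline, large_first)
```

In this toy setting the baseline is bounded by the slow node working through its large local task, while the greedy policy moves that task to the fast node and lets the slow node absorb small tasks. A real scheduler would also have to account for data transfer cost and multiple slots per node.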
