Abstract
Big data analytics frameworks are moving toward higher degrees of parallelism and shorter task durations in order to achieve lower latency. As a result, millions of scheduling decisions must be made per second, which poses a major challenge for today's centralized schedulers. Many researchers and enterprises have therefore turned to distributed scheduling to avoid the throughput limits of centralized designs. To our knowledge, Omega, Apollo, and Sparrow are three well-known approaches that made early moves toward distributed scheduling, but each has shortcomings and none of them adopts a peer-to-peer architecture. We propose a new scheduling approach called Piper, which adapts the peer-to-peer idea to distributed scheduling and provides near-optimal performance. We have implemented Piper using Apache Thrift, and the results show that Piper reduces job response times by a factor of more than 1.5 compared with Sparrow (we chose Sparrow for comparison because it is a leading design and is open source). In addition, we used trace-driven simulations to evaluate Piper at the scale of large clusters; these simulations further show that Piper outperforms Sparrow.