Abstract

Efficiently processing concurrent data tasks, such as online analytical queries, in datacenter environments remains a major challenge for current computing techniques. One fundamental reason is that executing these tasks normally involves large numbers of distributed data operators, which are typically expensive in terms of communication time. To improve overall performance, various advanced approaches to optimizing the execution of data operators have been proposed in recent years. However, most of them focus on application-level optimization, such as using data-locality scheduling to reduce network traffic. Moreover, few of them have considered the optimization opportunities that arise from the concurrent execution of multiple data operators. In this paper, we propose a novel coflow-based scheduling system called CoFlop, which aims to reduce network communication time for multiple distributed operators at the query level and, on that basis, to lay a solid foundation for the development of a network-aware query execution system in datacenter networks. We present the detailed system design of CoFlop and conduct a simulation-based evaluation with large concurrent distributed join operations. The experimental results show that CoFlop can outperform existing methods under different large workloads.
