GENIE: QoS-guided Dynamic Scheduling for CNN-based Tasks on SME Clusters

Zhaoyun Chen, Haoduo Yang, Mei Wen, Jie Yu, Chunyuan Zhang, Lei Luo

doi:10.23919/date.2019.8715279

Abstract

Convolutional Neural Networks (CNNs) have driven dramatic advances in emerging Machine Learning (ML) services. Compared to online ML services, offline ML services that run diverse CNN workloads are common in small and medium-sized enterprises (SMEs), research institutes, and universities. Efficiently scheduling and processing multiple CNN-based tasks on SME clusters is both important and challenging, and existing schedulers cannot predict the resource requirements of such tasks. In this paper, we propose GENIE, a QoS-guided dynamic scheduling framework for SME clusters that guarantees users’ QoS while keeping system utilization high. Using a prediction model derived from lightweight profiling, GENIE applies a QoS-guided scheduling strategy to identify the best placements for CNN-based tasks. We implement GENIE as a TensorFlow plugin and evaluate it on real SME clusters and in large-scale simulations. The experimental results show that the QoS-guided strategy outperforms the baseline schedulers by up to 67.4% in QoS-guarantee percentage and by up to 28.2% in makespan.

Full Text