Abstract

The rapid development of artificial intelligence in scenarios such as machine learning, image recognition, and autonomous driving has led to an explosion of computation jobs. These jobs are often divided into parallel child tasks and executed on distributed clusters with limited computing resources, making parallel task scheduling one of the most important research topics today. Most studies on parallel task scheduling focus on formulating specific scenarios and service requirements as optimization problems. However, complex and dynamic parallel computing environments are hard to model, predict, and control, so these methods scale poorly and fail to reflect real scenarios. In this paper, we devise a Multi-task Deep reinforcement learning approach for scalable parallel Task Scheduling (MDTS). Deep Reinforcement Learning (DRL) is a model-free optimization algorithm for long-term control that learns from experience, but it suffers from the curse of dimensionality when making decisions in complex parallel computing environments with jobs of diverse properties. We extend the action selection in DRL to a multi-task decision, in which the output branches of a multi-task learning network are matched one-to-one to parallel scheduling tasks. The child tasks of a job are thus assigned to distributed nodes without any human knowledge, while resource competition among parallel tasks is captured through shared neural network layers. Extensive experiments show that MDTS reduces job execution time by 15.3% and 39.8% compared with least-connection scheduling and a particle swarm optimization algorithm, respectively. Moreover, MDTS outperforms the raw DRL algorithm on job execution time, load imbalance, and total cost by 42.8%, 47.5%, and 59.0%, respectively.
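The multi-branch structure described above can be illustrated with a minimal sketch (not the authors' code): a shared trunk encodes the cluster and job state, and one output branch per child task produces a distribution over candidate nodes. The dimensions STATE_DIM, NUM_TASKS, and NUM_NODES, as well as the layer sizes, are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

STATE_DIM = 64    # assumed size of the encoded cluster + job state
NUM_TASKS = 4     # assumed number of parallel child tasks per job
NUM_NODES = 10    # assumed number of candidate compute nodes

class MultiBranchPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared layers capture resource competition among parallel tasks.
        self.shared = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        # One output branch per child task scores the candidate nodes.
        self.branches = nn.ModuleList(
            nn.Linear(128, NUM_NODES) for _ in range(NUM_TASKS)
        )

    def forward(self, state):
        h = self.shared(state)
        # Each branch acts as an independent head of the multi-task decision.
        return [torch.softmax(branch(h), dim=-1) for branch in self.branches]

# Usage: sample one node per child task from the branch distributions.
policy = MultiBranchPolicy()
state = torch.randn(1, STATE_DIM)
node_probs = policy(state)
assignment = [torch.multinomial(p, 1).item() for p in node_probs]
```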
