Abstract

AbstractGPU platforms have been widely adopted in both academia and industry to support deep learning (DL) research and development (R&D). Compared with giant companies who favor custom‐designed AI platforms, most small‐and‐medium‐sized enterprises, institutes and universities (EIUs) prefer to build or rent a cost‐effective GPU cluster, usually in a limited‐scale, to process diverse DL R&D workloads. Therefore, more attention has been attracted by DL scheduling with the aim of improving the system efficiency and task performance. However, prior prediction‐based schedulers are limited in terms of their prediction accuracy and profiling overhead. Accordingly, in this article, we propose a reinforcement learning (RL)‐based online GPU scheduler, RIFLING, to model the scheduling problem as an online decision‐making process. Scheduling decisions are made according to Q‐learning, which is a typical RL method. RIFLING can achieve high scheduling efficiency based on the online exploring and exploiting of diverse scheduling strategies for various DL workloads, without the need for expensive offline profiling or sophisticated prediction model. We implement RIFLING as a plugin of Tensorflow, and deploy it on a distributed GPU cluster. Experiments demonstrate that RIFLING achieves up to 47.8% reductions and 19.6% improvements in makespan and average normalized processing rate respectively compared to the best available baseline without any manual intervention.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.