Abstract

Deep reinforcement learning applied to computing systems has shown potential for improving system performance and for discovering better allocation strategies faster. In this paper, we map HPC batch job scheduling to the SMDP formalism and present an online, deep reinforcement learning-based scheduler that uses a modified Proximal Policy Optimization algorithm with action masking to minimize job slowdown while supporting large action spaces. In our experiments, we assess the effect of noise in run time estimates on our model and evaluate how it behaves on small (64 processors) and large (16384 processors) clusters. We also show that the model is robust to changes in workload and cluster size: transfer works across cluster-size changes of up to 10× and from synthetic workload generators to supercomputing workload traces. The proposed model outperforms learning-based models from the literature as well as classic heuristics, making it a viable approach for building robust, transferable, learned scheduling models.
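
To give a rough sense of the action-masking idea mentioned above, the sketch below shows one common way of masking invalid actions in a policy-gradient method such as PPO: invalid actions receive a very large negative logit so they get effectively zero probability. This is only an illustrative sketch under our own assumptions, not the paper's implementation; the function and variable names are hypothetical.

```python
import torch
from torch.distributions import Categorical

def masked_policy_distribution(logits: torch.Tensor, valid_mask: torch.Tensor) -> Categorical:
    """Build an action distribution in which invalid actions (e.g. scheduling
    choices no waiting job can currently take) receive ~zero probability.

    logits:     raw policy-network outputs, shape (num_actions,)
    valid_mask: boolean tensor, True where the action is currently valid
    """
    # Push invalid actions to a very large negative logit so that, after the
    # softmax inside Categorical, their probability is effectively zero.
    masked_logits = torch.where(valid_mask, logits, torch.full_like(logits, -1e9))
    return Categorical(logits=masked_logits)

# Toy usage: 5 candidate actions, only actions 0 and 3 are valid this step.
logits = torch.randn(5)
mask = torch.tensor([True, False, False, True, False])
dist = masked_policy_distribution(logits, mask)
action = dist.sample()            # always a valid action
log_prob = dist.log_prob(action)  # would feed the PPO surrogate objective
```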
