In large-scale, distributed high-performance computing systems, job scheduling has become increasingly complex as computational resources and job diversity have grown. While heuristic scheduling strategies with various optimization objectives have shown promising results, their effectiveness is often limited in real-world applications due to the dynamic nature of workloads and system configurations. Deep reinforcement learning (DRL) methods offer a promising way to address these scheduling challenges; however, their trial-and-error learning can lead to suboptimal performance or wasted resources in the early stages of training. To mitigate these risks, this paper introduces an offline reinforcement learning-based job scheduling method. By training on historical data, the method avoids the pitfalls of deploying immature strategies in live environments. We constructed an offline dataset by combining expert scheduling trajectories with early-stage trial data from online reinforcement learning, which enables the development of more robust scheduling policies. Experimental results demonstrate that, compared to heuristic and online DRL algorithms, the proposed approach achieves more efficient scheduling performance across various workloads and optimization goals, showcasing its practicality and broad applicability.
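The abstract's central methodological step is assembling an offline dataset from two sources: expert scheduling trajectories and early-stage trial data from online RL. The sketch below illustrates one plausible way such a merged transition buffer could be assembled; it is not the paper's implementation, and the state dimensions, reward definition, loader functions, and placeholder data are all hypothetical.

```python
# Minimal sketch (not the paper's implementation) of building an offline RL
# dataset for job scheduling by merging the two transition sources named in
# the abstract: expert scheduling trajectories and early online-RL trials.
# All names, dimensions, and data below are hypothetical placeholders.
import numpy as np

STATE_DIM = 8    # assumed size of a scheduler state (queue/resource features)
NUM_ACTIONS = 4  # assumed number of candidate jobs the policy chooses among

def make_placeholder_transitions(n, source):
    """Stand-in for loading (s, a, r, s', done) tuples from scheduling logs."""
    rng = np.random.default_rng(0 if source == "expert" else 1)
    return {
        "obs": rng.normal(size=(n, STATE_DIM)).astype(np.float32),
        "actions": rng.integers(0, NUM_ACTIONS, size=n),
        "rewards": rng.normal(size=n).astype(np.float32),  # e.g. negative slowdown
        "next_obs": rng.normal(size=(n, STATE_DIM)).astype(np.float32),
        "dones": rng.random(n) < 0.05,
        "source": np.full(n, source),
    }

def merge_datasets(expert, online):
    """Concatenate expert and early online-RL transitions into one buffer."""
    return {k: np.concatenate([expert[k], online[k]]) for k in expert}

expert_data = make_placeholder_transitions(5000, "expert")  # expert/heuristic logs
online_data = make_placeholder_transitions(1000, "online")  # early DRL trial data
offline_buffer = merge_datasets(expert_data, online_data)

# An offline RL algorithm would then sample minibatches from `offline_buffer`
# without further interaction with the live scheduler, which is the
# risk-avoidance point the abstract emphasizes.
print({k: v.shape for k, v in offline_buffer.items()})
```

In this sketch the merged buffer is a simple dictionary of arrays; a real pipeline would instead load logged transitions from the cluster's scheduler traces and feed them to an offline RL trainer of the authors' choosing.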