Abstract

Modern High Performance Computing (HPC) clusters often comprise a large number of computing resources with different capabilities, which makes them heterogeneous and difficult to manage. In addition, they must handle a wide range of applications with different requirements. Together, these factors pose a great challenge to the workload managers that assign applications to resources. Many recent proposals aim to overcome this challenge, including some that employ Deep Reinforcement Learning (DRL) techniques. This paper proposes a novel simulation framework for the study of workload managers, conceived specifically to foster research on workload managers based on DRL techniques. Its main features include the simulation of heterogeneous clusters based on multicore architectures, taking into account both the contention in shared memory access and the energy consumption. The accuracy and performance of the simulator were validated against a real Slurm-based environment. The results show good accuracy, with a relative error below 5% in makespan and below 10% in energy consumption, and speedups of up to 200.
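The two validation metrics quoted above can be made concrete with a short sketch. It assumes that the relative error of a metric (makespan or energy) is |simulated - measured| / measured, and that the speedup is the ratio of the real execution time to the simulator's own wall-clock time; the paper's exact definitions may differ. The function names and all numbers below are hypothetical and are not results from the paper.

```python
def relative_error(simulated: float, measured: float) -> float:
    """Relative error of a simulated metric w.r.t. a measured reference (assumed definition)."""
    if measured == 0:
        raise ValueError("measured value must be non-zero")
    return abs(simulated - measured) / abs(measured)


def speedup(real_seconds: float, simulation_wall_seconds: float) -> float:
    """How many times faster the simulator runs than the real execution (assumed definition)."""
    if simulation_wall_seconds <= 0:
        raise ValueError("simulation wall-clock time must be positive")
    return real_seconds / simulation_wall_seconds


if __name__ == "__main__":
    # Hypothetical values for illustration only; not measurements from the paper.
    real_makespan_s = 3600.0   # makespan observed on the real cluster
    sim_makespan_s = 3450.0    # makespan predicted by the simulator
    sim_wall_s = 24.0          # time the simulator itself took to run
    print(f"makespan relative error: {relative_error(sim_makespan_s, real_makespan_s):.1%}")
    print(f"speedup: {speedup(real_makespan_s, sim_wall_s):.0f}x")
```

Under these assumed definitions, a workload is reproduced accurately when its makespan error stays below 0.05 and its energy error below 0.10.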
