Egypt has recently recognized the pivotal role of High Performance Computing (HPC) in advancing science and innovation, as well as the importance of collaboration among institutions and universities in consolidating their computational and data resources into a unified platform that serves different disciplines (e.g., scientific, industrial, governmental). Without such consolidation, additional resources would have to be purchased, with the associated cost, effort, and time (e.g., for setup, administration, and maintenance). This paper therefore examines the architecture and capabilities of the EN-HPCG grid under two workload management systems: (i) Slurm (open source) and (ii) PBS Pro (licensed). It compares the grid's performance under Slurm and PBS Pro for specific high-throughput computing (HTC) applications, using the NAS Grid parallel benchmark (NGB), to determine which workload manager is more suitable for EN-HPCG. The evaluation uses grid-level performance metrics such as throughput and the number of tasks completed as a function of time. The presented methodology also aims to help potential partners decide whether to join the EN-HPCG grid, with a focus on the site speed-up metric. Our results show that, when the Slurm scheduler is used, it is not advisable to integrate a cluster with high-speed hardware and a cluster with outdated hardware, unless an open-source solution free of cost and licensing issues is mandatory (in which case Slurm is the viable option). In contrast, the PBS Pro scheduler supports online decision-making in a dynamic environment across a unified grid.